Developer's Guide
Welcome to the MolTopolParser developer's guide. This guide provides all the necessary information for setting up your development environment, understanding the codebase, and contributing effectively to the project.
Environment Setup
Clone the Repository
Start by cloning the MolTopolParser repository from GitHub to your local machine with `git clone`.
Create a Virtual Environment
It's recommended to create a virtual environment for development (for example, with Python's built-in `venv` module) so that dependencies are managed cleanly.
Install Dependencies
Install all required dependencies using pip, e.g., `pip install -r requirements.txt`.
Run Tests
Run the test suite in the `tests` directory and ensure that all tests pass to verify your setup.
Developing a New Module
If you are adding a new module to support a different simulation software, here's how to get started:
Add a module file
- Create a new file for your module under the `moltopolparser` directory, e.g., `xxx.py`.
- Register the new module in the `__init__.py` file.
Define Data Classes
Organize your module using the Three-Order Component (TOC) model:
- Level 1 Base Data Classes - Define data classes for the smallest data units.
- Level 2 Aggregation Classes - Group base classes into meaningful sections.
- Level 3 Summary Classes - Organize and manage entire file contents or complex data structures.
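The sketch below shows the three levels working together. It assumes pydantic v2 and a hypothetical toy format of atom names and partial charges. `Atom`, `AtomsSection`, and `TopologyFile` are illustrative names, not classes from MolTopolParser.

```python
from typing import List

from pydantic import BaseModel, Field


# Level 1: base data class for the smallest data unit (one atom entry)
class Atom(BaseModel):
    name: str = Field(..., description="Atom name")
    charge: float = Field(..., description="Partial charge (e)")


# Level 2: aggregation class grouping base entries into one section
class AtomsSection(BaseModel):
    atoms: List[Atom] = Field(default_factory=list, description="Entries of the atoms section")

    def total_charge(self) -> float:
        # Sum the partial charges of every atom in this section
        return sum(atom.charge for atom in self.atoms)


# Level 3: summary class managing the contents of a whole file
class TopologyFile(BaseModel):
    filename: str = Field(..., description="Source file path")
    sections: List[AtomsSection] = Field(default_factory=list)


topology = TopologyFile(
    filename="toy.top",
    sections=[
        AtomsSection(
            atoms=[
                Atom(name="OW", charge=-0.834),
                Atom(name="HW1", charge=0.417),
                Atom(name="HW2", charge=0.417),
            ]
        )
    ],
)
print(len(topology.sections[0].atoms))  # 3
```

In this sketch, section-wide behavior such as `total_charge` lives at the aggregation level. This keeps the base classes as plain data holders.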
Implement Parsing Functions
Add class methods that parse the data, validate it, and, where needed, manipulate or transform it.
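A Level 1 class might pair a `parser` classmethod with a pydantic validator, as in the sketch below. It assumes pydantic v2 and a hypothetical line format of `name charge` with `;` comments. `Atom` and its finiteness check are illustrative, not taken from any existing module.

```python
import math
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class Atom(BaseModel):
    """Hypothetical Level 1 class for lines of the form 'name charge'."""

    name: str = Field(..., description="Atom name")
    charge: float = Field(..., description="Partial charge (e)")

    @field_validator("charge")
    @classmethod
    def charge_is_finite(cls, value: float) -> float:
        # Reject 'nan'/'inf' values, which float() would otherwise accept
        if not math.isfinite(value):
            raise ValueError("charge must be a finite number")
        return value

    @classmethod
    def parser(cls, content: List[str]) -> List["Atom"]:
        # Skip blank lines and ';' comments; each cls(...) call runs validation
        entries = []
        for line in content:
            line = line.split(";", 1)[0].strip()
            if not line:
                continue
            name, charge = line.split()[:2]
            entries.append(cls(name=name, charge=float(charge)))
        return entries


atoms = Atom.parser(["OW  -0.834", "; comment line", "", "HW1 0.417 ; hydrogen"])
print([(atom.name, atom.charge) for atom in atoms])  # [('OW', -0.834), ('HW1', 0.417)]

try:
    Atom.parser(["X nan"])
except ValidationError as error:
    print("rejected with", len(error.errors()), "error(s)")
```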
Testing Your Module
Create a test file in the `tests` directory:
- Structure: Place your test file there, e.g., `test_xxx.py`.
- Data Files: Store the data files your tests need under `tests/data/xxx/`.
Example Code Block
Here’s a simplified structure of what your module file might look like:
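```python
# Header and imports
"""
Module to parse and manage data from Software XXX
"""
from pydantic import BaseModel  # (1) plus whatever other libraries your module needs


# Base dataclasses
class BaseClass(BaseModel):
    pass


# Aggregation dataclasses
class AggregateClass(BaseModel):
    pass


# Summary dataclasses
class SummaryClass(BaseModel):
    pass


# Additional helper functions if needed
```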
# Additional helper functions if needed
(1) If new packages need to be installed, please also add them to the `setup.py` and `requirements.txt` files.
Understanding Data Access and Manipulation
MolTopolParser simplifies the process of working with molecular simulation files by providing structured data access through its class hierarchy. The process typically begins with the initialization of a top-level data class, which acts as a gateway to all underlying data associated with a simulation.
Top-Level Dataclass Initialization
When you initialize the top-level dataclass, you obtain an instance that encapsulates the system and its related content read from the target files. This encapsulation gives a single, structured entry point for accessing and manipulating the data:
```python
# Example of initializing a top-level dataclass
system_topology = SystemTopology.parser("path/to/topology_file.top")
```
Accessing Data
Once the top-level class is initialized, specific data can be accessed by invoking the corresponding methods defined within the class. These methods follow the naming convention `pull_*`: each one retrieves a specific type of data, organizes and validates it, and stores the result on the instance:
```python
# Example of accessing data using pull_* methods
system_topology.pull_forcefield()
system_topology.pull_molecules()
forcefield_data = system_topology.forcefield
molecules = system_topology.molecules
```
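The initialize-then-pull flow can be sketched with a hypothetical `SystemTopology` built on pydantic v2. Here, `parser` performs only a shallow parse that stores the raw lines in `pass_content`, and `pull_molecules` later hands those lines to an aggregation-level classmethod. The `lines` argument, the `Molecule` class, and the GROMACS-style `[ molecules ]` section are assumptions made so the sketch runs without a file on disk.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Molecule(BaseModel):
    """Hypothetical aggregation-level entry: a molecule name and its count."""

    name: str
    count: int

    @classmethod
    def parser(cls, content_lines: List[str]) -> List["Molecule"]:
        # Collect 'name count' rows that follow a '[ molecules ]' header
        molecules: List["Molecule"] = []
        inside = False
        for line in content_lines:
            stripped = line.strip()
            if stripped.startswith("["):
                inside = stripped.strip("[] ").lower() == "molecules"
                continue
            if inside and stripped and not stripped.startswith(";"):
                name, count = stripped.split()[:2]
                molecules.append(cls(name=name, count=int(count)))
        return molecules


class SystemTopology(BaseModel):
    """Hypothetical top-level (summary) class."""

    filename: str
    pass_content: List[str] = Field(default_factory=list)  # raw lines kept for later pulls
    molecules: Optional[List[Molecule]] = None  # filled on demand by pull_molecules()

    @classmethod
    def parser(cls, filename: str, lines: Optional[List[str]] = None) -> "SystemTopology":
        # Shallow parse: keep only the mandatory attributes and the raw content
        if lines is None:
            with open(filename) as handle:
                lines = handle.read().splitlines()
        return cls(filename=filename, pass_content=lines)

    def pull_molecules(self) -> None:
        # Delegate the detailed parsing to the aggregation-level classmethod
        self.molecules = Molecule.parser(self.pass_content)


top = SystemTopology.parser("toy.top", lines=["[ molecules ]", "; name  count", "SOL 216"])
print(top.molecules)  # None
top.pull_molecules()
print(top.molecules[0].name, top.molecules[0].count)  # SOL 216
```

In this sketch, opening a file costs only the shallow parse. Sections that are never pulled are never processed.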
Illustration
The following figure demonstrates the initialization and access processes based on a toy module file.
Here’s the toy module file:
Five sections in a toy module file
```python
from typing import List, Optional, Union

from pydantic import BaseModel, ConfigDict, Field


class Base1(BaseModel):
    """
    Description
    """

    ### attributes, e.g.
    field_1: int = Field(..., description="x")

    ### parse templates
    model_config = ConfigDict(
        json_schema_extra={
            # ... parse templates for this data unit
        }
    )

    ### classmethod for parsing
    @classmethod
    def parser(cls, content: List[str]) -> Union["Base1", List["Base1"]]:
        ...


class Base2(BaseModel):
    ...  # defined in the same way as Base1


class Aggregation1(BaseModel):
    """
    Description
    """

    ### attributes composed of base data, e.g.
    field_1: List[Base1] = Field(..., description="x")
    field_2: List[Base2] = Field(..., description="x")

    ### parse templates
    model_config = ConfigDict(
        json_schema_extra={
            # ...
        }
    )

    ### classmethod for parsing
    @classmethod
    def parser(
        cls,
        content_lines: Optional[List[str]] = None,
        content_files: Optional[List[str]] = None,
    ) -> "Aggregation1":
        """
        Parse the aggregation data from the content passed when it is called
        """
        data1_list = Base1.parser(content_lines)  # <-- call the exact parsing classmethod at Base level
        data2_list = Base2.parser(content_lines)
        data = {
            "field_1": data1_list,
            "field_2": data2_list,
        }
        return cls(**data)


class Summary(BaseModel):
    """
    Description
    """

    ### mandatory attributes to initialize
    field_1: str = Field(..., description="x")

    ### attributes composed of aggregation data to be parsed, e.g.
    field_2: Optional[List[Aggregation1]] = Field(None, description="x")

    ### attribute: collection of content passed on when delegating the parsing tasks;
    ### could be lines of data or paths of files
    pass_content: Optional[List[str]] = Field(None, description="x")

    ### parse templates
    model_config = ConfigDict(
        json_schema_extra={
            # ...
        }
    )

    ### classmethod for shallow parsing:
    ### obtain the mandatory attributes
    ### and return a cls instance
    @classmethod
    def parser(cls, filename: str) -> "Summary":
        ...

    ### instance method to delegate parsing tasks
    def pull_field2(self) -> None:
        ### call the aggregation-level classmethod;
        ### wrap the result in a list to match the List[Aggregation1] annotation
        self.field_2 = [Aggregation1.parser(content_lines=self.pass_content)]
```
Practice with gmx.py
The `gmx.py` module contains the full, detailed implementation, and reading its code should clarify all of these concepts.
Copying `gmx.py` is also a recommended way to start a new module.
Submitting Contributions
Pull Requests
- Fork the repository - Make a copy of the project on your GitHub account.
- Make your changes - Work on your fork and make the changes you propose.
- Submit a pull request - Open a pull request from your fork back to the main repository.
Code Style and Review
Follow PEP 8 guidelines for Python code. Ensure your code is clean and well-documented. All contributions will undergo a review process by core maintainers.
Attention
Early Version Data Structure Improvements
In the initial versions of MolTopolParser, all data classes inherit directly from `pydantic.BaseModel`.
This implementation does not use templated data classes for each hierarchical level, which leads to some redundancy that could be streamlined to improve the package's efficiency and maintainability.
If you encounter issues or have suggestions for improvements, please do not hesitate to open an issue or contribute directly by implementing enhancements.
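The sketch below shows one possible direction for such templated classes, under stated assumptions. A shared base class reads a regex from each subclass's `json_schema_extra` parse template and builds instances from the named groups, so Level 1 classes no longer repeat their own parsing loops. `TemplatedEntry`, the `"pattern"` key, `Atom`, and `Bond` are all hypothetical. The sketch also assumes pydantic v2, whose default (lax) mode converts the captured strings to `float` and `int`.

```python
import re
from typing import List

from pydantic import BaseModel, ConfigDict


class TemplatedEntry(BaseModel):
    """Hypothetical shared base: subclasses supply a regex parse template."""

    @classmethod
    def parser(cls, content: List[str]) -> List["TemplatedEntry"]:
        # Look up the named-group regex stored in the subclass's parse template
        pattern = re.compile(cls.model_config["json_schema_extra"]["pattern"])
        entries = []
        for line in content:
            if line.lstrip().startswith(";"):
                continue  # skip comment lines
            match = pattern.match(line)
            if match:
                # pydantic converts the captured strings to the annotated types
                entries.append(cls(**match.groupdict()))
        return entries


class Atom(TemplatedEntry):
    model_config = ConfigDict(
        json_schema_extra={"pattern": r"\s*(?P<name>\S+)\s+(?P<charge>[-+]?[\d.]+)"}
    )
    name: str
    charge: float


class Bond(TemplatedEntry):
    model_config = ConfigDict(
        json_schema_extra={"pattern": r"\s*(?P<ai>\d+)\s+(?P<aj>\d+)"}
    )
    ai: int
    aj: int


print([(a.name, a.charge) for a in Atom.parser(["; atoms", "OW -0.834", "HW1 0.417"])])
# [('OW', -0.834), ('HW1', 0.417)]
print([(b.ai, b.aj) for b in Bond.parser(["1 2", "1 3"])])
# [(1, 2), (1, 3)]
```

With this layout, adding a new entry type means declaring fields and a template rather than writing another parsing loop.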
API Documentation
As of now, comprehensive API documentation has not been established for MolTopolParser.
For users requiring documentation of specific data formats, it is advised to create a markdown file, such as `xxx_yyy.md`, detailing the data structure and its usage.
Place this file in the `docs/reference` directory.
To update the documentation site with this new information, add an entry to the `mkdocs.yml` file under the `Reference` section, then use the `mkdocs serve` command to build and view the documentation locally.
Example modification in the `mkdocs.yml` file:
```yaml
nav:
    - Getting Started: index.md
    - Concepts: concepts.md
    - Developer's Guide: developer.md
    - Reference:
        - gmx.MolTop: reference/gmx.MolTop.md
        - Name_eg_xxx.yyy: reference/xxx_yyy.md  # <------ here
```
Need Help?
If you encounter any problems or have questions while contributing, please open an issue on the GitHub repository.