graph LR
System_Configuration_Data_Management["System Configuration & Data Management"]
Core_Diffusion_Model["Core Diffusion Model"]
Training_Generation_Engine["Training & Generation Engine"]
Model_Persistence["Model Persistence"]
Evaluation_Framework["Evaluation Framework"]
System_Configuration_Data_Management -- "provides inputs to" --> Training_Generation_Engine
System_Configuration_Data_Management -- "provides configuration to" --> Core_Diffusion_Model
System_Configuration_Data_Management -- "provides configuration to" --> Model_Persistence
System_Configuration_Data_Management -- "provides configuration to" --> Evaluation_Framework
Core_Diffusion_Model -- "processes data for" --> Training_Generation_Engine
Core_Diffusion_Model -- "receives configuration from" --> System_Configuration_Data_Management
Training_Generation_Engine -- "consumes inputs from" --> System_Configuration_Data_Management
Training_Generation_Engine -- "interacts with" --> Core_Diffusion_Model
Training_Generation_Engine -- "manages" --> Model_Persistence
Training_Generation_Engine -- "provides outputs to" --> Evaluation_Framework
Model_Persistence -- "provides models to" --> Evaluation_Framework
Model_Persistence -- "uses" --> System_Configuration_Data_Management
click System_Configuration_Data_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/genie/System_Configuration_Data_Management.md" "Details"
click Core_Diffusion_Model href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/genie/Core_Diffusion_Model.md" "Details"
click Training_Generation_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/genie/Training_Generation_Engine.md" "Details"
click Model_Persistence href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/genie/Model_Persistence.md" "Details"
click Evaluation_Framework href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/genie/Evaluation_Framework.md" "Details"
Genie, a deep learning research framework for computational structural biology, has a modular, data-centric architecture. Analysis of its control flow graph (CFG) and source code identifies five core components that together cover the full lifecycle: configuration and data handling, model training, structure generation, and evaluation.
System Configuration & Data Management
This foundational component centralizes application settings and hyperparameters. It also manages the loading, preprocessing, and organization of structural biology datasets (e.g., SCOPe). It supplies the inputs and configuration that every other component depends on.
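The sketch below illustrates the two kinds of output this component provides: a typed hyperparameter container and a pad-and-mask step for variable-length structures. Everything in it is hypothetical. `TrainingConfig`, its field names and defaults, the `key value` override format, and `pad_or_crop` are assumptions made for illustration. None of them are taken from `genie/config.py` or `genie/data/`. Structures are represented as lists of C-alpha coordinates.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # one C-alpha position (x, y, z) in Angstroms


@dataclass(frozen=True)
class TrainingConfig:
    # Hypothetical fields and defaults; genie/config.py may differ.
    max_n_res: int = 128       # pad or crop every structure to this length
    n_timestep: int = 1000     # number of diffusion steps
    batch_size: int = 32
    learning_rate: float = 1e-4


def load_overrides(text: str, base: TrainingConfig) -> TrainingConfig:
    """Parse hypothetical 'key value' lines and return a copy of base with those fields replaced."""
    updates = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and surrounding whitespace
        if not line:
            continue                          # skip blank or comment-only lines
        key, value = line.split(maxsplit=1)
        current = getattr(base, key)          # AttributeError on unknown keys
        updates[key] = type(current)(value)   # cast to the type of the default value
    return replace(base, **updates)


def pad_or_crop(coords: Sequence[Coord], max_n_res: int) -> Tuple[List[Coord], List[int]]:
    """Return exactly max_n_res coordinates plus a mask (1 = real residue, 0 = padding)."""
    kept = list(coords[:max_n_res])           # crop from the N-terminus (an assumption)
    n_pad = max_n_res - len(kept)
    padded = kept + [(0.0, 0.0, 0.0)] * n_pad
    mask = [1] * len(kept) + [0] * n_pad
    return padded, mask
```

The dataclass is frozen, so overrides return a modified copy. This keeps one shared config object from being mutated by several components. The mask lets batched code ignore padded positions.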
Related Classes/Methods:
- genie/config.py
- genie/data/data_module.py
- genie/data/dataset.py
- genie/utils/data_io.py
Core Diffusion Model
This is the central deep learning component. It encapsulates the mathematical and algorithmic core of the equivariant diffusion model: the forward (noising) and reverse (denoising) processes, the noise schedules, and the neural network denoiser. It implements the core protein structure generation logic.
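The forward (noising) process and its noise schedule can be sketched in plain Python. This sketch assumes a standard DDPM formulation over C-alpha coordinates and uses the cosine schedule of Nichol & Dhariwal (2021). Which schedule `genie/diffusion/schedule.py` actually implements is not confirmed here. The names `cosine_alpha_bars`, `betas_from_alpha_bars`, and `q_sample` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def cosine_alpha_bars(n_timestep: int, s: float = 0.008) -> List[float]:
    """Cumulative signal fractions alpha_bar_t for t = 0..n_timestep under a cosine schedule."""
    def f(t: int) -> float:
        return math.cos(((t / n_timestep) + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(n_timestep + 1)]


def betas_from_alpha_bars(alpha_bars: Sequence[float], max_beta: float = 0.999) -> List[float]:
    """Per-step variances beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped near t = T for stability."""
    return [min(1 - alpha_bars[t] / alpha_bars[t - 1], max_beta) for t in range(1, len(alpha_bars))]


def q_sample(x0: Sequence[Coord], t: int, alpha_bars: Sequence[float],
             rng: random.Random) -> Tuple[List[Coord], List[Coord]]:
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps and return (x_t, eps)."""
    a = math.sqrt(alpha_bars[t])
    b = math.sqrt(1 - alpha_bars[t])
    eps = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in x0]
    xt = [tuple(a * xi + b * ei for xi, ei in zip(x, e)) for x, e in zip(x0, eps)]
    return xt, eps
```

`q_sample` jumps directly from `x0` to any step `t`, so training never has to simulate the noising chain one step at a time.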
Related Classes/Methods:
- genie/diffusion/diffusion.py
- genie/diffusion/genie.py
- genie/diffusion/schedule.py
- genie/model/model.py
- genie/model/single_feature_net.py
- genie/model/pair_feature_net.py
- genie/model/pair_transform_net.py
- genie/model/structure_net.py
- genie/model/template.py
- genie/model/modules/
Training & Generation Engine
This component orchestrates model training and the generation of new protein structures. It initializes the model, runs the training loop, and executes the reverse diffusion (sampling) process to generate novel structures.
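The engine's two jobs, one training step and the reverse (sampling) loop, can be outlined with a generic DDPM sketch. It rests on three assumptions. The denoiser predicts the added noise (epsilon parameterization), the variance schedule is linear, and `model` stands in for the equivariant network. `NoisePredictor`, `linear_betas`, `alpha_bar`, `gaussian_like`, `training_loss`, and `sample` are hypothetical names, not genie's API. A real engine would also batch structures and backpropagate the loss through an optimizer.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical denoiser signature: (noisy coordinates, timestep t) -> predicted noise per residue.
NoisePredictor = Callable[[List[Coord], int], List[Coord]]


def linear_betas(n_timestep: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> List[float]:
    """Linear variance schedule (assumed for brevity); betas[t - 1] holds beta_t."""
    return [beta_1 + (beta_T - beta_1) * i / (n_timestep - 1) for i in range(n_timestep)]


def alpha_bar(betas: Sequence[float], t: int) -> float:
    """alpha_bar_t = product over s = 1..t of (1 - beta_s)."""
    return math.prod(1 - b for b in betas[:t])


def gaussian_like(n: int, rng: random.Random) -> List[Coord]:
    """n independent standard-normal 3D vectors."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def training_loss(model: NoisePredictor, x0: List[Coord], betas: Sequence[float],
                  rng: random.Random) -> float:
    """One epsilon-prediction step: noise x0 at a random t, return the MSE of the noise estimate."""
    t = rng.randint(1, len(betas))
    ab = alpha_bar(betas, t)
    eps = gaussian_like(len(x0), rng)
    xt = [tuple(math.sqrt(ab) * xi + math.sqrt(1 - ab) * ei for xi, ei in zip(x, e))
          for x, e in zip(x0, eps)]
    pred = model(xt, t)
    sq = [(p - e) ** 2 for pr, er in zip(pred, eps) for p, e in zip(pr, er)]
    return sum(sq) / len(sq)


def sample(model: NoisePredictor, n_res: int, betas: Sequence[float],
           rng: random.Random) -> List[Coord]:
    """DDPM ancestral sampling: start from pure noise and denoise from t = T down to t = 1."""
    x = gaussian_like(n_res, rng)
    for t in range(len(betas), 0, -1):
        beta = betas[t - 1]
        eps_hat = model(x, t)
        coef = beta / math.sqrt(1 - alpha_bar(betas, t))
        # Estimated reverse-step mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        mean = [tuple((xi - coef * ei) / math.sqrt(1 - beta) for xi, ei in zip(p, e))
                for p, e in zip(x, eps_hat)]
        if t > 1:
            sigma = math.sqrt(beta)  # variance choice sigma_t^2 = beta_t
            x = [tuple(mi + sigma * rng.gauss(0.0, 1.0) for mi in m) for m in mean]
        else:
            x = mean                 # final step returns the mean without added noise
    return x
```

Training and sampling share the same schedule and the same `model`. That shared dependency is why the engine sits between the core diffusion model and model persistence in the diagram.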
Related Classes/Methods:
Model Persistence
This component handles serialization and deserialization of model checkpoints and their associated configurations. It saves trained models during training and loads pre-trained models for generation or evaluation.
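A minimal sketch of checkpoint persistence follows. It assumes weights are plain nested lists and each checkpoint is a single JSON file. A deep learning project would more likely use its framework's format (for example PyTorch's `torch.save`). `save_checkpoint` and `load_checkpoint` are hypothetical names.

The sketch shows two habits that matter regardless of format:
- Write through a temporary file, so an interrupted save cannot corrupt the previous checkpoint.
- Check the stored configuration before reusing weights.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_checkpoint(path: str, weights: Dict[str, list], config: Dict[str, Any], epoch: int) -> None:
    """Write weights, config and training progress via a temp file, then rename it into place."""
    payload = {"epoch": epoch, "config": config, "weights": weights}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.remove(tmp_path)         # never leave a half-written temp file behind
        raise


def load_checkpoint(path: str, expected_config: Dict[str, Any]) -> Dict[str, Any]:
    """Load a checkpoint, refusing it if any expected config value differs from the stored one."""
    with open(path) as f:
        payload = json.load(f)
    mismatched = {k for k in expected_config if payload["config"].get(k) != expected_config[k]}
    if mismatched:
        raise ValueError(f"checkpoint config mismatch for: {sorted(mismatched)}")
    return payload
```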
Related Classes/Methods:
Evaluation Framework
This component is a dedicated pipeline for assessing the quality and properties of generated protein structures. It integrates external tools for assessment, metric calculation, and reporting, such as the structure predictor ESMFold and the inverse-folding model ProteinMPNN.
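A common way to combine these tools is a self-consistency (designability) check, sketched below. ProteinMPNN proposes sequences for a generated backbone, and ESMFold refolds each sequence. The design counts as designable if the best self-consistency RMSD (scRMSD) falls below a cutoff, conventionally 2 Å. Whether `evaluations/pipeline/evaluate.py` uses exactly this rule and cutoff is an assumption. `rmsd` and `designability` are hypothetical names. The sketch also assumes each refolded structure has already been superposed onto the design (e.g. by Kabsch alignment), a step it omits.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two already-superposed C-alpha traces of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and the same length")
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))


def designability(design: Sequence[Coord], refolded: List[Sequence[Coord]],
                  cutoff: float = 2.0) -> Dict[str, object]:
    """Best RMSD over all refolded predictions; designable if it is below cutoff (Angstroms)."""
    scores = [rmsd(design, r) for r in refolded]
    best = min(scores)
    return {"sc_rmsd": best, "designable": best < cutoff}
```

Taking the minimum over several sequences rewards a backbone if at least one proposed sequence is predicted to fold back into it.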
Related Classes/Methods:
- evaluations/pipeline/evaluate.py
- evaluations/pipeline/fold_models/
- evaluations/pipeline/inverse_fold_models/