81 lines (51 loc) · 6.59 KB
graph LR
    Data_Management_Module["Data Management Module"]
    EMA_Utility_Module["EMA Utility Module"]
    Core_Model_Module["Core Model Module"]
    Probabilistic_Model_Module["Probabilistic Model Module"]
    Neural_Network_Utilities["Neural Network Utilities"]
    Data_Management_Module -- "Provides data" --> Core_Model_Module
    Data_Management_Module -- "Provides data" --> Probabilistic_Model_Module
    Core_Model_Module -- "Utilizes EMA" --> EMA_Utility_Module
    Core_Model_Module -- "Uses utilities" --> Neural_Network_Utilities
    Probabilistic_Model_Module -- "Extends" --> Core_Model_Module
    Probabilistic_Model_Module -- "Uses utilities" --> Neural_Network_Utilities
    click Data_Management_Module href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//insitro-research/Data Management Module.md" "Details"
    click EMA_Utility_Module href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//insitro-research/EMA Utility Module.md" "Details"
    click Core_Model_Module href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//insitro-research/Core Model Module.md" "Details"
    click Probabilistic_Model_Module href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//insitro-research/Probabilistic Model Module.md" "Details"
    click Neural_Network_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//insitro-research/Neural Network Utilities.md" "Details"

Component Details

This architecture describes the 2022-del-dock project within the insitro-research repository, which focuses on molecular docking and predictive modeling. Its core functionality covers three things: managing experimental data, training machine learning models (both deterministic and probabilistic), and providing utility functions. The utilities build neural networks and stabilize training with an Exponential Moving Average (EMA) of model weights. The primary flow is to prepare the data, feed it into the models, and draw on the shared utilities during model development and optimization.

Data Management Module

This module handles data loading, preprocessing, and splitting into training, validation, and evaluation sets. It prepares the model inputs (e.g., molecular fingerprints, CNN features, and target counts) for the machine learning models.
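The splitting step can be pictured with a minimal sketch. It assumes a plain seeded random split over row indices, so that fingerprints, CNN features, and target counts stay aligned when each is indexed with the same lists. The names `split_indices` and `take` and the 80/10/10 default are hypothetical. The real module may split differently, for example by molecular scaffold.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(
    n: int,
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row indices and cut them into train/validation/test lists (hypothetical helper)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # any rounding remainder goes to test
    return train, val, test


def take(rows: Sequence, idx: List[int]) -> list:
    """Select rows (e.g. fingerprints, features, or counts) by index (hypothetical helper)."""
    return [rows[i] for i in idx]
```

Because every input array is indexed with the same `train`/`val`/`test` lists, each feature row stays paired with its target count.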

Related Classes/Methods:

EMA Utility Module

This module implements an Exponential Moving Average (EMA) of model parameters. After each optimizer step, it updates a running, exponentially weighted average of the weights. Evaluating with these averaged weights smooths out step-to-step noise from the optimizer, which helps stabilize training and often improves generalization.
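A minimal sketch of the EMA update rule `shadow = decay * shadow + (1 - decay) * param` follows. Parameters are represented as a dict of plain float lists standing in for tensors. The class name `EMA`, its methods, and the default `decay=0.999` are assumptions for illustration, not the project's API. In PyTorch, the same update would typically run under `torch.no_grad()` after each optimizer step.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # stand-in for a model's named parameter tensors


class EMA:
    """Keep an exponential moving average ("shadow" copy) of model parameters (hypothetical class)."""

    def __init__(self, params: Params, decay: float = 0.999) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        # Start the shadow weights as an independent copy of the current weights.
        self.shadow: Params = {name: list(values) for name, values in params.items()}

    def update(self, params: Params) -> None:
        """Call after each optimizer step: shadow <- decay * shadow + (1 - decay) * param."""
        for name, values in params.items():
            shadow = self.shadow[name]
            for i, value in enumerate(values):
                shadow[i] = self.decay * shadow[i] + (1.0 - self.decay) * value

    def averaged(self) -> Params:
        """Return a copy of the averaged weights, e.g. to run validation with."""
        return {name: list(values) for name, values in self.shadow.items()}
```

Because the shadow copy starts at the initial weights, early averages are biased toward initialization. Some implementations therefore warm the decay up over the first steps.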

Related Classes/Methods:

Core Model Module

This module defines the base PyTorch Lightning models used for molecular docking and prediction. It includes base classes that integrate EMA, implement the training, validation, and test steps, and log performance metrics such as Pearson and Spearman correlation.
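The two logged metrics can be computed as in the following sketch. Pearson correlation measures linear agreement between predictions and targets. Spearman correlation is the Pearson correlation of their ranks, with tied values given the average of the ranks they span. These are standard-library implementations written for illustration; the project may instead use SciPy or torchmetrics. In a `LightningModule`, such values would typically be reported with `self.log` during validation.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return float("nan")  # undefined when either input is constant
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman correlation rewards any monotonic relationship, so predictions that rank compounds correctly can score highly even when their scale is off. For example, `spearman([1, 2, 3, 4], [1, 4, 9, 16])` is 1.0 while the corresponding Pearson value is below 1.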

Related Classes/Methods:

Probabilistic Model Module

This module extends the core modeling capabilities with probabilistic programming via the Pyro library. It defines models of count data that are trained with variational inference, giving a Bayesian approach to prediction.
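Variational inference fits a simple distribution q(z) to the posterior by maximizing the evidence lower bound (ELBO), E_q[log p(y | z) + log p(z) - log q(z)]. Pyro automates this with `pyro.infer.SVI` and `pyro.infer.Trace_ELBO`. The sketch below computes the same quantity by hand for a hypothetical toy count model, not the project's actual model. The toy model has one latent log-rate z ~ Normal(0, 1), counts drawn as Poisson(exp(z)), and a Gaussian guide q(z) = Normal(q_mu, q_sigma). The function names are illustrative.

```python
import math
import random
from typing import Sequence


def poisson_log_pmf(k: int, log_rate: float) -> float:
    """log P(k | rate) for a Poisson, parameterized by log(rate) for numerical stability."""
    return k * log_rate - math.exp(log_rate) - math.lgamma(k + 1)


def normal_log_pdf(x: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def elbo(counts: Sequence[int], q_mu: float, q_sigma: float,
         num_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo ELBO for the toy model z ~ Normal(0, 1), counts_i ~ Poisson(exp(z))."""
    if q_sigma <= 0.0:
        raise ValueError("q_sigma must be positive")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(q_mu, q_sigma)  # draw z from the variational guide q
        log_lik = sum(poisson_log_pmf(k, z) for k in counts)
        log_prior = normal_log_pdf(z, 0.0, 1.0)
        log_q = normal_log_pdf(z, q_mu, q_sigma)
        total += log_lik + log_prior - log_q
    return total / num_samples
```

A guide centered near the log of the mean count gets a much higher ELBO than one centered far away. SVI exploits this by adjusting `q_mu` and `q_sigma` by gradient ascent on this objective.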

Related Classes/Methods:

Neural Network Utilities

This module contains general-purpose utility functions and classes for building neural networks, including various activation functions and a multi-layer perceptron (MLP) with residual connections. These utilities serve as building blocks for the models defined in other modules.
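The following sketch shows one common form of residual MLP. An input projection is followed by blocks that compute `h = h + act(W h + b)`, so each block learns a correction to its input and reduces to the identity when its weights are zero. It is written in plain Python so it runs without PyTorch; in the project these utilities are presumably `torch.nn` modules. The names `ACTIVATIONS`, `linear`, `init_layer`, and `ResidualMLP` are hypothetical. The real MLP may also include normalization or dropout.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # list of rows

# Hypothetical activation registry; the module's actual set of activations may differ.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda v: max(0.0, v),
    "tanh": math.tanh,
    # SiLU: v * sigmoid(v), using sigmoid(v) = 0.5 * (1 + tanh(v / 2)) to avoid overflow.
    "silu": lambda v: v * 0.5 * (1.0 + math.tanh(0.5 * v)),
}


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Uniform(-1/sqrt(n_in), 1/sqrt(n_in)) weights and zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ResidualMLP:
    """in_proj -> n_blocks residual blocks (h = h + act(W h + b)) -> out_proj."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int,
                 n_blocks: int = 2, activation: str = "relu", seed: int = 0) -> None:
        rng = random.Random(seed)
        self.act = ACTIVATIONS[activation]
        self.in_proj = init_layer(n_in, n_hidden, rng)
        self.blocks = [init_layer(n_hidden, n_hidden, rng) for _ in range(n_blocks)]
        self.out_proj = init_layer(n_hidden, n_out, rng)

    def forward(self, x: Vector) -> Vector:
        h = [self.act(v) for v in linear(*self.in_proj, x)]
        for weights, bias in self.blocks:
            update = [self.act(v) for v in linear(weights, bias, h)]
            h = [hi + ui for hi, ui in zip(h, update)]  # skip connection
        return linear(*self.out_proj, h)
```

The hidden width stays fixed across blocks so that the skip connection can add `h` and the block output element-wise.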

Related Classes/Methods: