graph LR
fastdup_fastdup_controller["fastdup.fastdup_controller"]
fastdup_embeddings_timm["fastdup.embeddings_timm"]
fastdup_captions["fastdup.captions"]
fastdup_models_grounding_dino["fastdup.models_grounding_dino"]
fastdup_models_ram["fastdup.models_ram"]
fastdup_models_sam["fastdup.models_sam"]
fastdup_models_tag2text["fastdup.models_tag2text"]
fastdup_models_utils["fastdup.models_utils"]
fastdup_fastdup_controller -- "orchestrates" --> fastdup_captions
fastdup_fastdup_controller -- "orchestrates" --> fastdup_embeddings_timm
fastdup_fastdup_controller -- "orchestrates" --> fastdup_models_grounding_dino
fastdup_fastdup_controller -- "orchestrates" --> fastdup_models_ram
fastdup_fastdup_controller -- "orchestrates" --> fastdup_models_sam
fastdup_fastdup_controller -- "orchestrates" --> fastdup_models_tag2text
fastdup_models_grounding_dino -- "utilizes" --> fastdup_models_utils
fastdup_models_ram -- "utilizes" --> fastdup_models_utils
fastdup_models_sam -- "utilizes" --> fastdup_models_utils
fastdup_models_tag2text -- "utilizes" --> fastdup_models_utils
fastdup_captions -- "utilizes" --> fastdup_models_utils
fastdup_embeddings_timm -- "utilizes" --> fastdup_models_utils
The fastdup ML subsystem follows a central orchestration pattern. fastdup_controller is the entry point for image analysis tasks and delegates each task to a specialized component: embeddings_timm for image embeddings, captions for text descriptions, models_grounding_dino for object detection, models_ram for general image recognition and tagging, models_sam for segmentation, and models_tag2text for tag and text generation. A shared models_utils component provides post-processing and data formatting so that outputs stay consistent across models. Because each model sits behind its own module, a new model can be integrated without changing the others.
fastdup.fastdup_controller is the primary orchestrator for ML-related tasks. It initiates and coordinates calls to the model-specific components according to the requested analysis (e.g., captioning, embedding generation, object detection), and so provides the central control flow for all ML operations.
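The delegation described above can be sketched as a small task registry. This is a minimal illustration of the orchestration pattern, not fastdup's actual controller code: the names `register`, `run_task`, `caption_images`, and `embed_images` are hypothetical, and the handlers return placeholder values instead of calling real models.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a task name to the function that handles it.
TaskHandler = Callable[[List[str]], List[dict]]
_HANDLERS: Dict[str, TaskHandler] = {}


def register(task: str) -> Callable[[TaskHandler], TaskHandler]:
    """Decorator that records a handler under the given task name."""
    def wrap(fn: TaskHandler) -> TaskHandler:
        _HANDLERS[task] = fn
        return fn
    return wrap


@register("caption")
def caption_images(paths: List[str]) -> List[dict]:
    # Placeholder: a real handler would call into a captioning model.
    return [{"filename": p, "caption": None} for p in paths]


@register("embed")
def embed_images(paths: List[str]) -> List[dict]:
    # Placeholder: a real handler would run an embedding model.
    return [{"filename": p, "embedding": []} for p in paths]


def run_task(task: str, paths: List[str]) -> List[dict]:
    """Look up the handler for `task` and delegate the work to it."""
    try:
        handler = _HANDLERS[task]
    except KeyError:
        known = sorted(_HANDLERS)
        raise ValueError(f"unknown task {task!r}; known tasks: {known}") from None
    return handler(paths)
```

With this shape, supporting a new model means registering one more handler; `run_task` itself does not change.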
Related Classes/Methods:
fastdup.embeddings_timm handles the initialization, loading, and inference of TIMM (PyTorch Image Models) models to produce numerical image embeddings, which feed similarity search and other downstream tasks.
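To make the link between embeddings and similarity search concrete, here is a minimal sketch that compares embedding vectors with cosine similarity. It uses plain Python lists in place of real model outputs; `cosine_similarity` and `most_similar` are illustrative helpers, not functions from fastdup or TIMM, and cosine similarity is assumed here as the metric.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the (name, score) of the candidate closest to `query`."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])
```

For large collections an approximate nearest-neighbour index would replace the linear scan in `most_similar`; the scoring idea stays the same.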
Related Classes/Methods:
fastdup.captions manages the generation of descriptive text captions for images using a dedicated captioning model. It covers all image captioning functionality and supplies textual metadata for images.
Related Classes/Methods:
fastdup.models_grounding_dino encapsulates the Grounding DINO model for zero-shot object detection: given text prompts, it identifies and locates the matching objects and extracts their bounding boxes.
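Bounding box extraction usually includes a coordinate conversion step. The sketch below assumes the detector reports each box as normalized `(cx, cy, w, h)`, which is the convention of the upstream Grounding DINO reference code; whether fastdup's wrapper exposes that same format is an assumption. `cxcywh_to_xyxy` is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box, image_width: int, image_height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1).

    Coordinates are clamped to the image bounds.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * image_width
    y0 = (cy - h / 2) * image_height
    x1 = (cx + w / 2) * image_width
    y1 = (cy + h / 2) * image_height

    def clamp(value: float, upper: float) -> float:
        return min(max(value, 0.0), float(upper))

    return (
        clamp(x0, image_width),
        clamp(y0, image_height),
        clamp(x1, image_width),
        clamp(y1, image_height),
    )
```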
Related Classes/Methods:
fastdup.models_ram manages the loading and inference of the Recognize Anything Model (RAM) for general image recognition and tagging, which gives the system broad image understanding.
Related Classes/Methods:
fastdup.models_sam handles the Segment Anything Model (SAM) for image segmentation, generating masks that capture the boundaries of objects within images.
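A segmentation mask can be reduced to a bounding box when only the object's extent is needed. The following is a minimal sketch that assumes a mask represented as a list of rows of 0/1 values; real SAM outputs are array-like, and `mask_to_bbox` is a hypothetical helper, not part of fastdup.

```python
from typing import List, Optional, Tuple


def mask_to_bbox(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive (x_min, y_min, x_max, y_max) of the nonzero pixels.

    Returns None when the mask contains no foreground pixels.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols), max(rows))
```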
Related Classes/Methods:
fastdup.models_tag2text manages the loading and inference of a Tag2Text model that generates descriptive tags or text for images, adding further textual metadata for image analysis.
Related Classes/Methods:
fastdup.models_utils provides common utility functions for post-processing model outputs, visualization, and data formatting. It standardizes outputs across the ML model integrations so that downstream analysis can treat them uniformly.
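One way to picture output standardization is a single record type that every model's results are mapped into. This sketch is an illustration under assumptions: the `Annotation` dataclass, its fields, and `to_rows` are hypothetical and do not reflect the actual schema used by fastdup.

```python
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical common record for one model output on one image."""
    filename: str
    source: str        # Which model produced the record, e.g. "ram".
    label: str         # Tag, caption, or detected class.
    score: float = 1.0


def to_rows(annotations: Iterable[Annotation]) -> List[dict]:
    """Flatten annotations to dicts, ordered by filename then descending score."""
    ordered = sorted(annotations, key=lambda a: (a.filename, -a.score))
    return [asdict(a) for a in ordered]
```

Rows in this form can be handed to a tabular tool or serialized without model-specific handling.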
Related Classes/Methods: