graph LR
CLI_Orchestrator["CLI Orchestrator"]
Core_Pipeline_Executor["Core Pipeline Executor"]
Gallery_Generation_Module["Gallery Generation Module"]
External_Visualization_Exporter["External Visualization Exporter"]
Data_Curation_Manager["Data Curation Manager"]
Clustering_Interface["Clustering Interface"]
CLI_Orchestrator -- "invokes" --> Core_Pipeline_Executor
CLI_Orchestrator -- "invokes" --> Gallery_Generation_Module
CLI_Orchestrator -- "invokes" --> External_Visualization_Exporter
CLI_Orchestrator -- "invokes" --> Data_Curation_Manager
CLI_Orchestrator -- "invokes" --> Clustering_Interface
Clustering_Interface -- "invokes" --> Core_Pipeline_Executor
The CLI & Public API subsystem comprises the top-level functions of the fastdup package, primarily defined in fastdup/engine.py. These functions are the direct user-facing interface to the fastdup core engine, reachable both from the command line and through programmatic API calls.
CLI Orchestrator: the primary command-line interface entry point. It parses user arguments, validates inputs, and dispatches control to the other high-level fastdup functions according to the requested command.
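The parse, validate, and dispatch flow described here can be sketched with the standard library's `argparse`. This is only an illustration of the pattern, not fastdup's actual CLI: the program name `fastdup-sketch`, the sub-commands, the flags, and the stub handlers `run_pipeline` and `build_gallery` are all hypothetical.

```python
import argparse

# Hypothetical stand-ins for the high-level functions the CLI hands off to.
def run_pipeline(input_dir, work_dir):
    return f"run:{input_dir}->{work_dir}"

def build_gallery(work_dir, kind):
    return f"gallery:{kind}:{work_dir}"

def build_parser():
    # One sub-command per high-level operation (names are assumptions).
    parser = argparse.ArgumentParser(prog="fastdup-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run the analysis pipeline")
    run.add_argument("--input-dir", required=True)
    run.add_argument("--work-dir", default="work_dir")

    gallery = sub.add_parser("gallery", help="build an HTML gallery")
    gallery.add_argument("--work-dir", default="work_dir")
    gallery.add_argument(
        "--kind",
        choices=["duplicates", "outliers", "components", "stats", "similarity"],
        default="duplicates",
    )
    return parser

def dispatch(argv):
    # argparse performs the validation; invalid input exits with a usage message.
    args = build_parser().parse_args(argv)
    if args.command == "run":
        return run_pipeline(args.input_dir, args.work_dir)
    if args.command == "gallery":
        return build_gallery(args.work_dir, args.kind)
    raise ValueError(f"unknown command: {args.command}")
```

Keeping `dispatch` separate from the parser definition makes the routing testable without spawning a process, for example `dispatch(["run", "--input-dir", "images"])`.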
Related Classes/Methods:
Core Pipeline Executor: executes the main fastdup data processing and analysis pipeline. This function is the primary programmatic interface to the core engine and orchestrates the underlying computations.
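A thin wrapper around the pipeline entry point might look like the sketch below. It assumes the installed package exposes `fastdup.run` and that it accepts `input_dir` and `work_dir` keyword arguments; verify this against the version in use. The wrapper name `run_analysis` and the injectable `runner` parameter are hypothetical and exist only so the logic can be exercised without fastdup installed.

```python
from pathlib import Path

def run_analysis(input_dir, work_dir, runner=None):
    """Validate paths, then hand off to the pipeline entry point."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        raise FileNotFoundError(f"input directory not found: {input_dir}")

    # Create the output directory up front so failures surface early.
    Path(work_dir).mkdir(parents=True, exist_ok=True)

    if runner is None:
        # Assumption: fastdup exposes run(input_dir=..., work_dir=...).
        import fastdup
        runner = fastdup.run

    # Return whatever the runner returns; its meaning is not assumed here.
    return runner(input_dir=str(input_path), work_dir=str(work_dir))
```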
Related Classes/Methods:
Gallery Generation Module: a collection of functions that generate interactive HTML galleries (duplicates, outliers, components, statistics, similarity) for visualizing fastdup's analysis results.
Related Classes/Methods:
fastdup.__init__.create_duplicates_gallery:1084-1166
fastdup.__init__.create_outliers_gallery:1258-1334
fastdup.__init__.create_components_gallery:1336-1437
fastdup.__init__.create_stats_gallery:2308-2405
fastdup.__init__.create_similarity_gallery:2407-2483
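Because the five gallery functions listed above follow one naming pattern, a caller can select one by gallery kind. The sketch below shows that lookup. The function names come from the list above, but the helper `resolve_gallery` and the short kind keys are hypothetical, and no argument signatures are assumed: the caller supplies whatever arguments the chosen function expects.

```python
# Gallery kind -> name of the function listed above.
GALLERY_FUNCTIONS = {
    "duplicates": "create_duplicates_gallery",
    "outliers": "create_outliers_gallery",
    "components": "create_components_gallery",
    "stats": "create_stats_gallery",
    "similarity": "create_similarity_gallery",
}

def resolve_gallery(kind, module):
    """Return the gallery function for `kind` from `module` (e.g. fastdup)."""
    try:
        name = GALLERY_FUNCTIONS[kind]
    except KeyError:
        valid = ", ".join(sorted(GALLERY_FUNCTIONS))
        raise ValueError(f"unknown gallery kind {kind!r}; expected one of: {valid}")
    # Raises AttributeError if the module does not provide the function.
    return getattr(module, name)
```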
External Visualization Exporter: prepares and exports fastdup's feature vectors and metadata in a format compatible with external visualization tools such as TensorBoard Projector, so that the embeddings can be explored in more depth.
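To make the target format concrete: the standalone Embedding Projector can load a tab-separated vectors file with one embedding per line, plus a metadata file with one label per line (no header row when there is a single column). The sketch below writes both files. It is not fastdup's exporter; the function name `write_projector_files` and the output file names are assumptions.

```python
from pathlib import Path

def write_projector_files(vectors, labels, out_dir):
    """Write embeddings and labels as TSV files for the Embedding Projector."""
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels must have the same length")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    vec_path = out / "vectors.tsv"
    meta_path = out / "metadata.tsv"

    # One embedding per line, components separated by tabs.
    with vec_path.open("w", encoding="utf-8") as f:
        for vector in vectors:
            f.write("\t".join(repr(float(x)) for x in vector) + "\n")

    # Single-column metadata: one label per line, no header row.
    # Labels are assumed to contain no tabs or newlines.
    with meta_path.open("w", encoding="utf-8") as f:
        for label in labels:
            f.write(f"{label}\n")

    return vec_path, meta_path
```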
Related Classes/Methods:
Data Curation Manager: manages and modifies identified components (e.g., duplicate clusters, outlier groups) in the dataset or in fastdup's internal representation, in support of data curation workflows.
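One common curation step is to keep a single representative per duplicate cluster and flag the rest for removal. The sketch below only plans that step and deletes nothing. The function `plan_component_cleanup`, its input shape (component id mapped to file names), and the rule of keeping the alphabetically first file are all hypothetical and do not describe fastdup's internals.

```python
def plan_component_cleanup(components, min_size=2):
    """Return the files to remove, keeping one representative per component.

    `components` maps a component id to the file names in that component.
    Components smaller than `min_size` are left untouched.
    """
    to_remove = []
    for component_id in sorted(components):
        files = sorted(components[component_id])
        if len(files) < min_size:
            continue
        # Keep the first file as the representative, flag the others.
        to_remove.extend(files[1:])
    return to_remove
```

Returning a plan instead of deleting files directly lets the caller review the list, or move the files to a quarantine folder, before anything is lost.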
Related Classes/Methods:
Clustering Interface: initiates and manages KMeans clustering of image embeddings, providing a high-level API for grouping similar images.
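For readers unfamiliar with KMeans, the following pure-Python sketch shows the basic assign-and-update loop (Lloyd's algorithm) on small vectors. It is a teaching example, not fastdup's clustering code: it seeds the centroids with the first `k` points and runs a fixed number of iterations, both simplifications chosen to keep the result deterministic.

```python
def kmeans(points, k, iterations=10):
    """Cluster `points` into `k` groups with Lloyd's algorithm."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Simplification: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]

    return assignments, centroids
```

With real image embeddings the same loop applies, but a vectorized library implementation and a better seeding strategy would normally be preferred.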
Related Classes/Methods: