graph LR
Crawl_Orchestrator["Crawl Orchestrator"]
Configuration_Initialization_Manager["Configuration & Initialization Manager"]
Crawl_State_Coverage_Manager["Crawl State & Coverage Manager"]
Confidence_Quality_Assessor["Confidence & Quality Assessor"]
Link_Ranking_Selection_Engine["Link Ranking & Selection Engine"]
Semantic_Analysis_Unit["Semantic Analysis Unit"]
Knowledge_Base_Persistence["Knowledge Base Persistence"]
Reporting_Statistics_Generator["Reporting & Statistics Generator"]
Configuration_Initialization_Manager -- "initializes" --> Crawl_Orchestrator
Configuration_Initialization_Manager -- "configures" --> Semantic_Analysis_Unit
Configuration_Initialization_Manager -- "configures" --> Link_Ranking_Selection_Engine
Crawl_Orchestrator -- "updates state in" --> Crawl_State_Coverage_Manager
Crawl_Orchestrator -- "retrieves coverage from" --> Crawl_State_Coverage_Manager
Crawl_Orchestrator -- "requests quality confidence from" --> Confidence_Quality_Assessor
Crawl_Orchestrator -- "requests ranked links from" --> Link_Ranking_Selection_Engine
Crawl_Orchestrator -- "utilizes" --> Semantic_Analysis_Unit
Crawl_Orchestrator -- "manages persistence via" --> Knowledge_Base_Persistence
Crawl_Orchestrator -- "provides data to" --> Reporting_Statistics_Generator
Crawl_State_Coverage_Manager -- "provides content data to" --> Confidence_Quality_Assessor
Crawl_State_Coverage_Manager -- "provides content data to" --> Link_Ranking_Selection_Engine
Confidence_Quality_Assessor -- "provides quality metrics to" --> Reporting_Statistics_Generator
Link_Ranking_Selection_Engine -- "requests semantic analysis from" --> Semantic_Analysis_Unit
The AdaptiveCrawler subsystem is designed around a central Crawl Orchestrator that manages an iterative, adaptive crawling process. The Configuration & Initialization Manager sets up the system, including the Crawl Orchestrator and its key dependencies. During execution, the Crawl Orchestrator continuously interacts with the Crawl State & Coverage Manager to update and retrieve the current crawl state and coverage metrics. For intelligent link prioritization, the Crawl Orchestrator requests ranked links from the Link Ranking & Selection Engine, which in turn leverages the Crawl State & Coverage Manager for content data and the Semantic Analysis Unit for deep semantic understanding. The Semantic Analysis Unit is also directly utilized by the Crawl Orchestrator for mapping queries. To assess the quality and completeness of the crawl, the Crawl Orchestrator queries the Confidence & Quality Assessor, which relies on the Crawl State & Coverage Manager for raw content data. All crawl results and learned knowledge are managed for persistence by the Knowledge Base Persistence component, orchestrated by the Crawl Orchestrator. Finally, the Reporting & Statistics Generator provides insights into the crawl's performance, receiving data from both the Crawl Orchestrator and the Confidence & Quality Assessor.
Orchestrates the entire adaptive crawling process, managing the flow between different stages, including batch processing, state updates, and stopping conditions. This acts as the central control for the adaptive loop.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.digestcrawl4ai.adaptive_crawler.AdaptiveCrawler._crawl_batchcrawl4ai.adaptive_crawler.AdaptiveCrawler._crawl_with_previewcrawl4ai.adaptive_crawler.AdaptiveCrawler.should_stop
Manages the setup and validation of adaptive strategies for the crawler.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.__init__crawl4ai.adaptive_crawler.AdaptiveCrawler._create_strategy
Updates and maintains the internal state of the crawler, including tokenizing content and computing the shape of the coverage achieved.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.update_statecrawl4ai.adaptive_crawler.AdaptiveCrawler._tokenizecrawl4ai.adaptive_crawler.AdaptiveCrawler.compute_coverage_shape
Calculates various confidence metrics (consistency, saturation, coverage) to assess the quality and completeness of the crawled data, crucial for adaptive adjustments.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.calculate_confidencecrawl4ai.adaptive_crawler.AdaptiveCrawler._calculate_consistencycrawl4ai.adaptive_crawler.AdaptiveCrawler._calculate_saturationcrawl4ai.adaptive_crawler.AdaptiveCrawler._calculate_coveragecrawl4ai.adaptive_crawler.AdaptiveCrawler._get_document_termscrawl4ai.adaptive_crawler.AdaptiveCrawler.get_quality_confidencecrawl4ai.adaptive_crawler.AdaptiveCrawler.confidence
Prioritizes and selects links for further exploration based on relevance, novelty, and identified coverage gaps, driving the adaptive nature of the crawl.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.rank_linkscrawl4ai.adaptive_crawler.AdaptiveCrawler._calculate_relevancecrawl4ai.adaptive_crawler.AdaptiveCrawler.find_coverage_gapscrawl4ai.adaptive_crawler.AdaptiveCrawler.select_links_for_expansioncrawl4ai.adaptive_crawler.AdaptiveCrawler._calculate_novelty
Maps queries and documents into a semantic space using embeddings, facilitating advanced analysis like distance matrix computation for coverage and relevance, which is vital for AI/LLM integration.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.map_query_semantic_spacecrawl4ai.adaptive_crawler.AdaptiveCrawler._get_embeddingscrawl4ai.adaptive_crawler.AdaptiveCrawler._get_cached_distance_matrixcrawl4ai.adaptive_crawler.AdaptiveCrawler._compute_distance_matrixcrawl4ai.adaptive_crawler.AdaptiveCrawler.validate_coverage
Manages the saving, loading, importing, and exporting of crawl results and the accumulated knowledge base, ensuring the persistence of learned information for adaptive strategies.
Related Classes/Methods:
crawl4ai.adaptive_crawler.AdaptiveCrawler.savecrawl4ai.adaptive_crawler.AdaptiveCrawler._crawl_result_to_dictcrawl4ai.adaptive_crawler.AdaptiveCrawler.loadcrawl4ai.adaptive_crawler.AdaptiveCrawler._dict_to_crawl_resultcrawl4ai.adaptive_crawler.AdaptiveCrawler.export_knowledge_basecrawl4ai.adaptive_crawler.AdaptiveCrawler._crawl_result_to_export_dictcrawl4ai.adaptive_crawler.AdaptiveCrawler.import_knowledge_basecrawl4ai.adaptive_crawler.AdaptiveCrawler._import_dict_to_crawl_result
Provides insights into the crawl progress, coverage, and overall quality through various statistics and reports, aiding in monitoring and evaluating adaptive performance.
Related Classes/Methods: