Add CCA import-based pre-filter, cycle detection, and log lazy formatting #261
Open
tmihalac wants to merge 10 commits into
- Replaced esprima-based JavaScript segmenter with tree-sitter for reliable parsing of modern JS syntax (optional chaining, nullish coalescing, top-level await)
- Fixed JS function name extraction: keyword filtering, position-aware matching, redundant pattern removal, generator/TypeScript/anonymous-export support
- Added build-artifact filtering (should_skip) that excludes app-level dist/, build/static/, .min.js while preserving node_modules/*/dist/ as legitimate third-party source
- Added empty-name guards in CCA BFS to prevent documents with unextractable function names from entering call-chain analysis
- Fixed _get_function_calls regex to detect calls through optional chaining (obj?.method())

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
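For reviewers, the build-artifact rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `should_skip`: the signature, the path-component matching, and the choice to skip `.min.js` everywhere (including under `node_modules`) are assumptions.

```python
from pathlib import PurePosixPath


def should_skip(path: str) -> bool:
    """Hypothetical sketch: True for app-level build artifacts, False otherwise."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Assumption: minified bundles are skipped wherever they live.
    if parts[-1].endswith(".min.js"):
        return True
    # Third-party packages keep their dist/ output as legitimate source.
    if "node_modules" in parts:
        return False
    dirs = parts[:-1]
    # App-level dist/ directory anywhere in the path.
    if "dist" in dirs:
        return True
    # App-level build/static/ as two consecutive directory components.
    return any(a == "build" and b == "static" for a, b in zip(dirs, dirs[1:]))
```

With these assumptions, `dist/app.js` and `web/build/static/js/main.js` are skipped, while `node_modules/lodash/dist/lodash.js` is kept.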
Add CCA import-based pre-filter, cycle detection, and log lazy formatting

- Add _can_reference_class() to JavaChainOfCallsRetriever for 4-way import visibility check (simple class name, wildcard import, same package, same artifact)
- Apply import pre-filter in _get_possible_docs via optional declaring_fqcn/callee_file_name/code_documents params
- Pass declaring FQCN from __find_caller_function to _get_possible_docs to eliminate irrelevant uber-JAR candidates before expensive type resolution
- Only filter third-party candidates; application code (root docs) always passes to avoid false negatives from polymorphic interface calls
- Add DFS cycle detection guard in get_relevant_documents to prevent infinite loops from self-recursive or mutually recursive method calls
- Switch logger.debug in __check_identifier_resolved_to_callee_function_package from f-strings to %s lazy formatting to avoid string construction when debug logging is disabled

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
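The cycle guard and the lazy logging change can be illustrated together. The sketch below is not the code in `get_relevant_documents`; the function name `find_call_chains`, the dict-based call graph, and the "nodes on the current path" strategy are assumptions chosen to show the idea in isolation.

```python
import logging

logger = logging.getLogger(__name__)


def find_call_chains(call_graph, start, target):
    """Hypothetical sketch: collect every acyclic call path from start to target."""
    chains = []

    def dfs(node, path, on_path):
        if node == target:
            chains.append(path + [node])
            return
        if node in on_path:
            # %s arguments are only formatted if DEBUG is enabled,
            # unlike an f-string, which is built before the call.
            logger.debug("cycle detected at %s, skipping (path=%s)", node, path)
            return
        on_path.add(node)
        for callee in call_graph.get(node, ()):
            dfs(callee, path + [node], on_path)
        on_path.remove(node)

    dfs(start, [], set())
    return chains
```

A graph with mutual recursion (`a -> b -> a`) and self-recursion (`c -> c`) terminates under this guard instead of looping forever.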
Collaborator
✅ Snyk checks have passed. No issues have been found so far.
Collaborator
Author
/test-heavy
… escaping, and CCA empty-result guidance

- Deduplicate parents list in non-Java CCA tree_dict to prevent duplicate entries from dependency tree builder
- Add Go subpackage prefix matching to Rule 8 so "github.com/lib/foo/bar" matches target "github.com/lib/foo"
- Add distinct "function not found" message when CCA returns empty call_hierarchy_list so agent distinguishes missing function from unreachable function
- Escape regex metacharacters in Function Caller Finder query builder to handle identifiers containing dots, brackets, and plus sign

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
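The Go subpackage rule above comes down to a prefix match on a path-segment boundary. A minimal sketch, assuming a hypothetical helper `matches_go_package` (the real Rule 8 code may be structured differently):

```python
def matches_go_package(candidate: str, target: str) -> bool:
    """Hypothetical sketch: True if candidate is target or one of its subpackages."""
    if not candidate or not target:
        return False
    base = target.rstrip("/")
    # Require a "/" after the prefix so "foobar" does not match "foo".
    return candidate == base or candidate.startswith(base + "/")
```

The trailing `/` in the prefix check is what keeps `github.com/lib/foobar` from matching the target `github.com/lib/foo`.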
Collaborator
Author
/test-heavy
- Fix dep_tree.py missing comma causing "--pythonvenv_python" string
concatenation
- Fix dep_tree.py C/C++ detect_ecosystem walk not using
_WALK_EXCLUDE_DIRS
- Fix c_segmenter_custom.py remove_comments stripping patterns inside
string literals
- Fix c_lang_function_parsers.py debug print statement left in
production code
- Fix golang_functions_parsers.py `len(declaration_parts) == (2 or
3)` always-true comparison
- Fix golang_functions_parsers.py no-op `re.search("")` call
- Fix golang_functions_parsers.py is_package_imported raw string
split and missing quote stripping
- Fix golang_functions_parsers.py is_same_package crash on empty
input
- Fix javascript_functions_parser.py is_comment_line missing block
comment continuation (`*`)
- Fix javascript_functions_parser.py _extract_class_name regex
missing `$` in identifier class
- Fix javascript_functions_parser.py _parse_declarations unused
is_multiline parameter
- Fix javascript_functions_parser.py backreference `[^\1]` in string
pattern
- Fix python_functions_parser.py is_same_package returning True for
two empty strings
- Fix python_segmenters_with_classes_methods.py annotating all
methods with last class name
- Fix python_segmenters_with_classes_methods.py skipping async def
methods
- Fix source_code_git_loader.py safe.directory guard unnecessarily
gated on clone_url
- Fix brew_downloader.py returning path even with zero downloads
- Fix brew_downloader.py extracting SRPM from cache path instead of
target path
- Fix configuration_scanner.py re.match allowing partial filename
matches
- Fix configuration_scanner.py max_results=0 returning 1 result
- Fix configuration_scanner.py cache race condition on repo_key check
outside lock
- Fix configuration_scanner.py missing docker-compose*.yaml pattern
- Fix import_usage_analyzer.py empty short_name matching everything
- Fix async_http_utils.py off-by-one in retry count (`<=` vs `<`)
- Fix async_http_utils.py consumer errors caught by retry loop
instead of propagating
- Fix async_http_utils.py retry_on_client_errors overridden by
Retry-After check
- Fix async_http_utils.py negative sleep from X-RateLimit-Reset in
the past
- Fix async_http_utils.py missing @functools.wraps on retry_async
wrapper
- Fix function_name_locator.py python_flow_control crash on
non-function documents
- Fix function_name_locator.py Go versioned module short-name
collision (v2, v3)
- Fix git_commit_searcher.py _rank_results mutating confidence
in-place
- Fix git_repo_manager.py double-wrapping GitCommandError
- Fix intel_utils.py parse_cpe checking split_cpe[5] instead of
split_cpe[10] for system
- Fix llm_engine_utils.py assert False in production code replaced
with RuntimeError
- Fix repo_resolver.py case-sensitivity bug in normalize_package_name
- Fix serp_api_wrapper.py key index not reset after full rotation
- Fix serp_api_wrapper.py dead max_retries field
- Fix csaf_generator.py GHSA description dropped when no pre-existing
note
- Fix csaf_generator.py notes appended with text: None when
summary/justification missing
- Fix web_patch_fetcher.py missing asyncio import for TimeoutError
catch
- Fix web_patch_fetcher.py _is_commit_url false positive on /c/
outside kernel.org
- Fix web_patch_fetcher.py dropping Gitiles commit URLs from
candidates
- Fix prompting.py build_tool_descriptions missing FL, CONFIG, IUA,
GREP entries
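
Two of the fixes above are easy-to-miss Python pitfalls, sketched here with made-up values (the argument list and `declaration_parts` contents are assumptions, not the code from dep_tree.py or golang_functions_parsers.py):

```python
# Adjacent string literals are concatenated, so a missing comma
# silently merges two arguments into one.
broken_args = ["tool", "--python" "venv_python"]
fixed_args = ["tool", "--python", "venv_python"]

declaration_parts = ["func", "Name", "()"]  # three parts

# (2 or 3) evaluates to 2, so this only ever matches a length of 2.
parenthesized = len(declaration_parts) == (2 or 3)

# Without parentheses this is `(len(...) == 2) or 3`, which is always truthy.
unparenthesized = bool(len(declaration_parts) == 2 or 3)

# A membership test states "two or three parts" directly.
membership = len(declaration_parts) in (2, 3)
```

Either spelling of the `or` comparison misbehaves, which is why a membership test is the usual replacement.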
Test correctness fixes:
- Replace tautological assertions (disjunctive or, truthiness-only,
conditional if-then-assert)
- Rewrite tests that reimplemented source logic instead of calling
real functions
- Fix mock searcher ignoring tantivy query parameter in IUA tests
- Fix test_stub_only_triggers_pypi_fetch swallowing all exceptions
via try/except pass
- Fix test_clone_failure_cleans_temp_dir vacuously-passing assertion
- Fix test_consumer_error_propagates using overly broad
pytest.raises(Exception)
- Fix test_optional_chaining_preservation asserting on input string
not parsed output
- Fix test_remove_comments_string_literal wrong docstring and
tautological assertion
- Fix test_key_rotation not verifying actual key sent in HTTP request
- Fix test_all_tools_produce_7_descriptions omitting FL, CONFIG, IUA,
GREP
- Fix conditional assertion in git_commit_searcher silently passing
on None
- Fix test_third_party_docs weak assertion not verifying actual jar
key
- Fix test_llm_engine_utils disjunctive or assertion masking wrong
return value
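
For context, the tautological assertion shapes named above look roughly like this. The helper names and values are hypothetical, not taken from the test suite:

```python
def weak_check(result):
    # Disjunctive `or`: parsed as (result == "negative") or "zero".
    # The non-empty string "zero" is truthy, so this never fails.
    assert result == "negative" or "zero"


def conditional_check(result):
    # If-then-assert: when result is None the assertion is skipped,
    # so the test passes without checking anything.
    if result is not None:
        assert result == "negative"


def strict_check(result):
    # States the accepted values explicitly and fails on anything else.
    assert result in ("negative", "zero")
```

`weak_check("positive")` and `conditional_check(None)` both pass silently, while `strict_check("positive")` raises `AssertionError`.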
Agent/pipeline coverage:
- Add pre_process_node tests for ReachabilityAgent and
CodeUnderstandingAgent
- Add _postprocess_results exception handling tests
- Add dispatch_question exception fallback and build_routing_prompt
integration tests
- Add Rule 8 vs Rule 9 priority interaction test
- Add thought_node actions-is-None and observation_node
truncation/pruning tests
- Add _build_tool_guidance_for_ecosystem per-ecosystem filtering
tests
Java CCA coverage:
- Add function_called_from_caller_body tests (24 cases)
- Add extract_from_query, infer_class_name_and_package_name tests
- Add is_java_fqcn, extract_maven_artifact, _is_doc_excluded tests
- Add __find_caller_function and __find_initial_function direct tests
JS parser/segmenter coverage:
- Add search_for_called_function branch tests
- Add is_valid, create_map_of_local_vars, is_exported_function tests
- Add _get_tree caching, should_skip, nested class extraction tests
C segmenter coverage:
- Add find_top_level_blocks, remove_macro_blocks,
extract_define_functions tests
Go/Python/C parser coverage:
- Add is_tree_key_match, get_function_name, is_package_imported edge
case tests
- Add Python utility method and class-without-parens tests
- Add C get_package_names, filter_docs, document_imports_package
tests
Tools coverage:
- Add FL stdlib_cache, flow_control, singleton isolation tests
- Add config scanner cache eviction and concurrent access tests
- Add IUA query verification and comment-line counting tests
- Add git_commit_searcher _fetch_patch_via_http tests
External integration coverage:
- Add web patch fetcher parsing, Gitiles URL, commit extraction tests
- Add async HTTP retry limit, raise_for_status, 500 boundary tests
- Add SERP key exhaustion reset and error propagation tests
- Add git_repo_manager clone, fetch, concurrency, host validation
tests
VEX/intel/version coverage:
- Add unexpected justification_label, RPM+NVD range, version check
error tests
- Add package identifier utility method tests
- Add _is_safe_url, identify() with intel=None tests
LLM engine/checklist/prompting coverage:
- Add preprocess_engine_input, postprocess_engine_output branch tests
- Add build_no_vuln_packages_output justification tests
- Add generate_checklist, build_tool_descriptions per-tool tests
Remaining coverage:
- Add _ensure_venv, determine_python_version,
vulnerability_intel_sanitizer tests
- Add source_classification, credential_client, transitive_detection
tests
- Rewrite cve_fetch_patches tests to call real _arun
Test file consolidation:
- Merge 18 deleted test files into consolidated per-domain test
modules
- Add 13 new focused test files for previously uncovered modules
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
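Several of the async_http_utils.py fixes in this commit (retry count, `functools.wraps`, negative sleep, consumer errors propagating) can be shown in one small sketch. This is not the project's `retry_async`: the signature, the choice of `ConnectionError` as the retryable error, and the reading of `max_retries` as "retries after the first attempt" are all assumptions.

```python
import asyncio
import functools
import time
from typing import Optional


def retry_async(max_retries: int = 3, delay: float = 0.0):
    """Hypothetical sketch: retry a coroutine on ConnectionError only."""

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        async def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return await func(*args, **kwargs)
                except ConnectionError:
                    # Other exception types are not caught and propagate at once.
                    attempt += 1
                    if attempt > max_retries:
                        raise
                    await asyncio.sleep(delay)

        return wrapper

    return decorator


def seconds_until_reset(reset_epoch: float, now: Optional[float] = None) -> float:
    """Clamp at zero so a reset time in the past never yields a negative sleep."""
    current = time.time() if now is None else now
    return max(0.0, reset_epoch - current)
```

Under this reading, `max_retries=2` means at most three calls in total: the first attempt plus two retries.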
- Fix serp_api_wrapper.py callers passing removed max_retries field (Pydantic ValidationError at runtime)
- Fix web_patch_fetcher.py _fetch_gitiles_patch yarl double-encoding %5E%21 (use yarl.URL(encoded=True))
- Fix java_functions_parsers.py _count_call_args treating < comparison as generic bracket (dual-comma fallback)
- Fix repo_resolver.py normalize_package_name dropping original case for mixed-case JSON keys (NetworkManager)
- Fix javascript_functions_parser.py is_comment_line classifying generator *method() as block comment
- Fix golang_functions_parsers.py is_package_imported unescaped identifier in regex (add re.escape)
- Fix configuration_scanner.py cache read outside lock causing KeyError on concurrent eviction (use .get())
Convention fixes:
- Fix test_java_cca.py _extract docstring referencing search_for_called_function
- Fix test_go_parser.py docstrings referencing fix history instead of describing behavior
Tests:
- Add SerpAPI extra_forbidden validation test
- Add Gitiles yarl.URL encoding preservation tests
- Add _count_call_args unbalanced angle bracket tests (comparison, bit shift, ternary)
- Add normalize_package_name mixed-case preservation tests
- Add is_comment_line generator method vs block comment tests
- Add is_package_imported regex escape and substring rejection tests
- Add configuration scanner cache eviction safety tests
Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
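The `re.escape` fix and the substring-rejection tests can be pictured with a simplified matcher. This sketch only looks for the quoted import path anywhere in the source; the real `is_package_imported` in golang_functions_parsers.py very likely has a different signature and handles more cases.

```python
import re


def is_package_imported(source: str, package: str) -> bool:
    """Hypothetical sketch: True if source contains the exact quoted import path."""
    if not package:
        return False
    # re.escape keeps "." from acting as a wildcard; the surrounding quotes
    # reject longer paths that merely start with the same prefix.
    pattern = re.compile('"' + re.escape(package) + '"')
    return pattern.search(source) is not None
```

Without `re.escape`, the `.` in `github.com` would match any character, so an import of `githubXcom/lib/foo` would be reported as a hit.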