Add CCA import-based pre-filter, cycle detection, and log lazy formatting #261

Open
tmihalac wants to merge 10 commits into RHEcosystemAppEng:main from tmihalac:CCA-Argument-Count-Pre-filter

@tmihalac

Collaborator
  • Add _can_reference_class() to JavaChainOfCallsRetriever for 4-way
    import visibility check (simple class name, wildcard import, same
    package, same artifact)
  • Apply import pre-filter in _get_possible_docs via optional
    declaring_fqcn/callee_file_name/code_documents params
  • Pass declaring FQCN from __find_caller_function to
    _get_possible_docs to eliminate irrelevant uber-JAR candidates before
    expensive type resolution
  • Only filter third-party candidates; application code (root docs)
    always passes to avoid false negatives from polymorphic interface
    calls
  • Add DFS cycle detection guard in get_relevant_documents to prevent
    infinite loops from self-recursive or mutually recursive method calls
  • Switch logger.debug in
    __check_identifier_resolved_to_callee_function_package from f-strings
    to %s lazy formatting to avoid string construction when debug
    logging is disabled
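
The four-way visibility check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `JavaSource`, its fields, and the standalone `can_reference_class` function are hypothetical, and the reading of "simple class name" as a whole-word match in the caller's text is a guess.

```python
import re
from dataclasses import dataclass, field


@dataclass
class JavaSource:
    """Hypothetical stand-in for a parsed Java file (not the project's document type)."""
    package: str
    artifact: str
    imports: list = field(default_factory=list)  # e.g. ["org.lib.*", "java.util.List"]
    text: str = ""


def can_reference_class(caller: JavaSource, declaring_fqcn: str, declaring_artifact: str) -> bool:
    """Return True if `caller` could plausibly see the class named by `declaring_fqcn`."""
    package, _, simple_name = declaring_fqcn.rpartition(".")
    # 1. Simple class name appears as a whole word (covers explicit imports and inline FQCNs).
    if re.search(rf"\b{re.escape(simple_name)}\b", caller.text):
        return True
    # 2. Wildcard import of the declaring package.
    if f"{package}.*" in caller.imports:
        return True
    # 3. Same package: no import statement is needed.
    if caller.package == package:
        return True
    # 4. Same artifact: classes bundled in one JAR are treated as mutually visible (assumption).
    return caller.artifact == declaring_artifact
```

A candidate that fails all four checks cannot name the declaring class, so it can be dropped before type resolution runs.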

tmihalac added 3 commits June 23, 2026 17:43
  - Replaced esprima-based JavaScript segmenter with tree-sitter for reliable
  parsing of modern JS syntax (optional chaining, nullish coalescing, top-level
  await)
  - Fixed JS function name extraction: keyword filtering, position-aware matching,
  redundant pattern removal, generator/TypeScript/anonymous-export support
  - Added build-artifact filtering (should_skip) that excludes app-level dist/,
  build/static/, .min.js while preserving node_modules/*/dist/ as legitimate
  third-party source
  - Added empty-name guards in CCA BFS to prevent documents with unextractable
  function names from entering call-chain analysis
  - Fixed _get_function_calls regex to detect calls through optional chaining
  (obj?.method())
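
The build-artifact filter could look something like the sketch below. Only the name `should_skip` comes from the commit message; the signature is hypothetical, and skipping `.min.js` files everywhere (including under `node_modules`) is an assumption, since the message does not say how the two rules interact.

```python
from pathlib import PurePosixPath


def should_skip(path: str) -> bool:
    """Return True for paths that look like app-level build output."""
    parts = PurePosixPath(path).parts
    dirs = parts[:-1]  # directory components only, without the file name
    # Assumption: minified bundles are skipped wherever they live.
    if path.endswith(".min.js"):
        return True
    # Third-party packages ship their real source under dist/, so keep them.
    if "node_modules" in dirs:
        return False
    # App-level dist/ directory.
    if "dist" in dirs:
        return True
    # App-level build/static/ directory (the two components must be adjacent).
    return any(a == "build" and b == "static" for a, b in zip(dirs, dirs[1:]))
```

Checking `node_modules` before `dist` is what keeps `node_modules/*/dist/` in scope while still excluding the application's own `dist/`.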

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  formatting

  - Add _can_reference_class() to JavaChainOfCallsRetriever for 4-way
  import visibility check (simple class name, wildcard import, same
  package, same artifact)
  - Apply import pre-filter in _get_possible_docs via optional
  declaring_fqcn/callee_file_name/code_documents params
  - Pass declaring FQCN from __find_caller_function to
  _get_possible_docs to eliminate irrelevant uber-JAR candidates before
  expensive type resolution
  - Only filter third-party candidates; application code (root docs)
  always passes to avoid false negatives from polymorphic interface
  calls
  - Add DFS cycle detection guard in get_relevant_documents to prevent
  infinite loops from self-recursive or mutually recursive method calls
  - Switch logger.debug in
  __check_identifier_resolved_to_callee_function_package from f-strings
  to %s lazy formatting to avoid string construction when debug
  logging is disabled
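
For illustration, a DFS cycle guard of the kind described can be as small as a set of the methods on the current path. The sketch below is not the body of `get_relevant_documents`; the `callers` mapping and the `call_chains` function are hypothetical.

```python
def call_chains(callers: dict, target: str) -> list:
    """Enumerate caller chains that end at `target`, walking depth-first.

    `callers` maps a method name to the methods that call it.
    """
    chains = []
    path = []
    on_path = set()

    def visit(method: str) -> None:
        on_path.add(method)
        path.append(method)
        # Cycle guard: ignore callers already on the current chain, which covers
        # both self-recursion (a -> a) and mutual recursion (a -> b -> a).
        parents = [c for c in callers.get(method, []) if c not in on_path]
        if not parents:
            chains.append(list(path))
        for parent in parents:
            visit(parent)
        path.pop()
        on_path.remove(method)

    visit(target)
    return chains
```

Because a method is removed from `on_path` when the walk backtracks, the same method can still appear in several distinct chains; only re-entry within one chain is blocked.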

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
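
The effect of the `%s` switch in the last bullet can be seen with a small, self-contained experiment. The `Expensive` class and the logger name are hypothetical; the sketch only counts how often the argument is rendered to a string while debug logging is disabled.

```python
import logging


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "com.example.Foo"


logger = logging.getLogger("cca.sketch")
logger.setLevel(logging.INFO)  # debug messages are disabled

value = Expensive()
logger.debug(f"resolved {value}")   # f-string is built before debug() is even called
logger.debug("resolved %s", value)  # formatting is deferred, and skipped at this level
```

After both calls `value.renders` is 1: only the f-string paid the formatting cost.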
@vbelouso

vbelouso commented Jun 23, 2026

Collaborator

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
|  | Code Security | 0 | 0 | 0 | 0 | 0 issues |


@tmihalac

Collaborator Author

/test-heavy


  escaping, and CCA empty-result guidance

  - Deduplicate parents list in non-Java CCA tree_dict to prevent
  duplicate entries from dependency tree builder
  - Add Go subpackage prefix matching to Rule 8 so
  "github.com/lib/foo/bar" matches target "github.com/lib/foo"
  - Add distinct "function not found" message when CCA returns empty
  call_hierarchy_list so agent distinguishes missing function from
  unreachable function
  - Escape regex metacharacters in Function Caller Finder query builder
  to handle identifiers containing dots, brackets, and plus sign
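
The Go subpackage rule amounts to a prefix match on a path boundary. A minimal sketch, assuming import paths are plain strings; `matches_go_package` is a hypothetical name and not the Rule 8 implementation:

```python
def matches_go_package(import_path: str, target: str) -> bool:
    """True if `import_path` is `target` itself or one of its subpackages."""
    # Appending "/" stops "github.com/lib/foobar" from matching target "github.com/lib/foo".
    return import_path == target or import_path.startswith(target + "/")
```

A bare `startswith(target)` would accept sibling packages that merely share a name prefix, which is why the separator is part of the comparison.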

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
@tmihalac

Collaborator Author

/test-heavy


tmihalac added 6 commits June 25, 2026 23:19
  - Fix dep_tree.py missing comma causing "--pythonvenv_python" string
  concatenation
  - Fix dep_tree.py C/C++ detect_ecosystem walk not using
  _WALK_EXCLUDE_DIRS
  - Fix c_segmenter_custom.py remove_comments stripping patterns inside
  string literals
  - Fix c_lang_function_parsers.py debug print statement left in
  production code
  - Fix golang_functions_parsers.py `len(declaration_parts) == (2 or
  3)` always-true comparison
  - Fix golang_functions_parsers.py no-op `re.search("")` call
  - Fix golang_functions_parsers.py is_package_imported raw string
  split and missing quote stripping
  - Fix golang_functions_parsers.py is_same_package crash on empty
  input
  - Fix javascript_functions_parser.py is_comment_line missing block
  comment continuation (`*`)
  - Fix javascript_functions_parser.py _extract_class_name regex
  missing `$` in identifier class
  - Fix javascript_functions_parser.py _parse_declarations unused
  is_multiline parameter
  - Fix javascript_functions_parser.py backreference `[^\1]` in string
  pattern
  - Fix python_functions_parser.py is_same_package returning True for
  two empty strings
  - Fix python_segmenters_with_classes_methods.py annotating all
  methods with last class name
  - Fix python_segmenters_with_classes_methods.py skipping async def
  methods
  - Fix source_code_git_loader.py safe.directory guard unnecessarily
  gated on clone_url
  - Fix brew_downloader.py returning path even with zero downloads
  - Fix brew_downloader.py extracting SRPM from cache path instead of
  target path
  - Fix configuration_scanner.py re.match allowing partial filename
  matches
  - Fix configuration_scanner.py max_results=0 returning 1 result
  - Fix configuration_scanner.py cache race condition on repo_key check
  outside lock
  - Fix configuration_scanner.py missing docker-compose*.yaml pattern
  - Fix import_usage_analyzer.py empty short_name matching everything
  - Fix async_http_utils.py off-by-one in retry count (`<=` vs `<`)
  - Fix async_http_utils.py consumer errors caught by retry loop
  instead of propagating
  - Fix async_http_utils.py retry_on_client_errors overridden by
  Retry-After check
  - Fix async_http_utils.py negative sleep from X-RateLimit-Reset in
  the past
  - Fix async_http_utils.py missing @functools.wraps on retry_async
  wrapper
  - Fix function_name_locator.py python_flow_control crash on
  non-function documents
  - Fix function_name_locator.py Go versioned module short-name
  collision (v2, v3)
  - Fix git_commit_searcher.py _rank_results mutating confidence
  in-place
  - Fix git_repo_manager.py double-wrapping GitCommandError
  - Fix intel_utils.py parse_cpe checking split_cpe[5] instead of
  split_cpe[10] for system
  - Fix llm_engine_utils.py assert False in production code replaced
  with RuntimeError
  - Fix repo_resolver.py case-sensitivity bug in normalize_package_name
  - Fix serp_api_wrapper.py key index not reset after full rotation
  - Fix serp_api_wrapper.py dead max_retries field
  - Fix csaf_generator.py GHSA description dropped when no pre-existing
  note
  - Fix csaf_generator.py notes appended with text: None when
  summary/justification missing
  - Fix web_patch_fetcher.py missing asyncio import for TimeoutError
  catch
  - Fix web_patch_fetcher.py _is_commit_url false positive on /c/
  outside kernel.org
  - Fix web_patch_fetcher.py dropping Gitiles commit URLs from
  candidates
  - Fix prompting.py build_tool_descriptions missing FL, CONFIG, IUA,
  GREP entries
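
As a note on the `(2 or 3)` item above: in Python, `2 or 3` evaluates to `2`, so the parenthesized form compares only against 2, while the unparenthesized `len(x) == 2 or 3` is always truthy. The membership test below is one conventional way to express "two or three parts"; the helper name is hypothetical and the actual fix in `golang_functions_parsers.py` is not shown in this conversation.

```python
def has_two_or_three_parts(declaration_parts: list) -> bool:
    """True when the declaration splits into exactly two or three parts."""
    return len(declaration_parts) in (2, 3)


parts = ["func", "Name", "(args)"]

# `2 or 3` short-circuits to 2, so this form silently rejects three-part declarations.
parenthesized = len(parts) == (2 or 3)

# Without parentheses the expression is `(len(parts) == 2) or 3`, which is always truthy.
unparenthesized = bool(len(parts) == 2 or 3)
```

With the three-element list above, `parenthesized` is `False`, `unparenthesized` is `True`, and only `has_two_or_three_parts(parts)` gives the intended answer.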

  Test correctness fixes:
  - Replace tautological assertions (disjunctive or, truthiness-only,
  conditional if-then-assert)
  - Rewrite tests that reimplemented source logic instead of calling
  real functions
  - Fix mock searcher ignoring tantivy query parameter in IUA tests
  - Fix test_stub_only_triggers_pypi_fetch swallowing all exceptions
  via try/except pass
  - Fix test_clone_failure_cleans_temp_dir vacuously-passing assertion
  - Fix test_consumer_error_propagates using overly broad
  pytest.raises(Exception)
  - Fix test_optional_chaining_preservation asserting on input string
  not parsed output
  - Fix test_remove_comments_string_literal wrong docstring and
  tautological assertion
  - Fix test_key_rotation not verifying actual key sent in HTTP request
  - Fix test_all_tools_produce_7_descriptions omitting FL, CONFIG, IUA,
  GREP
  - Fix conditional assertion in git_commit_searcher silently passing
  on None
  - Fix test_third_party_docs weak assertion not verifying actual jar
  key
  - Fix test_llm_engine_utils disjunctive or assertion masking wrong
  return value
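
The three tautological shapes named in the first bullet can be illustrated with a deliberately broken function. Everything here is hypothetical (`parse_version`, `weak_test`, `strict_test`); the sketch only shows why such assertions pass regardless of the result.

```python
def parse_version(text: str):
    """Hypothetical function under test, with a deliberate bug:
    it returns the raw text instead of a tuple of ints."""
    return text


def weak_test() -> None:
    result = parse_version("1.2")
    # Truthiness-only: any non-empty value passes.
    assert result
    # Disjunctive or: the second operand accepts the buggy return type.
    assert result == (1, 2) or isinstance(result, str)
    # Conditional if-then-assert: the check is skipped when the condition is false.
    if isinstance(result, tuple):
        assert result == (1, 2)


def strict_test() -> None:
    # A single unconditional comparison against the expected value.
    assert parse_version("1.2") == (1, 2)
```

`weak_test()` passes against the buggy function, while `strict_test()` raises `AssertionError`, which is the behavior a regression test needs.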

  Agent/pipeline coverage:
  - Add pre_process_node tests for ReachabilityAgent and
  CodeUnderstandingAgent
  - Add _postprocess_results exception handling tests
  - Add dispatch_question exception fallback and build_routing_prompt
  integration tests
  - Add Rule 8 vs Rule 9 priority interaction test
  - Add thought_node actions-is-None and observation_node
  truncation/pruning tests
  - Add _build_tool_guidance_for_ecosystem per-ecosystem filtering
  tests

  Java CCA coverage:
  - Add function_called_from_caller_body tests (24 cases)
  - Add extract_from_query, infer_class_name_and_package_name tests
  - Add is_java_fqcn, extract_maven_artifact, _is_doc_excluded tests
  - Add __find_caller_function and __find_initial_function direct tests

  JS parser/segmenter coverage:
  - Add search_for_called_function branch tests
  - Add is_valid, create_map_of_local_vars, is_exported_function tests
  - Add _get_tree caching, should_skip, nested class extraction tests

  C segmenter coverage:
  - Add find_top_level_blocks, remove_macro_blocks,
  extract_define_functions tests

  Go/Python/C parser coverage:
  - Add is_tree_key_match, get_function_name, is_package_imported edge
  case tests
  - Add Python utility method and class-without-parens tests
  - Add C get_package_names, filter_docs, document_imports_package
  tests

  Tools coverage:
  - Add FL stdlib_cache, flow_control, singleton isolation tests
  - Add config scanner cache eviction and concurrent access tests
  - Add IUA query verification and comment-line counting tests
  - Add git_commit_searcher _fetch_patch_via_http tests

  External integration coverage:
  - Add web patch fetcher parsing, Gitiles URL, commit extraction tests
  - Add async HTTP retry limit, raise_for_status, 500 boundary tests
  - Add SERP key exhaustion reset and error propagation tests
  - Add git_repo_manager clone, fetch, concurrency, host validation
  tests

  VEX/intel/version coverage:
  - Add unexpected justification_label, RPM+NVD range, version check
  error tests
  - Add package identifier utility method tests
  - Add _is_safe_url, identify() with intel=None tests

  LLM engine/checklist/prompting coverage:
  - Add preprocess_engine_input, postprocess_engine_output branch tests
  - Add build_no_vuln_packages_output justification tests
  - Add generate_checklist, build_tool_descriptions per-tool tests

  Remaining coverage:
  - Add _ensure_venv, determine_python_version,
  vulnerability_intel_sanitizer tests
  - Add source_classification, credential_client, transitive_detection
  tests
  - Rewrite cve_fetch_patches tests to call real _arun

  Test file consolidation:
  - Merge 18 deleted test files into consolidated per-domain test
  modules
  - Add 13 new focused test files for previously uncovered modules

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>
  - Fix serp_api_wrapper.py callers passing removed max_retries field
  (Pydantic ValidationError at runtime)
  - Fix web_patch_fetcher.py _fetch_gitiles_patch yarl double-encoding
  %5E%21 (use yarl.URL(encoded=True))
  - Fix java_functions_parsers.py _count_call_args treating <
  comparison as generic bracket (dual-comma fallback)
  - Fix repo_resolver.py normalize_package_name dropping original case
  for mixed-case JSON keys (NetworkManager)
  - Fix javascript_functions_parser.py is_comment_line classifying
  generator *method() as block comment
  - Fix golang_functions_parsers.py is_package_imported unescaped
  identifier in regex (add re.escape)
  - Fix configuration_scanner.py cache read outside lock causing
  KeyError on concurrent eviction (use .get())

  Convention fixes:
  - Fix test_java_cca.py _extract docstring referencing
  search_for_called_function
  - Fix test_go_parser.py docstrings referencing fix history instead of
  describing behavior

  Tests:
  - Add SerpAPI extra_forbidden validation test
  - Add Gitiles yarl.URL encoding preservation tests
  - Add _count_call_args unbalanced angle bracket tests (comparison,
  bit shift, ternary)
  - Add normalize_package_name mixed-case preservation tests
  - Add is_comment_line generator method vs block comment tests
  - Add is_package_imported regex escape and substring rejection tests
  - Add configuration scanner cache eviction safety tests

Signed-off-by: Theodor Mihalache <tmihalac@redhat.com>