feat: extract rationale comments + ADR/RFC doc references from JS/TS #1599
Open
niltonmourafilho-arch wants to merge 1 commit into
Parity with _extract_python_rationale: Python files get rationale nodes
from docstrings and '# NOTE:'-style comments, but JS/TS comments were
discarded entirely. This adds a post-pass to extract_js that:
1. extracts rationale comments ('// NOTE:', '// WHY:', block-comment
'* NOTE:' variants) as rationale nodes with rationale_for edges,
matching the Python behavior;
2. first-classes architecture-decision references (ADR-NNNN, RFC NNNN)
found in comments as doc_ref nodes with 'cites' edges from the file.
The doc_ref pass is the natural join point between code and design docs
in mixed corpora: teams conventionally cite ADR ids in file headers, but
today those citations produce no edges, so code<->ADR connections never
form even when the discipline exists. Spellings are normalized
(ADR-11 / ADR 0011 -> ADR-0011) so references to the same document
collapse to one node, and string literals are excluded (comment-shaped
lines only).
Tested on a real mixed corpus (Flutter/Supabase monorepo): router.ts
alone yields 10 ADR citations that previously produced zero edges.
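
As a rough illustration of the rationale-comment half of this pass, here is a minimal Python sketch. It is not the code in this PR: the function name `extract_rationale_comments`, the tag tuple, the node/edge dict shapes, and the edge direction are all assumptions made for the example.

```python
import re

# Assumed tag set: the PR names NOTE, WHY and HACK "etc."; the real list may differ.
RATIONALE_TAGS = ("NOTE", "WHY", "HACK")

# A comment-shaped line: optional indent, then //, /* or *, then "TAG: text".
# A trailing "*/" on single-line block comments is dropped from the text.
_RATIONALE_RE = re.compile(
    r"^\s*(?://+|/\*+|\*)\s*(?P<tag>"
    + "|".join(RATIONALE_TAGS)
    + r"):\s*(?P<text>.*?)\s*(?:\*/)?\s*$"
)


def extract_rationale_comments(source: str, file_id: str):
    """Return (nodes, edges) for rationale comments in JS/TS source text.

    Hypothetical helper: the dict layout and the edge direction
    (rationale node -> file) are assumptions, not the project's schema.
    """
    nodes, edges = [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _RATIONALE_RE.match(line)
        if match is None or not match.group("text"):
            continue  # not comment-shaped, or a tag with no text after it
        node_id = f"{file_id}:rationale:{lineno}"
        nodes.append({
            "id": node_id,
            "type": "rationale",
            "tag": match.group("tag"),
            "text": match.group("text"),
            "line": lineno,
        })
        edges.append({"source": node_id, "target": file_id, "relation": "rationale_for"})
    return nodes, edges
```

Because the pattern is anchored at the start of the line, a tag inside a string literal (`const s = "// NOTE: ..."`) or after code on the same line is ignored in this sketch; whether the real pass also picks up trailing comments is not stated in the PR.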
Summary
Parity with `_extract_python_rationale`: Python files get rationale nodes from docstrings and `# NOTE:`-style comments, but JS/TS comments were discarded entirely. This PR adds a post-pass to `extract_js` that handles two kinds of comment:

1. `// NOTE:`, `// WHY:`, `// HACK:` etc. (plus block-comment `* NOTE:` variants) become `rationale` nodes with `rationale_for` edges, matching the existing Python behavior.
2. `ADR-NNNN` / `RFC NNNN` citations found in comments become `doc_ref` nodes with `cites` edges from the file node.

Why

The doc_ref pass is the natural join point between code and design docs in mixed corpora. Teams conventionally cite ADR ids in TS file headers, but today those citations produce zero edges, so code↔ADR connections never form in the graph even when the citation discipline exists in the codebase.

Tested on a real mixed corpus (Flutter/Supabase monorepo, ~163 TS files + 40 ADRs): a single `router.ts` yields 10 ADR citations that previously produced no edges. With this patch they become direct `cites` edges, closing the code↔ADR gap without any LLM cost (pure line scan, same cost profile as the Python rationale pass).

Design notes

- Spellings are normalized (`ADR-11` / `ADR 0011` → `ADR-0011`) so references to the same document collapse to one node; deduped per file.
- Only comment-shaped lines (`//`, `/*`, `*`) are scanned, so `const s = "ADR-0099"` produces nothing.

Test plan

- `tests/test_rationale.py` (line comment, block comment, multi-ref, normalization/dedup, string-literal exclusion)
- `tests/test_rationale.py`: 18/18 pass
- `tests/test_extract.py` + `tests/test_build.py` + `tests/test_languages.py`: failure list identical before/after patch (5 pre-existing Windows symlink/path failures, unrelated)
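
The normalization, per-file dedup, and string-literal exclusion behaviors named in the test plan can be illustrated with a small Python sketch of the doc_ref scan. This is not the PR's implementation: `extract_doc_refs`, `normalize_ref`, the regexes, and the dict shapes are hypothetical, and normalizing RFC references to the same zero-padded `RFC-NNNN` form is an assumption (the PR only shows the ADR spelling).

```python
import re

# Comment-shaped line: the first non-blank characters are //, /* or *.
_COMMENT_LINE_RE = re.compile(r"^\s*(?://|/\*|\*)")
# Accepts "ADR-11", "ADR 0011", "RFC 2119", "RFC-2119" (one hyphen or space).
_DOC_REF_RE = re.compile(r"\b(ADR|RFC)[-\s](\d+)\b")


def normalize_ref(kind: str, number: str) -> str:
    """Zero-pad to four digits so ADR-11 and ADR 0011 both become ADR-0011."""
    return f"{kind}-{int(number):04d}"


def extract_doc_refs(source: str, file_id: str):
    """Return (nodes, edges) for ADR/RFC citations in comment lines.

    Hypothetical helper: node and edge layouts are assumptions.
    """
    seen = set()  # dedup per file: one node and one edge per normalized id
    nodes, edges = [], []
    for line in source.splitlines():
        if not _COMMENT_LINE_RE.match(line):
            continue  # skips code lines, including string literals
        for kind, number in _DOC_REF_RE.findall(line):
            ref = normalize_ref(kind, number)
            if ref in seen:
                continue
            seen.add(ref)
            nodes.append({"id": ref, "type": "doc_ref"})
            edges.append({"source": file_id, "target": ref, "relation": "cites"})
    return nodes, edges


sample = (
    "// See ADR-11 and ADR 0011, plus RFC 2119\n"
    'const s = "ADR-0099";\n'
    " * Superseded by ADR-0042\n"
)
nodes, edges = extract_doc_refs(sample, "router.ts")
print([n["id"] for n in nodes])  # ['ADR-0011', 'RFC-2119', 'ADR-0042']
```

Keying the `seen` set on the normalized id is what makes the two ADR spellings on the first line collapse to a single node, and the comment-line check is what keeps `"ADR-0099"` out of the result.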