feat(churn): git churn ingest and churn-complexity-hotspots recipe#179
Conversation
Add file_churn substrate with full/incremental/idle refresh on every index pass, codemap ingest-churn for golden seeds, and a refactor-priority recipe ranked by churn × complexity with perf-baseline churn_ms gating.
🦋 Changeset detectedLatest commit: c8f037c The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Warning Review limit reached
More reviews will be available in 54 minutes and 45 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (48)
📝 WalkthroughWalkthroughThis PR implements a new "churn × complexity hotspots" feature that ranks files and symbols by combining git change frequency (via weighted commits) with cyclomatic complexity. It adds file churn ingestion (git-backed for repositories, JSON-backed for non-git projects), integrates refresh into the index pipeline, and exposes a new recipe for hotspot analysis. ChangesChurn × Complexity Hotspots
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Require populated file_churn and config fingerprint before idle skip; deletions-only index skips git log; ingest-churn needs index + inline schema help; delete shipped plan and fix inbound refs.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/config.test.ts (1)
121-136: ⚡ Quick winConsider adding test coverage for
churn.filepath resolution.The current tests cover defaults and CLI override for
churn.since, which is good. However, there's no test verifying thatchurn.fileis resolved to an absolute path (similar to how lines 382-384 insrc/config.tsresolve it viaresolve(absRoot, parsed.churn.file)).🧪 Suggested test case
+ it("resolves churn.file to absolute path", () => { + const r = resolveCodemapConfig(dir, { + churn: { file: "churn-data.json" }, + }); + expect(r.churn.file).toBe(join(dir, "churn-data.json")); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/config.test.ts` around lines 121 - 136, Add a test that verifies churn.file is resolved to an absolute path: call resolveCodemapConfig (same helper used in other tests) with a config object containing churn.file set to a relative path (e.g., "relative/path/to/file") and assert that the returned r.churn.file equals the absolute path produced by resolving against the test root (use path.resolve(dir, "relative/path/to/file")); this ensures the resolution logic in resolveCodemapConfig / the churn.file handling is covered.src/cli/cmd-ingest-churn.ts (1)
13-37: ⚡ Quick winMinor documentation inconsistency for
computed_atfield.The help text implies
computed_atis a required field in the input JSON. However,parseChurnJsonPayloadiningest-churn-run.ts(lines 70-73) defaultscomputed_atto the current timestamp when it's not a string, making it effectively optional in the input.Consider clarifying the help text to indicate that
computed_atis optional and will default to the current time if omitted.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/cmd-ingest-churn.ts` around lines 13 - 37, Update the help text in printIngestChurnCmdHelp to mark computed_at as optional (it defaults to now when absent) to match parseChurnJsonPayload behavior in ingest-churn-run.ts; edit the usage string line that lists required fields to say "`computed_at` (optional, defaults to current time)" or similar so docs and the parseChurnJsonPayload implementation are consistent.scripts/query-golden-coverage-matrix.test.mjs (1)
78-84: ⚡ Quick winAdd
file_churnassertion to the index-table-stats guardrail test.This test now protects only a subset of expected tables; add a check for
FROM file_churnso regressions in the new churn stats field are caught.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/query-golden-coverage-matrix.test.mjs` around lines 78 - 84, The test "index-table-stats locks fetchTableStats-shaped counts" only asserts presence of FROM file_metrics, FROM unresolved_calls and FROM "references"; add an assertion to also check for FROM file_churn so the guardrail covers the new churn stats field. Locate the test that references scenarioIds and scenarios and update it (the it block named "index-table-stats locks fetchTableStats-shaped counts") to include expect(scenario?.sql).toContain("FROM file_churn") alongside the existing expects.scripts/query-golden/run-setup.ts (1)
45-47: ⚡ Quick winValidate parsed JSON structure for clearer errors.
The JSON is cast to
FileChurnRow[]without validation. If the fixture file is malformed, this could cause cryptic errors downstream whenreplaceFileChurnprocesses the data.Consider adding basic structure validation (array check, required fields) or a Zod schema for
FileChurnRow[]to catch fixture errors early with actionable messages.🛡️ Example validation approach
+ // Basic structure check + if (!Array.isArray(rows) || rows.length === 0) { + throw new Error( + `query-golden setup: ${absPath} must contain a non-empty array`, + ); + } + if (!rows.every(r => typeof r === 'object' && r !== null && 'path' in r)) { + throw new Error( + `query-golden setup: ${absPath} rows must have 'path' field`, + ); + } replaceFileChurn(db, rows);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/query-golden/run-setup.ts` around lines 45 - 47, The parsed JSON is blindly cast to FileChurnRow[] which can produce cryptic failures later in replaceFileChurn; after the readFileSync/JSON.parse step that sets rows, add a validation block (e.g., Array.isArray(rows) plus checks that each item contains required keys like filePath, churn, author, etc.) or wire in a Zod schema for FileChurnRow[] to validate the structure and types, and throw a clear error message identifying the fixture path and the missing/invalid fields if validation fails so replaceFileChurn receives guaranteed well-formed data.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/architecture.md`:
- Around line 501-514: The doc currently contradicts non-git handling for
file_churn; update the paragraph so it states one canonical behavior: clarify
that refreshFileChurn → ingestFileChurnFromGit is skipped for non-git repos (so
the git-based ingest produces an empty table), but non-git repos can still
populate file_churn via the codemap "ingest-churn" JSON ingest or by seeding via
the churn.file config/CLI; mention the relevant symbols refreshFileChurn,
ingestFileChurnFromGit, codemap ingest-churn, and churn.file to make the intent
explicit and use consistent terminology (“git ingest is skipped on non-git
repos; use JSON/config seed to populate”).
In `@templates/recipes/churn-complexity-hotspots.md`:
- Line 35: The documented symbol-grain output columns are inconsistent with the
actual SQL recipe output: update the line describing symbol-grain (when
by_symbol=true) in churn-complexity-hotspots.md to match the real schema by
replacing `name`, `kind`, and `cyclomatic_complexity` with the actual column
names `symbol_name`, `symbol_kind`, `max_complexity`, and `avg_complexity` (and
confirm `line_start` remains if present); ensure the `by_symbol=true` sentence
and the overall Output columns list use the same terminology as the SQL result
set to keep docs and code aligned.
---
Nitpick comments:
In `@scripts/query-golden-coverage-matrix.test.mjs`:
- Around line 78-84: The test "index-table-stats locks fetchTableStats-shaped
counts" only asserts presence of FROM file_metrics, FROM unresolved_calls and
FROM "references"; add an assertion to also check for FROM file_churn so the
guardrail covers the new churn stats field. Locate the test that references
scenarioIds and scenarios and update it (the it block named "index-table-stats
locks fetchTableStats-shaped counts") to include
expect(scenario?.sql).toContain("FROM file_churn") alongside the existing
expects.
In `@scripts/query-golden/run-setup.ts`:
- Around line 45-47: The parsed JSON is blindly cast to FileChurnRow[] which can
produce cryptic failures later in replaceFileChurn; after the
readFileSync/JSON.parse step that sets rows, add a validation block (e.g.,
Array.isArray(rows) plus checks that each item contains required keys like
filePath, churn, author, etc.) or wire in a Zod schema for FileChurnRow[] to
validate the structure and types, and throw a clear error message identifying
the fixture path and the missing/invalid fields if validation fails so
replaceFileChurn receives guaranteed well-formed data.
In `@src/cli/cmd-ingest-churn.ts`:
- Around line 13-37: Update the help text in printIngestChurnCmdHelp to mark
computed_at as optional (it defaults to now when absent) to match
parseChurnJsonPayload behavior in ingest-churn-run.ts; edit the usage string
line that lists required fields to say "`computed_at` (optional, defaults to
current time)" or similar so docs and the parseChurnJsonPayload implementation
are consistent.
In `@src/config.test.ts`:
- Around line 121-136: Add a test that verifies churn.file is resolved to an
absolute path: call resolveCodemapConfig (same helper used in other tests) with
a config object containing churn.file set to a relative path (e.g.,
"relative/path/to/file") and assert that the returned r.churn.file equals the
absolute path produced by resolving against the test root (use path.resolve(dir,
"relative/path/to/file")); this ensures the resolution logic in
resolveCodemapConfig / the churn.file handling is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 257777e7-ca24-4d06-bf65-9795f73713d0
📒 Files selected for processing (49)
.changeset/churn-complexity-hotspots.mddocs/architecture.mddocs/benchmark.mddocs/glossary.mddocs/golden-queries.mddocs/plans/churn-complexity-hotspots.mddocs/plans/substrate-extraction.mddocs/roadmap.mddocs/testing-coverage.mdfixtures/CAPABILITIES.jsonfixtures/benchmark/perf-baseline.jsonfixtures/golden/minimal/churn-complexity-hotspots-by-symbol.jsonfixtures/golden/minimal/churn-complexity-hotspots.jsonfixtures/golden/minimal/files-count.jsonfixtures/golden/minimal/files-hashes.jsonfixtures/golden/minimal/files-largest.jsonfixtures/golden/minimal/index-summary.jsonfixtures/golden/minimal/index-table-stats.jsonfixtures/golden/minimal/source-fts-row-count.jsonfixtures/golden/scenarios.jsonfixtures/minimal/file-churn-seed.jsonscripts/agent-eval/scenarios.jsonscripts/check-perf-baseline.tsscripts/query-golden-coverage-matrix.test.mjsscripts/query-golden/run-setup.tsscripts/query-golden/schema.tssrc/application/churn-ingest.test.tssrc/application/churn-ingest.tssrc/application/context-engine.test.tssrc/application/context-engine.tssrc/application/index-engine.tssrc/application/ingest-churn-run.test.tssrc/application/ingest-churn-run.tssrc/application/run-index.tssrc/application/types.tssrc/cli/bootstrap-codemap.tssrc/cli/bootstrap.tssrc/cli/cmd-index.tssrc/cli/cmd-ingest-churn.tssrc/cli/main.tssrc/config.test.tssrc/config.tssrc/db.tssrc/file-churn.test.tssrc/runtime.tssrc/worker-pool.dist.test.tstemplates/agent-content/rule/00-full.mdtemplates/recipes/churn-complexity-hotspots.mdtemplates/recipes/churn-complexity-hotspots.sql
💤 Files with no reviewable changes (1)
- docs/plans/churn-complexity-hotspots.md
…ctions Wire churn-complexity-hotspots into MCP playbook and recipe chains, add churn column guidance to rule/skill shards, per-row review actions on the recipe, and cross-links from refactor-risk recipes.
Ship post-merge items in PR #179: MCP/HTTP ingest_churn, churn.file git skip, context churn_hint, path_prefix param, churn_idle_ms perf baseline. Fix architecture/recipe doc drift (CodeRabbit), reject empty churn JSON without wiping file_churn, and sweep 21-tool consumer surfaces.
Add path_prefix golden + scope test, incremental churn merge test, churn_idle_ms per-phase noise floor and idle sanity cap, and genericize served skill shard comments.
Do not wipe file_churn on git log failure; fall back to full churn refresh when config fingerprint drifts during incremental index. Run index-table-stats golden before churn seed scenarios (file_churn: 46).
Cover churn.file absolute resolution, mark computed_at optional in ingest-churn help, guard file_churn in index-table-stats matrix, and reuse parseChurnJsonPayload for golden seed validation.
Wrap replaceFileChurn/mergeFileChurnForPaths in transactions; validate churn_trend on JSON ingest; align README, recipe, agent-content, and CLI help for churn.file override and hotspots alias distinction.
Summary
file_churnsubstrate (schema v40) populated on every index pass: full git rebuild, incremental scoped merge for changed paths, and idle skip whenHEADmatchesmeta.churn_indexed_commit.codemap ingest-churn <json>pluschurn.file/churn.since/churn.halfLifeDaysconfig and--churn-sinceCLI for non-git or golden seeding.churn-complexity-hotspotsrecipe (file or symbol grain viaby_symbol) ranking churn × cognitive complexity with normalized 0–100 scores andchurn_trend; wire into agent intent classification and golden/agent-eval fixtures.churn_msin perf baseline (~182ms median on this repo).Test plan
bun run check(format, lint, typecheck, unit, golden, agent-eval)bun run check:perf-baselinewith updatedchurn_msmediancodemap query --recipe churn-complexity-hotspotsfile +by_symbol=truesymbol graincodemap ingest-churn fixtures/minimal/file-churn-seed.jsongolden pathSummary by CodeRabbit
New Features
churn-complexity-hotspotsrecipe to identify frequently-changing, complex code for refactoring.codemap ingest-churncommand to import precomputed file churn metrics from JSON.--churn-sinceCLI flag andchurnconfiguration for filtering git history and tuning metrics.file_churnmetrics tracking to database schema.Documentation