
fix: derive compaction threshold from context window for large (1M) models #3571

Open
akhil29 wants to merge 4 commits into tailcallhq:main from akhil29:test/vertex-opus-compaction-threshold

akhil29 commented Jun 25, 2026

What

Fixes premature compaction on large-context models (e.g. Claude Opus 4.6/4.7/4.8 with 1M-token windows) and adds extensive bug-catching tests for the Vertex AI Anthropic provider.

The bug

On a 1M-token Opus model with default config, compaction fired at ~100K tokens, roughly 10% of the window, because the default token_threshold (100K) was applied as an absolute cap:

min(token_threshold.unwrap_or(100_000), 0.70 * context_window)
= min(100_000, 700_000) = 100_000
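A minimal Rust sketch of the capping arithmetic above, next to the window-derived rule this PR describes, makes the difference concrete. The function and constant names are hypothetical, not the actual forge_domain API. The sketch uses integer math (`* 7 / 10`) in place of the 0.70 factor to avoid float rounding; the real `compaction_threshold()` may compute it differently.

```rust
/// 70% of the context window, expressed as a ratio so the math stays in integers.
const WINDOW_NUM: u64 = 7;
const WINDOW_DEN: u64 = 10;

/// The default cap that the old logic applied when `token_threshold` was unset.
const DEFAULT_TOKEN_THRESHOLD: u64 = 100_000;

/// Threshold derived purely from the context window (70%).
fn window_based(context_window: u64) -> u64 {
    context_window * WINDOW_NUM / WINDOW_DEN
}

/// Hypothetical pre-fix rule: the 100K default acts as a cap even when nothing
/// was configured, so a 1M window resolves to 100K.
fn threshold_before_fix(token_threshold: Option<u64>, context_window: u64) -> u64 {
    token_threshold
        .unwrap_or(DEFAULT_TOKEN_THRESHOLD)
        .min(window_based(context_window))
}

/// Hypothetical post-fix rule: derive from the window when unset, and treat an
/// explicitly configured value as an absolute cap.
fn threshold_after_fix(token_threshold: Option<u64>, context_window: u64) -> u64 {
    match token_threshold {
        Some(configured) => configured.min(window_based(context_window)),
        None => window_based(context_window),
    }
}
```

With the default (unset) config, `threshold_before_fix(None, 1_000_000)` yields 100_000 while `threshold_after_fix(None, 1_000_000)` yields 700_000. An explicit `Some(100_000)` still caps a 1M window at 100_000.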

The fix (two parts)

  1. forge_domain/src/agent.rs: compaction_threshold()

    • When token_threshold is unset (the realistic default), derive the threshold purely from the context window (70%), so large windows aren't capped to 100K.
    • When explicitly configured, it is still honored as an absolute cap: min(configured, 70% * window).
  2. forge_app/src/dto/anthropic/response.rs: get_context_length()

    • The generic claude-opus-4- prefix match wrongly resolved the 1M-token claude-opus-4-6/4-7/4-8 models to 200K. Added an explicit 1M branch ahead of the generic 200K branch; older claude-opus-4 / claude-opus-4-1 remain at 200K.
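The branch-ordering issue in `get_context_length()` can be illustrated with a simplified lookup. `context_length_for` is a hypothetical name, and the sketch assumes plain `starts_with` prefix matching on the model ID; the real function covers many more models and may match differently.

```rust
/// Hypothetical, simplified context-length lookup for Anthropic model IDs.
/// Branch order matters: the specific 1M-token Opus prefixes are checked
/// before the generic `claude-opus-4-` prefix, which would otherwise match
/// them first and resolve them to 200K.
fn context_length_for(model_id: &str) -> Option<u64> {
    const OPUS_1M_PREFIXES: [&str; 3] = ["claude-opus-4-6", "claude-opus-4-7", "claude-opus-4-8"];

    if OPUS_1M_PREFIXES.iter().any(|prefix| model_id.starts_with(*prefix)) {
        Some(1_000_000)
    } else if model_id.starts_with("claude-opus-4-") {
        // Older Opus 4 variants such as claude-opus-4-1 keep the 200K window.
        Some(200_000)
    } else {
        // Unknown model IDs: leave the decision to the caller.
        None
    }
}
```

Swapping the two `if` branches reproduces the bug: every `claude-opus-4-6/4-7/4-8` ID would hit the generic prefix first and come back as 200K.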

Tests (genuine bug-catchers: red without the fix)

Extensive Vertex AI Opus compaction-threshold coverage across large (1M) and small (200K) windows:

  • Derived-threshold assertions (1M → 700K, 200K → 140K)
  • Below / at / above trigger boundaries (threshold - 1 does not trigger, threshold does)
  • Cross-window guarantee: a context of 200K tokens does not trigger compaction on a 1M window
  • get_context_length coverage for the Opus 1M models
  • Updated the stale threshold test that encoded the buggy capping behavior
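The boundary and cross-window cases listed above might look roughly like the following. `should_compact` and `default_threshold` are hypothetical stand-ins (a plain `>=` token gate and the 70% window rule), not the PR's actual test helpers, and the real compaction gate may also consider other conditions.

```rust
/// Hypothetical token gate: compaction fires once usage reaches the threshold.
fn should_compact(token_count: u64, threshold: u64) -> bool {
    token_count >= threshold
}

/// Hypothetical window-derived default (70%), mirroring the unset-`token_threshold` path.
fn default_threshold(context_window: u64) -> u64 {
    context_window * 7 / 10
}

#[cfg(test)]
mod tests {
    use super::*;

    const LARGE_WINDOW: u64 = 1_000_000;
    const SMALL_WINDOW: u64 = 200_000;

    #[test]
    fn derived_thresholds() {
        assert_eq!(default_threshold(LARGE_WINDOW), 700_000);
        assert_eq!(default_threshold(SMALL_WINDOW), 140_000);
    }

    #[test]
    fn trigger_boundaries_on_both_windows() {
        for window in [LARGE_WINDOW, SMALL_WINDOW] {
            let t = default_threshold(window);
            assert!(!should_compact(t - 1, t)); // just below: no compaction
            assert!(should_compact(t, t)); // at the threshold: compaction
            assert!(should_compact(t + 1, t)); // above: compaction
        }
    }

    #[test]
    fn small_window_sized_context_does_not_compact_on_large_window() {
        let tokens = SMALL_WINDOW;
        // 200K tokens exceed the 200K window's 140K threshold...
        assert!(should_compact(tokens, default_threshold(SMALL_WINDOW)));
        // ...but stay well under the 1M window's 700K threshold.
        assert!(!should_compact(tokens, default_threshold(LARGE_WINDOW)));
    }
}
```

Under the old capping rule the 1M window resolved to 100K, so the cross-window case would have compacted at 200K tokens and failed.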

Verified these fail on the unfixed logic and pass with the fix applied.

Verification

  • forge_domain vertex_opus: 7 passed
  • forge_domain compaction_threshold: 6 passed
  • forge_app get_context_length: 8 passed
  • Full forge_domain and forge_app suites: green

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Akhil Appana does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

akhil29 changed the title from "test: add Vertex AI Opus compaction-threshold boundary tests" to "fix: derive compaction threshold from context window for large (1M) models" Jun 25, 2026
github-actions (bot) added the "type: fix" label (Iterations on existing features or infrastructure.) Jun 25, 2026
Akhil Appana and others added 3 commits June 25, 2026 21:12
Add tests covering the compaction trigger for Vertex AI Claude Opus across
a large (1M) and a smaller (200K) context window. Each test derives the
threshold from the model's context window via the production
`compaction_threshold(...)` path (default config, no overrides), then
asserts the `should_compact` token gate around the resolved threshold:
just below → no compaction, at and above → compaction.

This exercises the model -> derived-threshold -> trigger path faithfully
for the vertex_ai_anthropic provider.
…odels

Fix early-compaction on large-context models (e.g. Claude Opus 4.6/4.7/4.8,
1M-token windows) and add extensive bug-catching tests for the Vertex AI
Anthropic provider.

Two related fixes:

1. agent.rs `compaction_threshold()`: the default token_threshold (100K) was
   applied as `min(100K, 70% * context_window)`, which capped a 1M-window
   model to ~10% of its window. Now when `token_threshold` is unset (the
   realistic default) the threshold is derived purely from the context window
   (70%); when explicitly configured it is still treated as an absolute cap.

2. response.rs `get_context_length()`: the generic `claude-opus-4-` prefix
   wrongly captured the 1M-token `claude-opus-4-6/4-7/4-8` models at 200K.
   Add an explicit 1M branch before the generic 200K branch. Older
   `claude-opus-4`/`4-1` remain at 200K.

Tests: extensive Vertex AI Opus compaction-threshold coverage across large
(1M) and small (200K) windows -- derived-threshold assertions, below/at/above
trigger boundaries, and the cross-window guarantee that a 200K context does
not compact on a 1M window. These genuinely catch the bug (red without the
fix). Updated the stale threshold test to reflect the corrected semantics and
added a get_context_length test for the Opus 1M models.
akhil29 force-pushed the test/vertex-opus-compaction-threshold branch from d84fbce to 1114b32 June 25, 2026 21:15