fix: derive compaction threshold from context window for large (1M) models#3571
Open
akhil29 wants to merge 4 commits into
Open
fix: derive compaction threshold from context window for large (1M) models#3571akhil29 wants to merge 4 commits into
akhil29 wants to merge 4 commits into
Conversation
|
Akhil Appana seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Add tests covering the compaction trigger for Vertex AI Claude Opus across a large (1M) and a smaller (200K) context window. Each test derives the threshold from the model's context window via the production `compaction_threshold(...)` path (default config, no overrides), then asserts the `should_compact` token gate around the resolved threshold: just below → no compaction, at and above → compaction. This exercises the model -> derived-threshold -> trigger path faithfully for the vertex_ai_anthropic provider.
…odels Fix early-compaction on large-context models (e.g. Claude Opus 4.6/4.7/4.8, 1M-token windows) and add extensive bug-catching tests for the Vertex AI Anthropic provider. Two related fixes: 1. agent.rs `compaction_threshold()`: the default token_threshold (100K) was applied as `min(100K, 70% * context_window)`, which capped a 1M-window model to ~10% of its window. Now when `token_threshold` is unset (the realistic default) the threshold is derived purely from the context window (70%); when explicitly configured it is still treated as an absolute cap. 2. response.rs `get_context_length()`: the generic `claude-opus-4-` prefix wrongly captured the 1M-token `claude-opus-4-6/4-7/4-8` models at 200K. Add an explicit 1M branch before the generic 200K branch. Older `claude-opus-4`/`4-1` remain at 200K. Tests: extensive Vertex AI Opus compaction-threshold coverage across large (1M) and small (200K) windows -- derived-threshold assertions, below/at/above trigger boundaries, and the cross-window guarantee that a 200K context does not compact on a 1M window. These genuinely catch the bug (red without the fix). Updated the stale threshold test to reflect the corrected semantics and added a get_context_length test for the Opus 1M models.
d84fbce to
1114b32
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Fixes early-compaction on large-context models (e.g. Claude Opus 4.6/4.7/4.8 with 1M-token windows) and adds extensive bug-catching tests for the Vertex AI Anthropic provider.
The bug
On a 1M-token Opus model with default config, compaction fired at ~100K tokens — roughly 10% of the window — because the default
token_threshold(100K) was applied as an absolute cap:The fix (two parts)
forge_domain/src/agent.rs—compaction_threshold()token_thresholdis unset (the realistic default), derive the threshold purely from the context window (70%), so large windows aren't capped to 100K.min(configured, 70% * window).forge_app/src/dto/anthropic/response.rs—get_context_length()claude-opus-4-prefix wrongly captured the 1M-tokenclaude-opus-4-6/4-7/4-8models at 200K. Added an explicit 1M branch before the generic 200K branch. Olderclaude-opus-4/claude-opus-4-1remain at 200K.Tests (genuine bug-catchers — red without the fix)
Extensive Vertex AI Opus compaction-threshold coverage across large (1M) and small (200K) windows:
threshold - 1does not trigger,thresholddoes)get_context_lengthcoverage for the Opus 1M modelsVerified these fail on the unfixed logic and pass with the fix applied.
Verification
forge_domainvertex_opus: 7 passedforge_domaincompaction_threshold: 6 passedforge_appget_context_length: 8 passedforge_domainandforge_appsuites: green