
feat: Implement optimization code paths and functionality for initial release #140

Open
andrewklatzke wants to merge 58 commits into main from aklatzke/AIC-2263/sdk-dx-improvements

Conversation

@andrewklatzke
Contributor

@andrewklatzke andrewklatzke commented Apr 17, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Related issues

This PR encapsulates all previous changes in the chain of optimization PRs that were broken up into smaller pieces. Consolidating here so that we can have a single commit/release of the package. The PRs were independently reviewed and approved.

Describe the solution you've provided

See:

#116
#117
#119
#122
#127
#128
#130
#135
#139


Note

High Risk
Large net-new optimization implementation including LaunchDarkly REST API interactions (fetching configs, persisting results, creating AI Config variations) and complex control flow for judging/validation, increasing risk of behavioral and integration regressions.

Overview
Implements the initial real ldai_optimizer package, replacing the prior ldai_optimization scaffold with a full optimization client that can generate prompt/model variations, score them via judges, run post-pass validation samples, and optionally auto-commit the winning variation back to LaunchDarkly.

Adds a config-driven execution path (optimize_from_config) backed by a new internal REST client (LDApiClient) that fetches optimization configs/model pricing and persists per-iteration run telemetry (status/activity, scores, latencies, token usage) to the LaunchDarkly API, plus support for ground-truth batch optimization.

Renames packaging/build/lint targets to ldai_optimizer / launchdarkly-ai-optimizer, expands README usage docs, and adds PROVENANCE.md instructions for verifying GitHub artifact attestations.

Reviewed by Cursor Bugbot for commit d15c6bc. Bugbot is set up for automated code reviews on this repo. Configure here.

…ype, remove required context_choices argument and default to anon

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 7468372. Configure here.

Comment thread packages/optimization/src/ldai_optimizer/dataclasses.py
Comment thread packages/optimization/src/ldai_optimizer/util.py
@andrewklatzke
Contributor Author

andrewklatzke commented Apr 22, 2026

New commits added with security review results, plus Cursor feedback on those changes:

8b3c69f
7468372

This includes:

  • Renames the package to ldai_optimizer
  • Changes the publishing target to the ldai_optimizer package (sec review; package already published)
  • Adds <untrusted> sentinels around user-supplied or LLM-generated data
  • Expands the README and adds a note about how API keys are used (sec review)
  • Adds a top-level comment with the same information as the README note (sec review)
  • Adds a RedactionFilter that scrubs any keys from logger output
  • Removes logging of the response in the failure state
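
For readers unfamiliar with the pattern, a log-scrubbing filter like the RedactionFilter mentioned above could look roughly like the sketch below. This is not the code in this PR: the class name `RedactingFilter`, the key pattern, and the `[REDACTED]` placeholder are all assumptions made for illustration.

```python
import logging
import re

# Hypothetical key shape ("api-..." or "sdk-..." followed by hex and dashes).
# The real filter may recognise different formats.
_KEY_PATTERN = re.compile(r"\b(?:api|sdk)-[0-9a-fA-F-]{8,}\b")


class RedactingFilter(logging.Filter):
    """Rewrite log records so anything shaped like a key is masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its args first, so keys passed as
        # format arguments are scrubbed as well.
        message = record.getMessage()
        record.msg = _KEY_PATTERN.sub("[REDACTED]", message)
        record.args = ()
        return True  # never drop the record, only rewrite it
```

A filter like this would be attached with `handler.addFilter(RedactingFilter())` so that every record passing through that handler is scrubbed before it is written.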

312161f

Includes:

  • Retry logic for posting results
  • Structural validation for the variation we assemble from the LLM response
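
As a rough illustration of the retry behaviour for posting results, here is a minimal sketch with exponential backoff. The helper name `post_with_retry`, the attempt count, the delays, and the choice to retry only on `ConnectionError` are assumptions; the commit may retry on different conditions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def post_with_retry(
    send: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Back off exponentially: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.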

3bce893

Includes:

  • Removed unused function that had additionalProperties: True

@andrewklatzke
Contributor Author

e8c6692

Includes:

  • Adding token limit handling

Comment thread packages/optimization/pyproject.toml Outdated
@@ -1,7 +1,7 @@
 [project]
-name = "launchdarkly-server-sdk-ai-optimization"
+name = "ldai_optimizer"
Contributor


I won't block on this but all other launchdarkly python packages have the launchdarkly-* name.

Contributor Author


Renamed to launchdarkly-ai-optimizer and claimed the package name. I'll clean up the older ones

**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

Implements handling for "inverted" judges.

**Describe alternatives you've considered**

This gets feature parity with our online evals functionality; no
alternatives considered.

**Additional context**

When a metric has `is_inverted` set, it's intended that the evaluation
of the score flips from `>=` to `<=`. This adds a util `_judge_passed`
to handle that logic and implements it throughout. We don't surface the
inverted property in the SDK, so we fetch the judge directly to get this
information.
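
A minimal sketch of the comparison described above, assuming a signature of
`(score, threshold, is_inverted)`; the actual helper's parameters are not shown
in this description, so treat the names here as illustrative:

```python
def judge_passed(score: float, threshold: float, is_inverted: bool = False) -> bool:
    """Return True when the score satisfies the judge's threshold.

    Standard judges pass when score >= threshold; inverted judges
    (lower is better) pass when score <= threshold.
    """
    if is_inverted:
        return score <= threshold
    return score >= threshold
```

For example, with a threshold of `0.5`, a score of `0.2` fails a standard judge
but passes an inverted one.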

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core judge pass/fail semantics and adds per-judge REST calls
(`get_ai_config`) during config-driven runs, which could affect
optimization outcomes and introduce new failure/performance modes if the
API is unavailable or slow.
> 
> **Overview**
> Adds first-class support for **inverted judges** (where *lower* scores
are better) by introducing a shared `judge_passed` helper and using it
for pass/fail decisions in `OptimizationClient` and in prompt feedback
generation.
> 
> Extends `OptimizationJudge` with an `is_inverted` flag and, for
`optimize_from_config`, fetches each judge’s `isInverted` value via
`api_client.get_ai_config` when building options. Updates logging to
include the inverted status, and adds targeted tests covering the
helper, mixed inverted/standard evaluation, config building behavior,
and `variation_prompt_feedback` output.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
a8f14de. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

Implements better handling for params on the initial variation (folds
model changes in as overwrites rather than completely replacing) and
ensures custom params are persisted unchanged. Additionally makes sure
that tools cannot be changed by the optimization process.

**Describe alternatives you've considered**

This is the result of a bug report so there weren't really alternatives
considered.

**Additional context**

Initially it was assumed that the optimization process would properly
pull params forward (via the LLM) but this doesn't seem to always be the
case. In the case of custom params, they aren't fed into the LLM calls
since they're user-specified data (not specific to the actual
optimization result). We now just pull these through as-is. In the case
of tools, the model will be able to optimize the prompt to call a
specific tool if multiple are provided, but we don't want to strip any
tool information from the final result as it may be necessary for the
calls to function.
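
The merge-and-restore behaviour described above can be sketched roughly as
follows. The function name `apply_generated_parameters` and the flat-dict shape
of the parameters are assumptions for illustration, not the package's API:

```python
from typing import Any, Dict


def apply_generated_parameters(
    original: Dict[str, Any], generated: Dict[str, Any]
) -> Dict[str, Any]:
    """Fold LLM-generated parameters over the originals, keeping tools fixed."""
    # Generated values overwrite matching keys; anything the LLM omitted
    # (e.g. max_tokens) is carried forward from the original.
    merged = {**original, **generated}
    # Tools are never taken from the LLM output: restore the original list,
    # or drop the key entirely if the original had none.
    if "tools" in original:
        merged["tools"] = original["tools"]
    else:
        merged.pop("tools", None)
    return merged
```

With this shape, a generated payload that changes `temperature` and returns an
empty `tools` list still ends up with the original `max_tokens` and the
original tools.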

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Moderate risk because it changes how model parameters and `tools` are
carried forward across optimization iterations and what gets
auto-committed, which can affect runtime agent behavior if
merging/restoration logic is wrong.
> 
> **Overview**
> Improves variation-application logic so LLM-generated
`current_parameters` are **merged** into existing parameters instead of
replacing them, preserving user-specified/custom settings (e.g.
`max_tokens`, `response_format`) when the LLM omits them.
> 
> Prevents tool drift by always restoring the original `tools` list (and
logging when the LLM returns a different one) to avoid silently dropping
user tools or leaking internal framework tools.
> 
> Captures `model.custom` from the initial LaunchDarkly variation and
includes it when auto-committing a winning variation; adds focused test
coverage for parameter persistence, tool restoration/warnings, and
`model.custom` propagation.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
572a2aa. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

This is intended to demystify some of the results we're receiving from
the optimization package - namely:
- Total token counts are now accrued and reported with each result so
that we can see if a user crosses the total allowed tokens threshold
- Score results are reported for cost or latency if they're being
optimized against as an item in the `score` result so that it can be
shown on the UI
- Finally, if quality has already met the required threshold the prompt
now contains instructions to optimize only against cost (if cost is
being optimized against)

**Describe alternatives you've considered**

This is in some ways a bug fix, since it wasn't clear to the user what
was causing the failure. It is technically additional functionality, but
likely required to give the user information they can act on.

**Additional context**

Cost and latency are only optimized for, and only produce score entries,
when the keywords that trigger their optimization are present. "Base"
implementations that don't use these features are unaffected.
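
To make the reporting changes above concrete, here is a small sketch of a
running token total plus a cost result stored as a score entry. The class
`RunTotals`, its methods, and the shape of the score entry are hypothetical;
the `_cost_gate` key and the 10% figure are taken from the automated summary
of this commit, and the comparison itself is an assumption about how such a
gate might work.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RunTotals:
    """Hypothetical per-run bookkeeping for tokens and gate scores."""

    accumulated_token_usage: int = 0
    scores: Dict[str, dict] = field(default_factory=dict)

    def add_tokens(self, count: int) -> None:
        # Agent, judge, and variation-generation calls all feed one total.
        self.accumulated_token_usage += count

    def record_cost_gate(
        self,
        baseline_usd: float,
        candidate_usd: float,
        required_improvement: float = 0.10,
    ) -> bool:
        # Assumed rule: the candidate must be at least `required_improvement`
        # cheaper than the explicit baseline to pass.
        passed = candidate_usd <= baseline_usd * (1.0 - required_improvement)
        # Stored alongside judge scores so a gate failure is visible.
        self.scores["_cost_gate"] = {"score": candidate_usd, "passed": passed}
        return passed
```

Recording the gate as an ordinary score entry is what lets a UI show why a run
failed even when every quality judge passed.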

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes optimization pass/fail logic and persisted result payloads
(new gate scores, baseline handling, token-budget semantics), which
could affect when runs succeed/fail and what the UI/API receives.
> 
> **Overview**
> Improves optimization run reporting by tracking and persisting a
single `accumulated_token_usage` total across agent, judge, and
variation calls, and including it in result PATCH payloads (extending
`generationTokens` to allow `accumulated_total`).
> 
> Refactors latency/cost optimization to use explicit baseline values
(not `history[0]`), caps history growth (`_trim_history`) for both
standard and ground-truth flows, and adds synthetic
`_latency_gate`/`_cost_gate` score entries so gate failures are visible
in results.
> 
> Adjusts run control flow so pass/fail is evaluated before token-limit
checks (including GT batches and validation), and updates variation
prompting to focus purely on cost reduction when quality is already
passing; also relaxes the cost gate tolerance from 20% to 10%
improvement and expands tests accordingly.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
365fa94. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

Implements cost optimization in the same manner as latency optimization.
Searches the acceptance statement for keywords pertaining to token
usage/cost (e.g. costs, pricing, bill) and adds instructions to the
variation generation to try to optimize for costs. Additionally has the
acceptance statement prompt return instructions for the variation
generation (e.g., a cheaper model).

**Describe alternatives you've considered**

This is a feature addition.

**Additional context**

We'll be adding UI options for both latency and cost with adjustable
thresholds, but these are still valid once those arrive since a mention
of cost/latency means the user is trying to optimize for it.
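
A minimal sketch of the keyword scan over the acceptance statement, assuming a
simple whole-word, case-insensitive match. Only "costs", "pricing", and "bill"
come from the description above; the singular "cost", the function name
`mentions_cost`, and the matching strategy are assumptions:

```python
import re

# Keywords named in the description, plus the singular "cost" (assumed).
_COST_PATTERN = re.compile(r"\b(cost|costs|pricing|bill)\b", re.IGNORECASE)


def mentions_cost(acceptance_statement: str) -> bool:
    """Return True when the statement contains a cost-related keyword."""
    return _COST_PATTERN.search(acceptance_statement) is not None
```

A caller would use the result to decide whether to add cost-reduction
instructions to the variation-generation prompt.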

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new cost-gating logic and changes iteration/batch bookkeeping
(baseline tracking, history trimming, token-limit handling), which can
affect optimization outcomes and persisted result records. Risk is
moderated by extensive new unit tests covering the new gates and edge
cases.
> 
> **Overview**
> Adds **cost optimization support** alongside existing latency
optimization: acceptance statements are scanned for cost keywords, agent
calls get per-turn `estimated_cost_usd` (via model pricing when
available), and a new `_cost_gate` is applied similarly to
`_latency_gate`, with both gates recorded as synthetic judge scores for
visibility.
> 
> Improves optimization loop correctness and observability by explicitly
tracking baselines (duration and cost), trimming `_history` to bounded
windows (standard and GT), counting variation-generation tokens into the
run total, stamping `accumulated_token_usage` into result payloads, and
refining token-limit behavior (treat `0` as unlimited and evaluate
pass/fail before halting on budget). Also tightens model ID prefix
stripping to avoid breaking Bedrock region-style IDs and updates package
metadata naming/description.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
4fc1ecf. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
…ase (#190)

**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

This adds a PROVENANCE.md file and registers it with release-please.

**Describe alternatives you've considered**

No alternatives here; required for security

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: documentation-only addition plus a release configuration
tweak to include `PROVENANCE.md` in version bumps; no runtime code
changes.
> 
> **Overview**
> Adds a new `packages/optimization/PROVENANCE.md` documenting how to
verify published wheel provenance using GitHub artifact attestations.
> 
> Updates `release-please-config.json` so `packages/optimization` treats
`PROVENANCE.md` as an `extra-file`, ensuring the doc’s embedded version
snippet is kept in sync during releases.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
32dc4d0. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->