Fix `none` approach streaming passthrough for tool-calling clients #313
Open
joby-brentsmith wants to merge 2 commits into
When the approach is `none` and `stream=true`, forward upstream SSE chunks verbatim instead of synthesizing text-only responses. Preserve the original request messages for multi-turn agent tool loops, and merge split `tool_calls` choices for non-streaming responses.
Co-authored-by: Cursor <cursoragent@cursor.com>
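The two behaviors in this commit can be sketched roughly as below. This is an illustrative sketch, not the code in this PR: the function names come from the PR description, but the bodies are assumptions, including accepting chunks as plain dicts or pydantic models with `model_dump()` (as the OpenAI Python SDK's chunk objects are), returning a copied response rather than mutating the input, and setting `finish_reason` to `"tool_calls"` after the merge.

```python
import copy
import json
from typing import Any, Dict, Iterable, Iterator


def generate_stream_passthrough(upstream_chunks: Iterable[Any]) -> Iterator[str]:
    """Re-emit upstream chat-completion chunks as SSE events without edits.

    Sketch only: assumes each chunk is a plain dict or a pydantic model
    (such as an OpenAI SDK chunk object) that exposes model_dump().
    """
    for chunk in upstream_chunks:
        payload = chunk.model_dump() if hasattr(chunk, "model_dump") else chunk
        # delta.content, delta.tool_calls and finish_reason are forwarded as-is.
        yield f"data: {json.dumps(payload)}\n\n"
    # OpenAI-style streams end with a literal [DONE] sentinel.
    yield "data: [DONE]\n\n"


def promote_tool_calls_to_first_choice(response: Dict[str, Any]) -> Dict[str, Any]:
    """Copy tool_calls from a later choice into choices[0] (non-streaming).

    Sketch only: assumes an OpenAI-style response dict in which a provider
    split one assistant turn into a text choice plus a tool_calls choice.
    Returns a new dict; the input is left unmodified.
    """
    choices = response.get("choices") or []
    if len(choices) < 2 or (choices[0].get("message") or {}).get("tool_calls"):
        return response  # nothing to merge, or already where clients look
    for later in choices[1:]:
        tool_calls = (later.get("message") or {}).get("tool_calls")
        if tool_calls:
            merged = copy.deepcopy(response)
            first = merged["choices"][0]
            first["message"] = first.get("message") or {}
            first["message"]["tool_calls"] = copy.deepcopy(tool_calls)
            # Assumption: clients key off finish_reason == "tool_calls".
            first["finish_reason"] = "tool_calls"
            return merged
    return response
```

Assuming a Flask handler, the generator could be returned as `Response(generate_stream_passthrough(stream), mimetype="text/event-stream")`. Keeping the payload untouched is the point of the fix: any rewriting step is where `tool_calls` deltas were previously lost.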
`normalize_message_content()` was flattening list content into text-only strings, dropping `image_url` parts. Keep list content intact when messages include non-text (multimodal) parts.
Co-authored-by: Cursor <cursoragent@cursor.com>
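A minimal sketch of the intended behavior, assuming OpenAI-style content parts such as `{"type": "text", ...}` and `{"type": "image_url", ...}`. It is not the PR's implementation; in particular, joining text parts with a newline is an assumption.

```python
from typing import Any, Dict, List, Optional, Union

Content = Optional[Union[str, List[Dict[str, Any]]]]


def normalize_message_content(content: Content) -> Content:
    """Flatten text-only part lists to a string; keep multimodal lists intact.

    Illustrative sketch: the newline separator for joined text parts is an
    assumption, not taken from the PR.
    """
    if not isinstance(content, list):
        return content  # plain strings (or None) pass through unchanged
    if any(part.get("type") != "text" for part in content):
        # At least one non-text part (e.g. image_url): flattening would drop it.
        return content
    return "\n".join(part.get("text", "") for part in content)
```

Returning the original list object rather than a copy keeps the multimodal path a true pass-through.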
Fixes #312
Problem
The `none` approach is documented as a direct pass-through to the upstream OpenAI-compatible endpoint. In practice, when `stream: true` and `tools` are present, OptiLLM synthesizes a text-only response with:
- `choices[0].message.content` (text)
- `finish_reason: "stop"` and no `tool_calls`

Agent clients that depend on streamed `tool_calls` receive the assistant's announcement text but never get the tool metadata they need to execute the call.
Solution
Scope is limited to `operation == 'SINGLE'` and `approaches[0] == 'none'`. Optimization paths (`rto`, `cot_reflection`, `moa`, etc.) are unchanged.
- `generate_stream_passthrough()`: when `stream: true`, call upstream with `stream=True` and yield each chunk as SSE without modification.
- Call `none_approach(original_messages=messages, ...)` directly instead of reconstructing messages from `parse_conversation()`.
- `promote_tool_calls_to_first_choice()`: for non-streaming responses, merge `tool_calls` from a later choice into `choices[0]` (provider-agnostic; same pattern as goose#6369).
Testing
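To check the streaming case without a full agent client, a small helper can reassemble tool calls from the raw SSE lines. `collect_streamed_tool_calls` is hypothetical and not part of the PR; it assumes OpenAI-style deltas where fragments are keyed by `delta.tool_calls[i].index` and `function.arguments` arrives as string pieces to concatenate.

```python
import json
from typing import Dict, Iterable, List


def collect_streamed_tool_calls(sse_lines: Iterable[str]) -> List[Dict[str, str]]:
    """Rebuild tool calls from OpenAI-style streaming deltas (test helper sketch)."""
    calls: Dict[int, Dict[str, str]] = {}
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            for tc in (choice.get("delta") or {}).get("tool_calls") or []:
                slot = calls.setdefault(tc["index"], {"id": "", "name": "", "arguments": ""})
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function") or {}
                if fn.get("name"):
                    slot["name"] = fn["name"]
                # Argument JSON is streamed in pieces; concatenate them in order.
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

An empty result on a response that announced a tool call is the symptom described in #312.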
Streaming (should show `tool_calls` chunks):
Non-streaming split choices (should have `tool_calls` in `choices[0]`):
Files changed
- `optillm/server.py`: 2 helpers + rewire `none` branch in `proxy()`

Made with Cursor