Fix none approach streaming passthrough for tool-calling clients #313

Open
joby-brentsmith wants to merge 2 commits into algorithmicsuperintelligence:main from joby-brentsmith:fix/none-streaming-tool-calls

@joby-brentsmith

Fixes #312

Problem

The none approach is documented as a direct pass-through to the upstream OpenAI-compatible endpoint. In practice, when stream: true and tools are present, OptiLLM:

  1. Calls upstream non-streaming
  2. Extracts only choices[0].message.content (text)
  3. Synthesizes a fake SSE response with finish_reason: "stop" and no tool_calls

Agent clients that depend on streamed tool_calls receive the assistant's announcement text but never receive the tool call metadata (call id, function name, arguments) they need to execute the tool.
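To make the failure concrete, here is a minimal sketch of how a tool-calling client might reassemble streamed tool calls. It assumes the OpenAI-style chunk shape, where `delta.tool_calls` fragments are keyed by `index` and the `arguments` string arrives in pieces. `accumulate_tool_calls` and the sample chunks are hypothetical illustrations, not code from this PR.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed delta.tool_calls fragments into complete tool calls."""
    calls = {}  # keyed by the tool call's index within the assistant message
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"], {"id": None, "name": "", "arguments": ""}
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call["name"] = fn["name"]
                # Argument JSON is streamed as string fragments; concatenate them.
                call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hypothetical upstream stream that carries a tool call.
streamed = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "shell", "arguments": ""}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"command": '}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"echo hello"}'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

# Hypothetical synthesized stream of the kind described above: text only.
synthesized = [
    {"choices": [{"index": 0, "delta": {"content": "I'll run echo hello."},
                  "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

calls = accumulate_tool_calls(streamed)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
print(accumulate_tool_calls(synthesized))  # nothing for the agent to execute
```

With the text-only stream the client ends up with an empty list of tool calls, which matches the symptom described above.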

Solution

The change is limited to requests where operation == 'SINGLE' and approaches[0] == 'none'. The optimization paths (rto, cot_reflection, moa, etc.) are unchanged.

  1. generate_stream_passthrough() — when stream: true, call upstream with stream=True and yield each chunk as SSE without modification.
  2. Original request messages — call none_approach(original_messages=messages, ...) directly instead of reconstructing from parse_conversation().
  3. promote_tool_calls_to_first_choice() — for non-streaming responses, merge tool_calls from a later choice into choices[0] (provider-agnostic; same pattern as goose#6369).
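The streaming and non-streaming pieces above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in optillm/server.py: it works on plain dicts rather than SDK objects, the names `sse_passthrough` and `promote_tool_calls` are hypothetical, and collapsing to a single choice is inferred from the non-streaming test in this PR rather than confirmed from the diff.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def sse_passthrough(upstream_chunks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield each upstream chunk as one SSE event without altering its fields."""
    for chunk in upstream_chunks:
        yield f"data: {json.dumps(chunk)}\n\n"
    # OpenAI-compatible streams conventionally end with a [DONE] sentinel.
    yield "data: [DONE]\n\n"


def promote_tool_calls(response: Dict[str, Any]) -> Dict[str, Any]:
    """Move tool_calls found in a later choice into choices[0] (mutates in place)."""
    choices = response.get("choices") or []
    if not choices:
        return response
    first = choices[0]
    first_msg = first.setdefault("message", {})
    if first_msg.get("tool_calls"):
        return response  # already where clients expect them
    for other in choices[1:]:
        tool_calls = (other.get("message") or {}).get("tool_calls")
        if tool_calls:
            first_msg["tool_calls"] = tool_calls
            first["finish_reason"] = "tool_calls"
            # Assumption: the split choices are collapsed into one.
            response["choices"] = [first]
            break
    return response


# Hypothetical split response: text in choice 0, tool call in choice 1.
split = {"choices": [
    {"index": 0, "message": {"role": "assistant", "content": "Running it."},
     "finish_reason": "stop"},
    {"index": 1, "message": {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "shell", "arguments": '{"command": "echo hello"}'}}]},
     "finish_reason": "tool_calls"},
]}
merged = promote_tool_calls(split)
print(len(merged["choices"]), merged["choices"][0]["message"]["tool_calls"][0]["id"])
```

Keeping the passthrough generator free of any parsing or re-synthesis is the point of the design: whatever the upstream emits, including tool_calls deltas and the final finish_reason, reaches the client unchanged.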

Testing

Streaming (should show tool_calls chunks):

curl -s -N http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Run echo hello with the shell tool"}],
    "tools": [{"type": "function", "function": {
      "name": "shell",
      "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}
    }}],
    "tool_choice": "auto",
    "stream": true
  }' | rg "tool_calls|finish_reason"

Non-streaming split choices (should have tool_calls in choices[0]):

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Run echo hello with the shell tool"}],
    "tools": [{"type": "function", "function": {
      "name": "shell",
      "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}
    }}],
    "stream": false
  }' | python3 -c "
import sys, json
d = json.load(sys.stdin)
c = d['choices'][0]
assert len(d['choices']) == 1
assert c['message'].get('tool_calls')
print('ok')
"

Files changed

  • optillm/server.py — 2 helpers + rewire none branch in proxy()

Made with Cursor

When approach is none and stream=true, forward upstream SSE chunks
verbatim instead of synthesizing text-only responses. Preserve original
request messages for multi-turn agent tool loops, and merge split
tool_calls choices for non-streaming responses.

Co-authored-by: Cursor <cursoragent@cursor.com>
@CLAassistant

CLAassistant commented Jun 15, 2026

CLA assistant check
All committers have signed the CLA.

normalize_message_content() was flattening list content to text-only
strings, dropping image_url parts. Keep list content intact when
messages include non-text multimodal parts.

Co-authored-by: Cursor <cursoragent@cursor.com>
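
The behavior described in the second commit could look roughly like the sketch below. The real normalize_message_content() signature is not shown on this page, so `normalize_content_sketch` is a hypothetical stand-in, and joining text parts with a space is an assumption.

```python
def normalize_content_sketch(content):
    """Flatten text-only list content to a string; keep multimodal lists intact."""
    if not isinstance(content, list):
        return content  # plain strings (or None) pass through untouched
    has_non_text = any(
        isinstance(part, dict) and part.get("type") != "text" for part in content
    )
    if has_non_text:
        # e.g. image_url parts: return the list unchanged so nothing is dropped
        return content
    # Assumption: text parts are joined with a single space.
    return " ".join(
        part.get("text", "") for part in content if isinstance(part, dict)
    )


text_only = [{"type": "text", "text": "Describe"}, {"type": "text", "text": "this"}]
with_image = [
    {"type": "text", "text": "Describe this"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
print(normalize_content_sketch(text_only))
print(normalize_content_sketch(with_image) is with_image)
```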

Development

Successfully merging this pull request may close these issues.

none approach drops tool_calls on streaming requests

2 participants