feat(speech-gateway): expose Prometheus metrics#546
feat(speech-gateway): expose Prometheus metrics#546staging-devin-ai-integration[bot] wants to merge 3 commits into
Conversation
Instrument the /tts and /stt handlers with request count, latency, in-flight, upstream-call latency, and rejection-reason metrics, and serve them at GET /metrics (ungated by the concurrency semaphore). Signed-off-by: streamkit-devin <devin@streamkit.dev>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Lets the skit oneshot_pipeline metric split by tts/stt service. Signed-off-by: streamkit-devin <devin@streamkit.dev>
| rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK} | ||
| start := time.Now() | ||
| next(rec, r) | ||
|
|
||
| requestDuration.WithLabelValues(endpoint).Observe(time.Since(start).Seconds()) | ||
| requestsTotal.WithLabelValues(endpoint, r.Method, strconv.Itoa(rec.code)).Inc() |
There was a problem hiding this comment.
🟡 Canceled requests are counted as HTTP 200s
statusRecorder is initialized with http.StatusOK and instrument always increments gateway_requests_total using that value after the handler returns. The speech handlers intentionally return without writing any response when proxyMultipart fails with context.Canceled (examples/speech-gateway/cmd/gateway/main.go:253-255, examples/speech-gateway/cmd/gateway/main.go:324-326), so a client disconnect during the upstream call or response copy is exported as code="200" even though no successful 200 response was served. This makes the new success/error metrics under-report aborted requests as successes.
Prompt for agents
In examples/speech-gateway/cmd/gateway/metrics.go, avoid defaulting unwritten responses to 200 for metrics. The current statusRecorder starts with code=http.StatusOK, so handlers that return without writing (notably handleSTT/handleTTS on context.Canceled from proxyMultipart) are counted as successful 200 responses. Track whether any header/body was written and either use a distinct label for aborted/unwritten requests, skip the status-code counter for them, or have the handlers write an explicit status before returning when appropriate. Update tests to cover a handler that returns without writing.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Good catch — fixed in 458f023. instrument now records code="499" (nginx's "client closed request") when the handler returns without writing a response, instead of defaulting to 200. This covers the context.Canceled paths in handleSTT/handleTTS, so aborted requests are no longer counted as successes. Added TestInstrumentUnwrittenResponseCountedAsClientClosed to lock in the no-write behavior.
| mux.HandleFunc("/stt", instrument("stt", gw.handleSTT)) | ||
| mux.HandleFunc("/tts", instrument("tts", gw.handleTTS)) | ||
| // /metrics is intentionally not gated by the concurrency semaphore so it | ||
| // stays scrapable while request slots are saturated. | ||
| mux.Handle("/metrics", promhttp.Handler()) |
There was a problem hiding this comment.
📝 Info: Instrumentation wraps only STT/TTS, leaving /metrics outside the semaphore as intended
The new route registration wraps /stt and /tts with instrument(...) but registers /metrics directly with promhttp.Handler(). This matches the documented behavior that metrics remain scrapable even when all gateway request slots are saturated, and because the semaphore is still acquired inside handleSTT/handleTTS, the metrics wrapper does not change the concurrency limit itself (examples/speech-gateway/cmd/gateway/main.go:248-249, examples/speech-gateway/cmd/gateway/main.go:279-280).
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
| func (s *statusRecorder) Flush() { | ||
| if f, ok := s.ResponseWriter.(http.Flusher); ok { | ||
| f.Flush() | ||
| } |
There was a problem hiding this comment.
📝 Info: Flusher preservation prevents the wrapper from disabling streaming responses
The proxied response path checks whether the response writer implements http.Flusher before wrapping writes in flushWriter (examples/speech-gateway/cmd/gateway/main.go:405-420). Since statusRecorder implements Flush() by delegating to the underlying writer, wrapping STT/TTS handlers for metrics should still allow streamed upstream audio/transcription responses to flush incrementally rather than buffering until handler completion.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
Handlers return without writing on context.Canceled; default the metric to 499 (client closed) unless a status was written, so cancellations are not counted as successes. Signed-off-by: streamkit-devin <devin@streamkit.dev>
Brings the gateway /metrics instrumentation into the branch so the sample's optional gateway profile exposes the gateway_* series the Speech Gateway dashboard row queries. Signed-off-by: streamkit-devin <devin@streamkit.dev>
Summary
Part of the speech-services observability effort. This PR instruments the Go speech gateway; sibling PRs add the Grafana dashboard and docs that build against the metric contract below.
github.com/prometheus/client_golangand serveGET /metricsviapromhttp. The route is registered directly on the mux, so it is not gated by theGATEWAY_MAX_CONCURRENCYsemaphore (which is acquired inside the handlers) and stays scrapable when request slots are saturated./ttsand/sttin a smallinstrument(endpoint, …)helper that records count/latency/in-flight per endpoint. AstatusRecordercaptures the status code while still implementinghttp.Flusher, so proxied responses keep streaming incrementally.bad_content_type, 413→too_large, 502→upstream_error), keeping the existing handler logic untouched.context.Canceledclient-disconnect path) are recorded ascode="499"rather than200, so aborted requests aren't counted as successes.gateway_upstream_duration_secondsis observed around theclient.Docall to the skit backend/api/v1/process.X-StreamKit-Service: {tts|stt}on the upstream/api/v1/processrequest so the backend'soneshot_pipeline.durationcan split by a boundedservicelabel (consumed by sibling Rust PR feat(metrics): config-driven bounded request labels (adds service to oneshot_pipeline.duration) #545 — without this it always reportsother)..gitignoreto/gateway— the unanchoredgatewaypattern was also ignoring thecmd/gateway/source dir.Frozen metric contract emitted (all carry
endpoint∈tts|stt):gateway_requests_total{endpoint,method,code}— countergateway_request_duration_seconds{endpoint}— histogram (buckets0.05,0.1,0.25,0.5,1,2.5,5,10,30)gateway_inflight_requests{endpoint}— gaugegateway_upstream_duration_seconds{endpoint}— histogramgateway_rejected_total{endpoint,reason}— counter,reason∈too_large|bad_content_type|upstream_errorReview & Validation
X-StreamKit-Serviceheader name matches what PR feat(metrics): config-driven bounded request labels (adds service to oneshot_pipeline.duration) #545 reads.cd examples/speech-gateway && gofmt -l . && go vet ./... && go build ./... && go test ./.../metricsis reachable while concurrency slots are exhausted (it bypasses the semaphore by design).Notes
gateway_rejected_totalreasons, and/metricsexposes the full contract.golangci-lintwas run with the configured linters exceptunqueryvet(a SQL-builder plugin requiring a custom golangci-lint build; unrelated to this code) — 0 issues. There is no CI lint job wired forexamples/speech-gateway.Link to Devin session: https://staging.itsdev.in/sessions/0aff3af47014452692b73a1e0acb87a8
Requested by: @streamer45
Devin Review
1fde6cf(HEAD is458f023)