streamer45 · staging-devin-ai-integration · May 30, 2026 · May 30, 2026 · May 30, 2026 · May 31, 2026
diff --git a/.agents/skills/observability-stack/SKILL.md b/.agents/skills/observability-stack/SKILL.md
@@ -0,0 +1,87 @@
+---
+name: observability-stack
+description: >-
+  Spin up StreamKit's local observability stack (skit + Prometheus + Grafana,
+  optional speech gateway) and validate the Grafana dashboards end-to-end. Use
+  when testing metrics/dashboards, debugging empty dashboard panels, or
+  reproducing the speech-gateway monitoring setup locally.
+license: MPL-2.0
+---
+
+# Observability stack (local)
+
+`samples/observability/` is a `docker compose` stack that runs skit + Prometheus
++ Grafana (and an optional speech gateway), auto-provisioning both bundled
+dashboards. Use it to validate metrics and dashboards without any cloud setup.
+
+## Run it
+
+```bash
+cd samples/observability
+docker compose up -d
+./generate-traffic.sh                 # direct-to-skit TTS+STT
+# optional gateway row:
+docker compose --profile gateway up -d --build
+./generate-traffic.sh --gateway
+```
+
+Grafana: <http://localhost:3000> (anonymous admin). Prometheus:
+<http://localhost:9090>. skit: <http://localhost:4545>.
+
+## How metrics flow
+
+- **skit → Prometheus via OTLP push.** Prometheus runs with
+  `--web.enable-otlp-receiver`; skit's `SK_TELEMETRY__OTLP_ENDPOINT` points at
+  `…/api/v1/otlp/v1/metrics`. There is **no scrape job** for skit.
+- **gateway → Prometheus via scrape** of the gateway's `/metrics`.
+
+## Validate dashboards (don't just eyeball)
+
+Prometheus' OTLP receiver renames dotted metrics and appends unit suffixes, so
+confirm that the metric names/labels a panel queries exist before trusting it:
+
+```bash
+# list all metric names Prometheus knows about
+curl -s localhost:9090/api/v1/label/__name__/values | jq -r '.data[]' | sort
+# run a panel's exact PromQL and count series (0 == panel will be "No data")
+curl -s --data-urlencode 'query=<promql>' localhost:9090/api/v1/query \
+  | jq '.data.result | length'
+# inspect a metric's labels
+curl -s 'localhost:9090/api/v1/series?match[]=<metric>' | jq
+```
+
+Key name/label facts:
+
+- Plugin metrics: `plugin_call_duration_seconds_*` (unit suffix present),
+  `plugin_calls_total`; labels `plugin_kind`, `op`.
+- `oneshot_pipeline_duration_*` has **no** `_seconds` suffix (no unit set);
+  labels `status`, and `service` only when an `X-StreamKit-Service` header is
+  forwarded by a service-label-aware skit.
+- Gateway: `gateway_requests_total{endpoint,code}`,
+  `gateway_request_duration_seconds`, `gateway_rejected_total{reason}` (only
+  appears after a 413/415/502 actually occurs).
+
+## Expected "No data" (not bugs)
+
+- Plugin failure panels (`plugin_errors_total` etc.) — counters don't exist
+  until a failure happens.
+- Oneshot "by Service" panels — empty unless the skit build emits the `service`
+  label.
+- Video / MoQ / codec panels — only populate when you run those pipelines.
+
+## Gotchas (most-common causes of empty dashboards)
+
+- **`latest-demo` is stale.** Pin a versioned `-demo` tag; `latest-demo` can
+  predate metrics like `plugin.call.duration`, leaving the Plugins row empty.
+- **Demo-image plugin layout.** `-demo` images ship bare `.so` files but the
+  loader wants `plugins/native/<id>/` bundles; `skit/entrypoint.sh` reassembles
+  them. Symptom: "no plugins found" / "node kind not found in registry".
+- **Model-name mismatch.** A pipeline's `model_path` must exist in the image's
+  `models/`. The stack's `pipelines/` use the names the `-demo` image ships.
+- **Grafana datasource input.** Committed dashboards use `${DS_PROMETHEUS}`;
+  the `dashboard-prep` step rewrites it to the provisioned uid. In compose
+  command strings, escape it as `$${DS_PROMETHEUS}` so compose doesn't
+  interpolate it.
+- **Local auth.** skit needs `SK_AUTH__MODE=disabled` +
+  `SK_PERMISSIONS__ALLOW_INSECURE_NO_AUTH=true` to start unauthenticated on a
+  non-loopback bind. Local only.
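The renaming noted under "Validate dashboards" in the SKILL.md above can be approximated with a small helper when you need to guess which Prometheus name a dotted metric ends up under. This is a simplified sketch, not Prometheus' full translation logic: the function name `otlp_to_prom_name` and the short unit table are illustrative assumptions, and only `plugin.call.duration` is a dotted name taken from this PR (the other dotted inputs in the comments are guesses at the source names).

```bash
# Simplified sketch of OTLP -> Prometheus metric naming. It covers only the
# behaviour described in this PR (dots become underscores, known units add a
# suffix, counters get _total); it is not Prometheus' complete rule set.
otlp_to_prom_name() {
  name=$1
  unit=${2:-}
  kind=${3:-gauge}
  # Characters outside [A-Za-z0-9_:] (such as dots) become underscores.
  prom=$(printf '%s' "$name" | tr -c 'A-Za-z0-9_:' '_')
  # Map a few common OTLP units to suffixes; unknown or empty units add nothing.
  case $unit in
    s)  suffix=seconds ;;
    ms) suffix=milliseconds ;;
    By) suffix=bytes ;;
    *)  suffix= ;;
  esac
  if [ -n "$suffix" ]; then
    case $prom in
      *_"$suffix") ;;                      # suffix already present
      *) prom="${prom}_${suffix}" ;;
    esac
  fi
  if [ "$kind" = counter ]; then
    case $prom in
      *_total) ;;                          # suffix already present
      *) prom="${prom}_total" ;;
    esac
  fi
  printf '%s\n' "$prom"
}

# Usage (the last two dotted names are assumptions, not confirmed by the PR):
#   otlp_to_prom_name plugin.call.duration s histogram
#   otlp_to_prom_name oneshot.pipeline.duration "" histogram
#   otlp_to_prom_name plugin.calls "" counter
```

Histogram series then appear with `_bucket`, `_sum` and `_count` appended, which is why the key facts list writes `plugin_call_duration_seconds_*`. Treat the helper as a hint only and confirm the real name against `/api/v1/label/__name__/values`.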
diff --git a/.claude/skills/observability-stack b/.claude/skills/observability-stack
@@ -0,0 +1 @@
+../../.agents/skills/observability-stack
diff --git a/docs/src/content/docs/guides/observability.md b/docs/src/content/docs/guides/observability.md
@@ -66,6 +66,44 @@ Import [`samples/grafana-dashboard.json`](https://github.com/streamer45/streamki
 
 ![Grafana Dashboard](/screenshots/grafana_dashboard.png)
 
+### What's measured
+
+Beyond HTTP and engine/node throughput, a few metric families are especially
+useful for speech and ML workloads:
+
+- **Plugin / ML inference** — native plugins emit per-call metrics labelled by
+  `plugin_kind` (e.g. `whisper`, `kokoro`) and `op`: `plugin_call_duration_seconds`
+  (histogram), `plugin_calls_total`, and `plugin_errors_total` /
+  `plugin_timeouts_total` / `plugin_panics_total`. This is where inference
+  latency and failures show up — usually the dominant cost of a speech pipeline.
+- **Oneshot pipelines** — `oneshot_pipeline_duration` (histogram) is labelled by
+  `status` (`ok`/`error`). Because every oneshot request hits the same
+  `POST /api/v1/process` endpoint, splitting TTS vs STT requires a trusted
+  `service` label (sent via the `X-StreamKit-Service` header); without it all
+  oneshot traffic collapses into one series.
+- **Speech gateway** — the [speech gateway example](https://github.com/streamer45/streamkit/tree/main/examples/speech-gateway)
+  exposes Prometheus metrics for the front door it puts in front of skit:
+  per-endpoint request rate/latency (`gateway_requests_total`,
+  `gateway_request_duration_seconds`), in-flight gauge, upstream latency, and
+  rejections by reason (`gateway_rejected_total`).
+
+### Run the full stack locally
+
+To see all of the above on the dashboards without any cloud setup, use the
+[`samples/observability`](https://github.com/streamer45/streamkit/tree/main/samples/observability)
+compose stack — it wires skit (OTLP push) + the gateway (scrape) into Prometheus
+and auto-provisions both dashboards in Grafana:
+
+```bash
+cd samples/observability
+docker compose up -d
+./generate-traffic.sh
+# Grafana: http://localhost:3000
+```
+
+See its README for the wiring details and known gotchas (demo-image tag/plugin
+layout, model-name matching, the Prometheus OTLP receiver, and local auth).
+
 ## Traces (OTLP)
 
 Tracing export is controlled by:

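For the plugin histogram described under "What's measured" in the guide above, a latency percentile per `plugin_kind` is probably the first query worth running. The helper below only builds the PromQL string, so it can be fed to the same `curl` call the skill uses. The function name `quantile_query` and the 5m default window are illustrative assumptions; the metric and label names come from this PR.

```bash
# Build a histogram_quantile() PromQL expression for a Prometheus histogram.
# Arguments: quantile, base metric name (without _bucket), grouping label,
# and an optional rate window (defaults to 5m, an arbitrary choice).
quantile_query() {
  q=$1
  metric=$2
  label=$3
  window=${4:-5m}
  printf 'histogram_quantile(%s, sum by (le, %s) (rate(%s_bucket[%s])))\n' \
    "$q" "$label" "$metric" "$window"
}

# Usage against the local stack (requires the stack to be up and jq installed):
#   curl -s --data-urlencode \
#     "query=$(quantile_query 0.95 plugin_call_duration_seconds plugin_kind)" \
#     localhost:9090/api/v1/query | jq '.data.result | length'
```

A result length of 0 means the panel built on that expression would show "No data", which matches the check described in the skill.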
diff --git a/examples/speech-gateway/Dockerfile b/examples/speech-gateway/Dockerfile
@@ -0,0 +1,13 @@
+# SPDX-FileCopyrightText: © 2025 StreamKit Contributors
+#
+# SPDX-License-Identifier: MPL-2.0
+
+FROM golang:1.24-bookworm AS build
+WORKDIR /src
+COPY . .
+RUN CGO_ENABLED=0 go build -o /gateway ./cmd/gateway
+
+FROM gcr.io/distroless/static-debian12
+COPY --from=build /gateway /gateway
+EXPOSE 8080
+ENTRYPOINT ["/gateway"]
diff --git a/examples/speech-gateway/README.md b/examples/speech-gateway/README.md
@@ -88,3 +88,5 @@ curl http://127.0.0.1:8080/metrics
 ### Grafana dashboard
 
 A ready-made dashboard lives at [`grafana-dashboard.json`](./grafana-dashboard.json). It is self-contained: import it and pick the Prometheus datasource scraping both the gateway and the StreamKit backend. Alongside the gateway metrics above, it includes a per-service split of the backend's `oneshot_pipeline_duration` (via the `service` label: `tts`/`stt`/`other`) and the StreamKit native-plugin inference metrics (`plugin_call_duration_seconds`, `plugin_calls_total`, …) that back the STT/TTS models.
+
+To run the gateway, Prometheus, and Grafana together locally, see [`samples/observability`](../../samples/observability).
diff --git a/samples/observability/README.md b/samples/observability/README.md
@@ -0,0 +1,100 @@
+<!--
+SPDX-FileCopyrightText: © 2025 StreamKit Contributors
+
+SPDX-License-Identifier: MPL-2.0
+-->
+
+# Local observability stack
+
+A `docker compose` stack that runs **skit + Prometheus + Grafana** (and an
+optional **speech gateway**) so you can see StreamKit's metrics on the bundled
+Grafana dashboards locally — no cloud, no manual import.
+
+## Quick start
+
+```bash
+cd samples/observability
+docker compose up -d            # skit + Prometheus + Grafana
+./generate-traffic.sh           # drive ~20 TTS + STT requests through skit
+```
+
+Then open Grafana at <http://localhost:3000> (anonymous admin, no login). Two
+dashboards are auto-provisioned:
+
+- **StreamKit Performance Dashboard** — the repo's main dashboard
+  ([`samples/grafana-dashboard.json`](../grafana-dashboard.json)), including the
+  **Plugins / ML inference** row.
+- **StreamKit Speech Gateway Dashboard** — the gateway/oneshot dashboard
+  ([`examples/speech-gateway/grafana-dashboard.json`](../../examples/speech-gateway/grafana-dashboard.json)).
+
+| Service    | URL                     |
+| ---------- | ----------------------- |
+| Grafana    | <http://localhost:3000> |
+| Prometheus | <http://localhost:9090> |
+| skit API   | <http://localhost:4545> |
+| gateway    | <http://localhost:8080> (gateway profile only) |
+
+## How metrics get to Prometheus
+
+Two different paths, both visible on the dashboards:
+
+- **skit → Prometheus (OTLP push).** skit exports OTLP metrics to Prometheus'
+  native OTLP receiver, which is enabled with `--web.enable-otlp-receiver`.
+  Configured via `SK_TELEMETRY__OTLP_ENDPOINT` pointing at
+  `http://prometheus:9090/api/v1/otlp/v1/metrics`. This feeds the HTTP, engine,
+  oneshot, and **plugin** metrics.
+- **gateway → Prometheus (scrape).** The speech gateway exposes a classic
+  `/metrics` endpoint that Prometheus scrapes (see `prometheus.yml`). This feeds
+  the **Speech Gateway** row.
+
+## Speech Gateway row
+
+The gateway sits behind a compose profile because its dashboard row needs a
+gateway build that includes the **metrics** instrumentation:
+
+```bash
+docker compose --profile gateway up -d --build
+./generate-traffic.sh --gateway   # route traffic through the gateway
+```
+
+Notes:
+
+- The gateway's `/metrics` endpoint and the `gateway_*` metrics require the
+  metrics-instrumented gateway. The Speech Gateway dashboard row stays empty
+  until those metrics are present and the gateway has served traffic.
+- The gateway's default STT pipeline targets a Whisper model that must exist on
+  the skit it talks to. The bundled `-demo` image ships `ggml-tiny-q5_1.bin`; if
+  the gateway points at a different model, STT through the gateway will fail
+  while TTS still works. The direct-to-skit traffic path (the default
+  `generate-traffic.sh`) avoids this by shipping its own pipelines under
+  `pipelines/`.
+
+## Known gotchas
+
+These are the sharp edges worth knowing when wiring this up yourself:
+
+- **Pin a versioned `-demo` tag.** `latest-demo` can lag behind released
+  versions and predate metrics like `plugin.call.duration`, which leaves the
+  Plugins / ML inference row empty. This stack pins `v0.5.0-demo`.
+- **Demo image plugin layout.** Current `-demo` images ship native plugins as
+  bare `.so` files under `plugins/native/`, but the loader expects directory
+  bundles (`plugins/native/<id>/` with a `plugin.yml` + the `.so`). `skit serve`
+  otherwise logs "no plugins found" and pipelines fail with "node kind not
+  found". `skit/entrypoint.sh` reassembles the expected layout at startup from
+  the in-repo manifests (mounted at `/repo-manifests`).
+- **Model names must match.** Pipelines reference model files by path; the file
+  must actually be present in the image/`models/` dir. The pipelines under
+  `pipelines/` use the model names the `-demo` image actually ships.
+- **Local auth override.** skit refuses to start unauthenticated on a
+  non-loopback bind unless you opt in. This stack sets
+  `SK_AUTH__MODE=disabled` + `SK_PERMISSIONS__ALLOW_INSECURE_NO_AUTH=true`.
+  **Local testing only** — never do this on an exposed instance.
+- **Grafana dashboard datasource.** The committed dashboards use a
+  `${DS_PROMETHEUS}` datasource input. The `dashboard-prep` step rewrites it to
+  the provisioned datasource uid so the dashboards load without a manual import.
+
+## Cleanup
+
+```bash
+docker compose --profile gateway down -v
+```
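The "Demo image plugin layout" gotcha in the README above describes turning bare `.so` files into `plugins/native/<id>/` bundles. `skit/entrypoint.sh` itself is not shown in this diff, so the following is only a hypothetical sketch of that kind of reassembly, not the script's actual contents. It assumes each bare plugin is named `<id>.so` and that its manifest lives at `<manifests>/<id>/plugin.yml`; both naming conventions are assumptions made for illustration.

```bash
# Hypothetical sketch of bundling bare plugin .so files with their manifests.
# NOT the repository's skit/entrypoint.sh; file naming is assumed (see above).
# Arguments: dir with bare .so files, dir with per-plugin manifests, output dir.
reassemble_plugins() {
  src=$1
  manifests=$2
  dest=$3
  for so in "$src"/*.so; do
    [ -e "$so" ] || continue             # glob matched nothing
    id=$(basename "$so" .so)
    manifest="$manifests/$id/plugin.yml"
    if [ ! -f "$manifest" ]; then
      # Without a manifest the bundle would be incomplete, so skip it.
      echo "skip $id: no manifest at $manifest" >&2
      continue
    fi
    mkdir -p "$dest/$id"
    cp "$so" "$manifest" "$dest/$id/"
    echo "bundled $id"
  done
}

# Usage with the paths this stack mentions (the source dir is a placeholder):
#   reassemble_plugins /path/to/bare-plugins /repo-manifests /opt/streamkit/np
```

Skipping plugins without a manifest keeps a missing file visible in the logs instead of producing a bundle the loader would reject later.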
diff --git a/samples/observability/docker-compose.yml b/samples/observability/docker-compose.yml
@@ -0,0 +1,96 @@
+# Local observability stack for StreamKit: skit + Prometheus + Grafana, with an
+# optional speech gateway. See README.md for the walkthrough and known gotchas.
+#
+# Usage:
+#   docker compose up -d                  # skit + Prometheus + Grafana
+#   docker compose --profile gateway up -d # also build & run the speech gateway
+#
+# Grafana:    http://localhost:3000  (anonymous admin, no login)
+# Prometheus: http://localhost:9090
+# skit API:   http://localhost:4545
+# gateway:    http://localhost:8080  (gateway profile only)
+
+services:
+  skit:
+    image: ghcr.io/streamer45/streamkit:v0.5.0-demo
+    # Pinned to a versioned -demo tag on purpose: `latest-demo` can lag behind
+    # and predate metrics like plugin.call.duration, leaving dashboard rows empty.
+    entrypoint: ["/entrypoint.sh"]
+    environment:
+      SK_AUTH__MODE: disabled
+      SK_PERMISSIONS__ALLOW_INSECURE_NO_AUTH: "true"
+      SK_PLUGINS__DIRECTORY: /opt/streamkit/np
+      SK_TELEMETRY__ENABLE: "true"
+      SK_TELEMETRY__OTLP_ENDPOINT: http://prometheus:9090/api/v1/otlp/v1/metrics
+    volumes:
+      - ./skit/entrypoint.sh:/entrypoint.sh:ro
+      - ../../plugins/native:/repo-manifests:ro
+    ports:
+      - "4545:4545"
+    healthcheck:
+      test: ["CMD", "curl", "-fsS", "http://localhost:4545/healthz"]
+      interval: 5s
+      timeout: 3s
+      retries: 20
+
+  prometheus:
+    image: prom/prometheus:v3.1.0
+    command:
+      - --config.file=/etc/prometheus/prometheus.yml
+      - --web.enable-otlp-receiver
+      - --storage.tsdb.path=/prometheus
+    volumes:
+      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
+    ports:
+      - "9090:9090"
+
+  dashboard-prep:
+    image: alpine:3.21
+    # Copies the in-repo dashboards into Grafana's provisioning dir, resolving
+    # the ${DS_PROMETHEUS} template input to the provisioned datasource uid so
+    # the dashboards load without manual import.
+    command:
+      - sh
+      - -c
+      - |
+        set -e
+        for f in /in/*.json; do
+          sed 's/$${DS_PROMETHEUS}/prometheus/g' "$$f" > "/out/$$(basename "$$f")"
+        done
+        echo "prepared dashboards:"; ls -1 /out
+    volumes:
+      - ../../samples/grafana-dashboard.json:/in/streamkit.json:ro
+      - ../../examples/speech-gateway/grafana-dashboard.json:/in/speech-gateway.json:ro
+      - grafana-dashboards:/out
+
+  grafana:
+    image: grafana/grafana:11.4.0
+    environment:
+      GF_AUTH_ANONYMOUS_ENABLED: "true"
+      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
+      GF_AUTH_DISABLE_LOGIN_FORM: "true"
+      GF_SECURITY_ADMIN_PASSWORD: admin
+    volumes:
+      - ./grafana/provisioning:/etc/grafana/provisioning:ro
+      - grafana-dashboards:/var/lib/grafana/dashboards:ro
+    ports:
+      - "3000:3000"
+    depends_on:
+      prometheus: {condition: service_started}
+      dashboard-prep: {condition: service_completed_successfully}
+
+  gateway:
+    profiles: ["gateway"]
+    build:
+      context: ../../examples/speech-gateway
+    environment:
+      GATEWAY_LISTEN: ":8080"
+      SKIT_URL: http://skit:4545
+    ports:
+      - "8080:8080"
+    depends_on:
+      skit:
+        condition: service_healthy
+
+volumes:
+  grafana-dashboards:
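The `dashboard-prep` substitution in the compose file above is easier to reason about outside compose, where the `$$` escapes collapse to a single `$`. The sketch below shows the same rewrite as a standalone function so it can be tried against one dashboard file. The function name `resolve_datasource` is an illustrative assumption, and the uid argument is assumed to contain no `/` or `&` characters, which would need escaping for `sed`.

```bash
# Replace the ${DS_PROMETHEUS} dashboard input with a concrete datasource uid.
# Arguments: dashboard JSON path, datasource uid. Writes the result to stdout.
resolve_datasource() {
  src=$1
  uid=$2
  # [$] matches a literal dollar sign, so the pattern is the literal text
  # ${DS_PROMETHEUS}; the shell expands only ${uid} here.
  sed "s/[\$]{DS_PROMETHEUS}/${uid}/g" "$src"
}

# Usage:
#   resolve_datasource samples/grafana-dashboard.json prometheus > /tmp/streamkit.json
```

Running it by hand and diffing against the original is a quick way to confirm a dashboard has no leftover `${DS_PROMETHEUS}` references before Grafana loads it.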