docs: local observability stack sample, guide, and skill #552
base: devin/1780157741-speech-observability
Commits: 1fde6cf, 5ccf96a, 458f023, 9ea4f9a, 75346be, 732b425, 0b6a9b0, d7b59e0
`@@ -0,0 +1,87 @@`

````markdown
---
name: observability-stack
description: >-
  Spin up StreamKit's local observability stack (skit + Prometheus + Grafana,
  optional speech gateway) and validate the Grafana dashboards end-to-end. Use
  when testing metrics/dashboards, debugging empty dashboard panels, or
  reproducing the speech-gateway monitoring setup locally.
license: MPL-2.0
---

# Observability stack (local)

`samples/observability/` is a `docker compose` stack that runs skit +
Prometheus + Grafana (and an optional speech gateway), auto-provisioning both
bundled dashboards. Use it to validate metrics and dashboards without any cloud
setup.

## Run it

```bash
cd samples/observability
docker compose up -d
./generate-traffic.sh            # direct-to-skit TTS+STT
# optional gateway row:
docker compose --profile gateway up -d --build
./generate-traffic.sh --gateway
```

Grafana: <http://localhost:3000> (anonymous admin). Prometheus:
<http://localhost:9090>. skit: <http://localhost:4545>.

## How metrics flow

- **skit → Prometheus via OTLP push.** Prometheus runs with
  `--web.enable-otlp-receiver`; skit's `SK_TELEMETRY__OTLP_ENDPOINT` points at
  `…/api/v1/otlp/v1/metrics`. There is **no scrape job** for skit.
- **gateway → Prometheus via scrape** of the gateway's `/metrics`.

## Validate dashboards (don't just eyeball)

OTLP renames dotted metrics and appends unit suffixes, so verify that the metric
names/labels the panels query actually exist before trusting a panel:

```bash
# list all metric names Prometheus knows about
curl -s localhost:9090/api/v1/label/__name__/values | jq -r '.data[]' | sort
# run a panel's exact PromQL and count series (0 == panel will be "No data")
curl -s --data-urlencode 'query=<promql>' localhost:9090/api/v1/query \
  | jq '.data.result | length'
# inspect a metric's labels (-g stops curl from treating [] as a URL glob)
curl -sg 'localhost:9090/api/v1/series?match[]=<metric>' | jq .
```

Key name/label facts:

- Plugin metrics: `plugin_call_duration_seconds_*` (unit suffix present),
  `plugin_calls_total`; labels `plugin_kind`, `op`.
- `oneshot_pipeline_duration_*` has **no** `_seconds` suffix (no unit set).
  Labels: `status`, plus `service` only when an `X-StreamKit-Service` header is
  forwarded by a service-label-aware skit.
- Gateway: `gateway_requests_total{endpoint,code}`,
  `gateway_request_duration_seconds`, `gateway_rejected_total{reason}` (only
  appears after a 413/415/502 actually occurs).

## Expected "No data" (not bugs)

- Plugin failure panels (`plugin_errors_total` etc.): the counters don't exist
  until a failure happens.
- Oneshot "by Service" panels: empty unless the skit build emits the `service`
  label.
- Video / MoQ / codec panels: only populate when you run those pipelines.

## Gotchas (most common causes of empty dashboards)

- **`latest-demo` is stale.** Pin a versioned `-demo` tag; `latest-demo` can
  predate metrics like `plugin.call.duration`, leaving the Plugins row empty.
- **Demo-image plugin layout.** `-demo` images ship bare `.so` files, but the
  loader wants `plugins/native/<id>/` bundles; `skit/entrypoint.sh` reassembles
  them. Symptom: "no plugins found" / "node kind not found in registry".
- **Model-name mismatch.** A pipeline's `model_path` must exist in the image's
  `models/`. The stack's `pipelines/` use the names the `-demo` image ships.
- **Grafana datasource input.** Committed dashboards use `${DS_PROMETHEUS}`;
  the `dashboard-prep` step rewrites it to the provisioned uid. In compose
  command strings, escape it as `$${DS_PROMETHEUS}` so compose doesn't
  interpolate it.
- **Local auth.** skit needs `SK_AUTH__MODE=disabled` +
  `SK_PERMISSIONS__ALLOW_INSECURE_NO_AUTH=true` to start unauthenticated on a
  non-loopback bind. Local only.
````
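The skill's "Validate dashboards" step runs panel queries one at a time. A minimal sketch that does the same for every query in a dashboard file, assuming `curl` and `jq` are installed, Prometheus is reachable at `PROM_URL` (default `http://localhost:9090`), and panels keep their PromQL in `targets[].expr` as Grafana Prometheus panels normally do. `dashboard_exprs` and `check_dashboard` are hypothetical helpers, not part of the stack.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the stack): list the PromQL behind every
# panel of a Grafana dashboard JSON and count the series each query returns.
# A count of 0 means that panel will render "No data".
# Assumes curl and jq are installed and Prometheus is reachable at PROM_URL.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print each distinct, non-empty targets[].expr, including panels nested in rows.
dashboard_exprs() {
  jq -r '[.. | objects | .targets? // empty | .[]? | .expr? // empty
          | select(type == "string" and length > 0)] | unique | .[]' "$1"
}

# Print "<series count>\t<query>" for every query in the dashboard.
check_dashboard() {
  local expr count
  dashboard_exprs "$1" | while IFS= read -r expr; do
    # Prometheus does not know Grafana's interval variables; substitute fixed
    # windows (a rough stand-in, not the values Grafana would compute).
    expr="${expr//\$__rate_interval/5m}"
    expr="${expr//\$__interval/1m}"
    count=$(curl -s --data-urlencode "query=${expr}" "${PROM_URL}/api/v1/query" |
      jq -r 'if .status == "success" then (.data.result | length) else "error" end')
    # An empty count means curl got no response at all.
    printf '%s\t%s\n' "${count:-unreachable}" "$expr"
  done
}
```

For example, `check_dashboard ../grafana-dashboard.json` from `samples/observability/` would list each query with its series count. Compare any zeros against the skill's "Expected No data" list before treating them as bugs.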
`@@ -0,0 +1 @@`

```text
../../.agents/skills/observability-stack
```
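The skill's name/label facts follow from how Prometheus' OTLP receiver turns OTLP metric names into Prometheus names. A simplified sketch of that translation, assuming the underscore-escaping-with-suffixes behavior that Prometheus 3.x uses by default. Only a few UCUM units are mapped here; the real translator covers more units and edge cases. Apart from `plugin.call.duration`, the OTLP-side names in the examples are guesses reverse-engineered from the Prometheus names the skill lists, and `otlp_prom_name` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Simplified approximation of how Prometheus' OTLP receiver names a metric.
# Usage: otlp_prom_name <otlp-name> <unit> <type>   (type: counter|gauge|histogram)
otlp_prom_name() {
  local name="$1" unit="$2" type="$3" suffix=""
  # Characters outside [A-Za-z0-9_:] (e.g. the dots) become underscores.
  name="${name//[^[:alnum:]_:]/_}"
  # Map a few common UCUM units; any other unit gets no suffix in this sketch.
  case "$unit" in
    s)  suffix="seconds" ;;
    ms) suffix="milliseconds" ;;
    By) suffix="bytes" ;;
  esac
  if [[ -n "$suffix" && "$name" != *"_${suffix}" ]]; then
    name="${name}_${suffix}"
  fi
  # Monotonic counters get "_total"; histograms later expose _bucket/_sum/_count.
  if [[ "$type" == "counter" && "$name" != *_total ]]; then
    name="${name}_total"
  fi
  printf '%s\n' "$name"
}

otlp_prom_name plugin.call.duration s histogram        # plugin_call_duration_seconds
otlp_prom_name plugin.calls "" counter                 # plugin_calls_total
otlp_prom_name oneshot.pipeline.duration "" histogram  # oneshot_pipeline_duration
```

The three results line up with the skill's list: `plugin_call_duration_seconds_*` carries a unit suffix, while `oneshot_pipeline_duration_*` has none because no unit is set.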
`@@ -0,0 +1,13 @@`

```dockerfile
# SPDX-FileCopyrightText: © 2025 StreamKit Contributors
#
# SPDX-License-Identifier: MPL-2.0

FROM golang:1.24-bookworm AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /gateway ./cmd/gateway

FROM gcr.io/distroless/static-debian12
COPY --from=build /gateway /gateway
EXPOSE 8080
ENTRYPOINT ["/gateway"]
```
`@@ -0,0 +1,100 @@`

````markdown
<!--
SPDX-FileCopyrightText: © 2025 StreamKit Contributors

SPDX-License-Identifier: MPL-2.0
-->

# Local observability stack

A `docker compose` stack that runs **skit + Prometheus + Grafana** (and an
optional **speech gateway**) so you can see StreamKit's metrics on the bundled
Grafana dashboards locally, with no cloud setup and no manual import.

## Quick start

```bash
cd samples/observability
docker compose up -d      # skit + Prometheus + Grafana
./generate-traffic.sh     # drive ~20 TTS + STT requests through skit
```

Then open Grafana at <http://localhost:3000> (anonymous admin, no login). Two
dashboards are auto-provisioned:

- **StreamKit Performance Dashboard**: the repo's main dashboard
  ([`samples/grafana-dashboard.json`](../grafana-dashboard.json)), including the
  **Plugins / ML inference** row.
- **StreamKit Speech Gateway Dashboard**: the gateway/oneshot dashboard
  ([`examples/speech-gateway/grafana-dashboard.json`](../../examples/speech-gateway/grafana-dashboard.json)).

| Service    | URL                                            |
| ---------- | ---------------------------------------------- |
| Grafana    | <http://localhost:3000>                        |
| Prometheus | <http://localhost:9090>                        |
| skit API   | <http://localhost:4545>                        |
| gateway    | <http://localhost:8080> (gateway profile only) |

## How metrics get to Prometheus

Two different paths, both visible on the dashboards:

- **skit → Prometheus (OTLP push).** skit exports OTLP metrics to Prometheus'
  native OTLP receiver, which is enabled with `--web.enable-otlp-receiver`.
  skit is configured via `SK_TELEMETRY__OTLP_ENDPOINT`, which points at
  `http://prometheus:9090/api/v1/otlp/v1/metrics`. This feeds the HTTP, engine,
  oneshot, and **plugin** metrics.
- **gateway → Prometheus (scrape).** The speech gateway exposes a classic
  `/metrics` endpoint that Prometheus scrapes (see `prometheus.yml`). This feeds
  the **Speech Gateway** row.

## Speech Gateway row

The gateway runs behind a compose profile because this row depends on the
gateway's **metrics** instrumentation:

```bash
docker compose --profile gateway up -d --build
./generate-traffic.sh --gateway   # route traffic through the gateway
```

Notes:

- The gateway's `/metrics` endpoint and the `gateway_*` metrics require the
  metrics-instrumented gateway. The Speech Gateway dashboard row stays empty
  until those metrics are present and the gateway has served traffic.
- The gateway's default STT pipeline targets a Whisper model that must exist on
  the skit it talks to. The bundled `-demo` image ships `ggml-tiny-q5_1.bin`; if
  the gateway points at a different model, STT through the gateway will fail
  while TTS still works. The direct-to-skit traffic path (the default
  `generate-traffic.sh`) avoids this by shipping its own pipelines under
  `pipelines/`.

## Known gotchas

These are the sharp edges worth knowing when wiring this up yourself:

- **Pin a versioned `-demo` tag.** `latest-demo` can lag behind released
  versions and predate metrics like `plugin.call.duration`, which leaves the
  Plugins / ML inference row empty. This stack pins `v0.5.0-demo`.
- **Demo image plugin layout.** Current `-demo` images ship native plugins as
  bare `.so` files under `plugins/native/`, but the loader expects directory
  bundles (`plugins/native/<id>/` with a `plugin.yml` + the `.so`). Without that
  layout, `skit serve` logs "no plugins found" and pipelines fail with "node
  kind not found". `skit/entrypoint.sh` reassembles the expected layout at
  startup from the in-repo manifests (mounted at `/repo-manifests`).
- **Model names must match.** Pipelines reference model files by path; the file
  must actually be present in the image's `models/` dir. The pipelines under
  `pipelines/` use the model names the `-demo` image actually ships.
- **Local auth override.** skit refuses to start unauthenticated on a
  non-loopback bind unless you opt in. This stack sets
  `SK_AUTH__MODE=disabled` + `SK_PERMISSIONS__ALLOW_INSECURE_NO_AUTH=true`.
  **Local testing only**; never do this on an exposed instance.
- **Grafana dashboard datasource.** The committed dashboards use a
  `${DS_PROMETHEUS}` datasource input. The `dashboard-prep` step rewrites it to
  the provisioned datasource uid so the dashboards load without a manual import.

## Cleanup

```bash
docker compose --profile gateway down -v
```
````
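The README describes `skit/entrypoint.sh` turning bare `.so` files into `plugins/native/<id>/` bundles (a `plugin.yml` plus the `.so`) using the manifests mounted at `/repo-manifests`. That script is not shown in this diff, so the following is only a hypothetical sketch of such a reassembly step. It assumes each manifest lives at `<manifest-dir>/<id>/plugin.yml` and that the matching library is named `lib<id>.so` or `<id>.so`; that naming is a guess, and `reassemble_plugins` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bundle-reassembly step; NOT the stack's actual
# skit/entrypoint.sh. Usage: reassemble_plugins <so-dir> <manifest-dir> <dest-dir>
# For each <manifest-dir>/<id>/plugin.yml with a matching bare library in
# <so-dir>, create <dest-dir>/<id>/ containing plugin.yml and the .so.
# Assumption: libraries are named lib<id>.so or <id>.so.
reassemble_plugins() {
  local so_dir="$1" manifest_dir="$2" dest_dir="$3"
  local manifest id lib count=0
  for manifest in "$manifest_dir"/*/plugin.yml; do
    [[ -f "$manifest" ]] || continue   # the glob matched nothing
    id="$(basename "$(dirname "$manifest")")"
    for lib in "$so_dir/lib${id}.so" "$so_dir/${id}.so"; do
      [[ -f "$lib" ]] || continue
      mkdir -p "$dest_dir/$id"
      cp "$manifest" "$dest_dir/$id/plugin.yml"
      cp "$lib" "$dest_dir/$id/"
      count=$((count + 1))
      break                            # first matching library wins
    done
  done
  echo "assembled ${count} plugin bundle(s) in ${dest_dir}"
}
```

Manifests without a matching library are skipped rather than producing empty bundles. In this stack the loader reads `SK_PLUGINS__DIRECTORY=/opt/streamkit/np`, so a step like this would presumably write its bundles there.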
`@@ -0,0 +1,96 @@`

```yaml
# Local observability stack for StreamKit: skit + Prometheus + Grafana, with an
# optional speech gateway. See README.md for the walkthrough and known gotchas.
#
# Usage:
#   docker compose up -d                     # skit + Prometheus + Grafana
#   docker compose --profile gateway up -d   # also build & run the speech gateway
#
# Grafana:    http://localhost:3000 (anonymous admin, no login)
# Prometheus: http://localhost:9090
# skit API:   http://localhost:4545
# gateway:    http://localhost:8080 (gateway profile only)

services:
  skit:
    image: ghcr.io/streamer45/streamkit:v0.5.0-demo
    # Pinned to a versioned -demo tag on purpose: `latest-demo` can lag behind
    # and predate metrics like plugin.call.duration, leaving dashboard rows empty.
    entrypoint: ["/entrypoint.sh"]
    environment:
      SK_AUTH__MODE: disabled
      SK_PERMISSIONS__ALLOW_INSECURE_NO_AUTH: "true"
      SK_PLUGINS__DIRECTORY: /opt/streamkit/np
      SK_TELEMETRY__ENABLE: "true"
      SK_TELEMETRY__OTLP_ENDPOINT: http://prometheus:9090/api/v1/otlp/v1/metrics
    volumes:
      - ./skit/entrypoint.sh:/entrypoint.sh:ro
      - ../../plugins/native:/repo-manifests:ro
    ports:
      - "4545:4545"
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:4545/healthz"]
      interval: 5s
      timeout: 3s
      retries: 20

  prometheus:
    image: prom/prometheus:v3.1.0
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --web.enable-otlp-receiver
      - --storage.tsdb.path=/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"

  dashboard-prep:
    image: alpine:3.21
    # Copies the in-repo dashboards into Grafana's provisioning dir, resolving
    # the ${DS_PROMETHEUS} template input to the provisioned datasource uid so
    # the dashboards load without manual import.
    command:
      - sh
      - -c
      - |
        set -e
        for f in /in/*.json; do
          sed 's/$${DS_PROMETHEUS}/prometheus/g' "$$f" > "/out/$$(basename "$$f")"
        done
        echo "prepared dashboards:"; ls -1 /out
    volumes:
      - ../../samples/grafana-dashboard.json:/in/streamkit.json:ro
      - ../../examples/speech-gateway/grafana-dashboard.json:/in/speech-gateway.json:ro
      - grafana-dashboards:/out

  grafana:
    image: grafana/grafana:11.4.0
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
      GF_AUTH_DISABLE_LOGIN_FORM: "true"
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - grafana-dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
      - dashboard-prep

  gateway:
    profiles: ["gateway"]
    build:
      context: ../../examples/speech-gateway
    environment:
      GATEWAY_LISTEN: ":8080"
      SKIT_URL: http://skit:4545
    ports:
      - "8080:8080"
    depends_on:
      skit:
        condition: service_healthy

volumes:
  grafana-dashboards:
```

Comment on lines +55 to +59 (Contributor, Author):

📝 Info: Compose interpolation escaping in dashboard-prep is intentional
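The `dashboard-prep` loop is easier to check outside compose, where the `$$` escaping is not needed. A minimal sketch of the same substitution as plain shell functions, assuming (as the compose service does) that the provisioned datasource uid is `prometheus`; `prep_dashboard` and `prep_dir` are hypothetical names.

```shell
#!/usr/bin/env bash
# The dashboard-prep rewrite as plain shell (outside compose, no `$$` needed).
# Replaces Grafana's ${DS_PROMETHEUS} datasource input with the uid
# "prometheus", the same value the compose service substitutes.
prep_dashboard() {
  # Single quotes stop the shell from expanding ${DS_PROMETHEUS}; `\$` tells
  # sed the dollar sign is literal rather than an end-of-line anchor.
  sed 's/\${DS_PROMETHEUS}/prometheus/g'
}

# Apply the rewrite to every *.json in <in-dir>, writing results to <out-dir>.
prep_dir() {
  local in_dir="$1" out_dir="$2" f
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.json; do
    [[ -f "$f" ]] || continue   # the glob matched nothing
    prep_dashboard < "$f" > "$out_dir/$(basename "$f")"
  done
}
```

Running `prep_dir` over the two dashboard files and then `grep -F '${DS_PROMETHEUS}'` over the output should find no matches, which is a quick way to confirm the rewrite before Grafana loads the files.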
📝 Info: License-header requirements are satisfied via REUSE annotations for frontmatter/config files

The new `SKILL.md` starts with YAML frontmatter instead of inline SPDX comments, but this is intentional and covered by `REUSE.toml`'s `.agents/skills/**/SKILL.md` annotation. Likewise, the new `.yml` files are covered by the existing `**/*.yml` configuration-file annotation. I did not flag the missing inline SPDX headers on those files because adding them would either duplicate the configured REUSE coverage or break skill frontmatter parsing.