test(pipelines): add CI validation for official oneshot samples by staging-devin-ai-integration[bot] · Pull Request #542 · streamer45/streamkit

staging-devin-ai-integration · 2026-05-30T14:21:26Z

Summary

The official samples/pipelines/oneshot/*.yml had zero automated coverage — only the hand-written samples/pipelines/test/ fixtures were validated. This adds a manifest-driven harness that runs every oneshot sample against a live skit server and validates the output, wired into both pipeline-validation e2e jobs (SW + GPU).
Coverage is gated so it stays cheap and correct: fast Tier-A samples run in the default job; software plugins / VA-API are skipped via optional_node (so the GPU job's PIPELINE_REQUIRE_NODES=1 doesn't fail on them); NVENC/Vulkan run in the GPU job; S3 samples gate on requires_env; and heavyweight showcase pipelines (realtime pacers, software AV1 — up to ~200s) are behind slow / PIPELINE_INCLUDE_SLOW=1.
Every oneshot YAML must have a manifest entry — a new sample without one fails the harness, so future samples get triaged instead of silently uncovered.
Generalized run_pipeline → run_pipeline_parts for multi-field uploads (dual-input mixers) and text inputs (TTS), added JSON-body validation (transcription / VAD), and made the request timeout configurable (PIPELINE_TIMEOUT_SECS).
Added tests/pipeline-validation/README.md documenting how to maintain the harness (manifest rule, skip controls, when to mark a sample slow).

This is the test-harness half of the S2 oneshot review. No oneshot sample was broken — all 38 validated locally (Tier-A/B + S3/Servo pass; HW skipped without a GPU), so there are no sample fixes or removals here.

Review-feedback follow-ups

De-duplicated the media expectations. The 8 ffprobe-checked fields are now a single MediaExpectations struct, flattened (#[serde(flatten)]) into both Expected and OneshotEntry. This removes the hand-written media_expected() copy (which also dead-copied output_extension), so a new ffprobe field is declared once instead of in three places. A unit test pins the flattened TOML deserialization (toml 0.8 + integer fields).
Marked video_nv_av1_colorbars slow. Its core::pacer paces 900 frames at real time (~30s wall-clock), and NVENC AV1 is already covered cheaply by the test/nv_av1_colorbars fixture (30 frames), so the GPU job no longer pays ~30s/run for redundant coverage. (The sibling Vulkan sample has no pacer, so it stays in the default GPU run.)
Added a nightly workflow (pipeline-validation-nightly.yml, schedule + manual dispatch) that runs the slow samples with PIPELINE_INCLUDE_SLOW=1 on SW codecs.

Two registry/model-packaging issues surfaced during testing (not repo bugs, not in this PR) and are now tracked: parakeet ships flat model files instead of the expected named subdir (#548), and the nllb model is XZ-compressed but labeled .tar.bz2 so the installer picks the wrong decompressor (#549).

Review & Validation

Sanity-check the skip semantics in oneshot-samples.toml: optional_node = true for marketplace plugins + VA-API; requires_node only for HW codecs the GPU job builds (nvcodec/vulkan); slow = true now also on video_nv_av1_colorbars.
Confirm the GPU runner is happy running video_vulkan_video_h264_colorbars at 1280×720 (the remaining non-slow HW sample) — validated here only via requires_node skip (no local GPU).
just test-oneshot-samples http://localhost:4545 against a server with plugins reproduces the full pass; add PIPELINE_INCLUDE_SLOW=1 --test-threads=1 for the showcase samples.

Notes

Slow samples intentionally excluded from PR CI: video_av1_compositor_demo (rav1e SW AV1) alone is ~200s, the four realtime-paced compositor demos are ~30s each, and video_nv_av1_colorbars ~30s. All runnable on demand via PIPELINE_INCLUDE_SLOW=1 and nightly.
tests/ is excluded from the workspace, so this crate isn't covered by just lint/just test; correctness is enforced by the e2e job. Re-ran the harness locally after the refactor (38/38, incl. Tier-B model samples and a slow compositor sample) and cargo clippy clean.

Link to Devin session: https://staging.itsdev.in/sessions/66d442e49aca4390adb6749e19cb5dec
Requested by: @streamer45

Devin Review

Status	Commit
🕐 Outdated	`00b41e8` (HEAD is `3461138`)

Run Devin Review

The official samples/pipelines/oneshot/*.yml had zero automated coverage — only the small samples/pipelines/test/ fixtures were validated. This adds a manifest-driven harness that runs every oneshot sample against a live skit server and validates its output (ffprobe for media, JSON parse for transcription/VAD), wired into both e2e pipeline-validation jobs. - New oneshot test target + oneshot-samples.toml manifest covering all 38 samples; a sample without a manifest entry fails the harness so new ones get triaged. - Generalise run_pipeline into run_pipeline_parts to support multi-field uploads (dual-input mixers) and text inputs (TTS); add JSON output validation; make the HTTP timeout configurable via PIPELINE_TIMEOUT_SECS. - Skip controls: requires_node, optional_node (plugins / VA-API — skip even under PIPELINE_REQUIRE_NODES=1), requires_env (S3 creds), and slow (opt in via PIPELINE_INCLUDE_SLOW=1 for realtime-paced / software-AV1 showcases). - Add a just test-oneshot-samples recipe. All 38 samples validated locally (Tier-A/B/EXT pass; HW skipped without a GPU). No sample was broken. Signed-off-by: streamkit-devin <devin@streamkit.dev>

staging-devin-ai-integration · 2026-05-30T14:21:30Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment, CI, and merge conflict monitoring

staging-devin-ai-integration

Devin Review found 11 potential issues.

Debug

Playground

staging-devin-ai-integration · 2026-05-30T14:27:45Z

+requires_node = "plugin::native::nllb"
+optional_node = true


🟡 Single-node gating lets multi-plugin samples run with missing dependencies

Several manifest entries gate an optional sample on only one node even though the YAML uses additional optional nodes, so a partially installed server can run the test instead of skipping it and then fail during pipeline construction/execution. For example, this entry only requires plugin::native::nllb, but samples/pipelines/oneshot/speech_to_text_translate.yml:29-47 also uses plugin::native::whisper and plugin::native::piper; the same pattern affects vad-filtered-stt (samples/pipelines/oneshot/vad-filtered-stt.yml:29-36), speech_to_text_translate_helsinki (samples/pipelines/oneshot/speech_to_text_translate_helsinki.yml:39-56), and video_dav1d_compositor_demo (samples/pipelines/oneshot/video_dav1d_compositor_demo.yml:103-116).

Prompt for agents

The oneshot sample manifest and harness currently model only a single requires_node per sample, but several samples need multiple optional nodes to be present. Update the manifest schema in tests/pipeline-validation/src/lib.rs and the check in tests/pipeline-validation/tests/oneshot.rs so an entry can require multiple node kinds, then update tests/pipeline-validation/oneshot-samples.toml for multi-plugin/multi-codec samples such as speech_to_text_translate, speech_to_text_translate_helsinki, vad-filtered-stt, and video_dav1d_compositor_demo. The intended behavior is that if any required node is missing, optional samples skip, while non-optional samples still fail under PIPELINE_REQUIRE_NODES=1.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Good catch — fixed in a1dd8f0. Added a requires_nodes list to OneshotEntry so a sample can require multiple node kinds; the harness now skips (or fails under PIPELINE_REQUIRE_NODES=1) if any required node is missing. Populated the secondary nodes for speech_to_text_translate (whisper, piper), speech_to_text_translate_helsinki (whisper, piper), vad-filtered-stt (vad), and video_dav1d_compositor_demo (svt_av1::encoder). Verified all 38 samples still pass locally.

staging-devin-ai-integration · 2026-05-30T14:27:46Z

+# ---------------------------------------------------------------------------
+# Tier A — pure software, fast enough for the default CI job.
+# ---------------------------------------------------------------------------


🟡 Section-divider comment violates repository comment rules

AGENTS.md explicitly prohibits section-divider comments and asks contributors to use blank lines or code structure instead. This newly added dashed TOML section header is a direct violation of that mandatory repository rule.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Fixed in a1dd8f0 — dropped the dashed divider rules; the tier sections now use a single plain comment.

staging-devin-ai-integration · 2026-05-30T14:27:47Z

+# ---------------------------------------------------------------------------
+# Tier A — heavyweight showcase pipelines (realtime pacers / slow SW AV1).
+# Opt in with PIPELINE_INCLUDE_SLOW=1.
+# ---------------------------------------------------------------------------


🟡 Section-divider comment violates repository comment rules

AGENTS.md explicitly prohibits section-divider comments and asks contributors to use blank lines or code structure instead. This newly added dashed TOML section header is a direct violation of that mandatory repository rule.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Fixed in a1dd8f0 — dropped the dashed divider rules here too.

staging-devin-ai-integration · 2026-05-30T14:27:48Z

+# ---------------------------------------------------------------------------
+# Tier B — marketplace plugins + models. optional_node = true so the GPU job
+# (PIPELINE_REQUIRE_NODES=1, no plugins) skips rather than fails. Run locally
+# after installing the plugin/model.
+# ---------------------------------------------------------------------------


🟡 Section-divider comment violates repository comment rules

AGENTS.md explicitly prohibits section-divider comments and asks contributors to use blank lines or code structure instead. This newly added dashed TOML section header is a direct violation of that mandatory repository rule.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Fixed in a1dd8f0 — dropped the dashed divider rules here too.

staging-devin-ai-integration · 2026-05-30T14:27:49Z

+# ---------------------------------------------------------------------------
+# Tier C — hardware codecs. nvcodec / vulkan_video are compiled by the
+# pipeline-validation-gpu CI job, so they are NOT optional_node (a missing node
+# there is a real regression). VA-API is compiled by no CI job, so it is
+# optional_node = true.
+# ---------------------------------------------------------------------------


🟡 Section-divider comment violates repository comment rules

AGENTS.md explicitly prohibits section-divider comments and asks contributors to use blank lines or code structure instead. This newly added dashed TOML section header is a direct violation of that mandatory repository rule.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Fixed in a1dd8f0 — dropped the dashed divider rules here too.

staging-devin-ai-integration · 2026-05-30T14:27:51Z

+    #[serde(default)]
+    pub output_kind: OutputKind,
+
+    // --- Media expectations (used when `output_kind = media`). ---


🟡 Section-divider comment violates repository comment rules

AGENTS.md explicitly prohibits section-divider comments and asks contributors to use blank lines or code structure instead. This newly added implementation section marker is a direct violation of that mandatory repository rule.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Fixed in a1dd8f0 — removed this // --- divider (per AGENTS.md comment guidelines).

staging-devin-ai-integration · 2026-05-30T14:27:52Z

+    #[serde(default)]
+    pub json_contains: Vec<String>,
+
+    // --- Skip controls. ---


🟡 Section-divider comment violates repository comment rules

AGENTS.md explicitly prohibits section-divider comments and asks contributors to use blank lines or code structure instead. This newly added implementation section marker is a direct violation of that mandatory repository rule.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Fixed in a1dd8f0 — removed this // --- Skip controls. --- divider.

staging-devin-ai-integration · 2026-05-30T14:27:53Z

+    let first_line = body
+        .lines()
+        .find(|line| !line.trim().is_empty())
+        .ok_or("JSON output is empty")?;
+
+    serde_json::from_str::<serde_json::Value>(first_line)
+        .map_err(|e| format!("Output is not valid JSON: {e}\n  first line: {first_line}"))?;


🚩 JSON validation only checks the first NDJSON record

The new JSON path validates that the first non-empty line parses as JSON and then optionally checks raw substrings. That matches the current NDJSON-style transcription/VAD samples, but it means later malformed records would not be caught if a node emits valid JSON first and corrupt output later. If the intent is to smoke-test JSON framing rather than payload completeness this is acceptable; otherwise consider validating every non-empty line for output_kind = "json".

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Strengthened in a1dd8f0. validate_json_output now first tries to parse the whole body as a single JSON document; if that fails it treats the body as NDJSON and requires every non-empty line to parse (previously only the first line was checked). Substring checks still scan the full body.

staging-devin-ai-integration · 2026-05-30T14:27:54Z

+      - name: Validate official oneshot samples (SW codecs only)
+        run: |
+          cd tests/pipeline-validation
+          PIPELINE_TEST_URL=http://127.0.0.1:4545 cargo test --test oneshot


📝 Info: Default CI intentionally skips slow showcase samples

The workflow invokes the oneshot harness without PIPELINE_INCLUDE_SLOW, so entries marked slow = true will not run in either the software or GPU CI jobs. This appears intentional from the justfile guidance and manifest comments, but reviewers should be aware that several long-running compositor/pacer samples remain manifest-covered rather than CI-executed by default.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Correct, and intentional: the slow showcase samples (realtime pacers, rav1e SW AV1 ~200s) are gated behind PIPELINE_INCLUDE_SLOW=1 to keep CI cheap. They remain manifest-covered and runnable on demand via just test-oneshot-samples. Happy to add a periodic (nightly) job that runs them with the flag set if you'd like that coverage in CI too.

staging-devin-ai-integration · 2026-05-30T14:27:55Z

+pub fn available_nodes() -> &'static HashSet<String> {
+    static NODES: OnceLock<HashSet<String>> = OnceLock::new();
+    NODES.get_or_init(|| {
+        get_available_nodes(base_url()).unwrap_or_else(|err| {
+            if std::env::var("PIPELINE_REQUIRE_NODES").as_deref() == Ok("1") {
+                panic!(
+                    "FATAL: Could not query available nodes: {err}\n  \
+                     PIPELINE_REQUIRE_NODES=1 requires a reachable server at {}",
+                    base_url()
+                );
+            }
+            eprintln!("WARNING: Could not query available nodes: {err}");
+            eprintln!("  Node-gated tests will be skipped.");
+            eprintln!("  Is skit running at {}?", base_url());
+            HashSet::new()
+        })
+    })


📝 Info: Node availability is cached for the process lifetime

Moving available_nodes() into the shared library means both harnesses now use a once-per-process snapshot of /api/v1/schema/nodes. This is fine for the CI pattern where skit is built and started before tests, but a local run against a server that loads/unloads plugins during the same test process will not see those changes. That is a behavioral constraint worth keeping in mind for plugin-focused local validation.

Was this helpful? React with 👍 or 👎 to provide feedback.

Debug

Playground

Yes — node availability is queried once and cached for the process lifetime via OnceLock. That's fine for these batch test runs: the server's registered node set is static for the duration of a cargo test invocation, and caching avoids re-querying /api/v1/... for every sample. No change needed.

- Support multiple required node kinds per sample via requires_nodes, so multi-plugin/codec samples (translate chains, vad-filtered-stt, dav1d compositor) skip when any dependency is missing instead of running and failing during pipeline construction. - Validate every NDJSON record in JSON outputs, not just the first line. - Drop section-divider comments per AGENTS.md comment guidelines. Signed-off-by: streamkit-devin <devin@streamkit.dev>

…htly slow job Extract the shared ffprobe checks into MediaExpectations, flattened into both Expected and OneshotEntry, so a new media field is declared once instead of copied by hand in media_expected() (which also dead-copied output_extension). Mark video_nv_av1_colorbars slow: its core::pacer paces 900 frames at real time (~30s), and NVENC AV1 is already covered cheaply by the test/nv_av1_colorbars fixture, so the GPU job no longer pays ~30s per run. Add a nightly workflow that runs the slow samples with PIPELINE_INCLUDE_SLOW=1. Signed-off-by: streamkit-devin <devin@streamkit.dev>

Document the two suites, local run recipes, the per-sample manifest rule, skip-control semantics, when to mark a sample slow, and that MediaExpectations is the single source of truth for ffprobe checks. Signed-off-by: streamkit-devin <devin@streamkit.dev>

slow is checked before the node check, so the oneshot suite skips nv_av1 and only guards vulkan_video; NVENC-AV1 registration is guarded by the validate.rs nv_av1_colorbars fixture. Signed-off-by: streamkit-devin <devin@streamkit.dev>

codecov · 2026-05-30T17:48:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.98%. Comparing base (ca6c476) to head (3461138).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #542      +/-   ##
==========================================
+ Coverage   79.96%   79.98%   +0.01%     
==========================================
  Files         234      234              
  Lines       68061    68061              
  Branches     1933     1933              
==========================================
+ Hits        54426    54438      +12     
+ Misses      13629    13617      -12     
  Partials        6        6

Flag	Coverage Δ
backend	`79.75% <ø> (+0.01%)`	⬆️
ui	`82.24% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
core	`85.37% <ø> (ø)`
engine	`82.94% <ø> (ø)`
api	`89.96% <ø> (ø)`
nodes	`76.89% <ø> (+0.04%)`	⬆️
server	`80.33% <ø> (ø)`
plugin-native	`83.70% <ø> (ø)`
plugin-wasm	`92.20% <ø> (ø)`
ui-services	`84.69% <ø> (ø)`
ui-components	`60.49% <ø> (ø)`
see 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

staging-devin-ai-integration Bot assigned streamer45 May 30, 2026

staging-devin-ai-integration Bot requested a review from streamer45 May 30, 2026 14:21

staging-devin-ai-integration Bot commented May 30, 2026

View reviewed changes

streamkit-devin added 2 commits May 30, 2026 14:33

This was referenced May 30, 2026

parakeet marketplace package ships model files flat, breaking the official sample's model_dir #548

Open

nllb model archive is XZ but named .tar.bz2; installer picks decompressor by extension and fails #549

Open

streamkit-devin added 2 commits May 30, 2026 17:19

streamer45 merged commit 78f8332 into main May 30, 2026
29 checks passed

streamer45 deleted the devin/1780150771-oneshot-sample-validation branch May 30, 2026 18:03

Conversation

staging-devin-ai-integration Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review-feedback follow-ups

Review & Validation

Notes

Devin Review

Uh oh!

staging-devin-ai-integration Bot commented May 30, 2026

🤖 Devin AI Engineer

Uh oh!

staging-devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

staging-devin-ai-integration Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 30, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

staging-devin-ai-integration Bot commented May 30, 2026 •

edited

Loading