feat(samples): discovery UX with variant grouping and faceted search #544

Open
staging-devin-ai-integration[bot] wants to merge 15 commits into main from devin/1780157211-sample-discovery-ux

Conversation

@staging-devin-ai-integration staging-devin-ai-integration Bot commented May 30, 2026

Summary

  • Adds a discovery UX to the Convert and Stream pipeline pickers: near-duplicate samples (codec/hardware/language variants) collapse into a single scenario card with a variant selector, plus substring search and faceted filtering (category, capability, GPU requirement).
  • Sample YAML is the source of truth. Each bundled sample authors group/variant/canonical/category/tags/keywords directly; there is no runtime derivation from filenames or node-kind substrings. The card title/description come from the group's canonical member, never a guessed variants[0].
  • The server emits the authored fields plus a resolved, lowercased search_terms document (name + description + category + tags + authored keywords + flattened node kinds). The UI does plain substring matching — the old TS SYNONYM_GROUPS table is gone.
  • The contract is enforced in CI (apps/skit/tests/sample_discovery_metadata_test.rs): bundled samples must carry category + tags; grouped samples must have exactly one canonical member and a variant label on every member; ungrouped samples must not set canonical/variant. Missing or inconsistent metadata fails the build instead of silently degrading the UI.
  • The "Customize" step gains an Open in Design view button that hands the current editor YAML (including unsaved edits) to the visual node-graph editor.
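
A minimal TypeScript sketch of the grouping rule described in the summary, assuming a simplified sample shape. `SampleSketch`, `ScenarioCardSketch`, and `groupByScenarioSketch` are hypothetical names for illustration, not the PR's actual types; the point is only that the card identity is looked up through `canonical` and never taken from `variants[0]`.

```typescript
// Hypothetical shape: field names follow the authored YAML keys listed in
// the summary, but the real SamplePipeline type may differ.
interface SampleSketch {
  id: string;
  name: string;
  description: string;
  group?: string;
  variant?: string;
  canonical?: boolean;
}

interface ScenarioCardSketch {
  key: string;
  title: string;
  description: string;
  variants: SampleSketch[];
}

function groupByScenarioSketch(samples: SampleSketch[]): ScenarioCardSketch[] {
  const buckets: Record<string, SampleSketch[]> = {};
  const order: string[] = [];
  for (const sample of samples) {
    // Ungrouped samples become a group of one, keyed by their id.
    const key = sample.group ?? `sample:${sample.id}`;
    if (!buckets[key]) {
      buckets[key] = [];
      order.push(key);
    }
    buckets[key].push(sample);
  }
  return order.map((key) => {
    const members = buckets[key];
    // Card identity comes from the canonical member, not from list order.
    const identity =
      members.length === 1 ? members[0] : members.find((m) => m.canonical === true);
    if (!identity) {
      throw new Error(`group "${key}" has no canonical member`);
    }
    return { key, title: identity.name, description: identity.description, variants: members };
  });
}
```

Throwing on a missing canonical member mirrors the "fail instead of silently degrading" stance of the contract; a real UI could equally rely on the CI test and skip the runtime check.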

Review & Validation

  • Convert/Stream: variant cards group correctly and the card identity (title/description) comes from the canonical member, not an arbitrary variant.
  • Search matches against authored metadata via search_terms (e.g. transcribe→STT, colorbars→colorbars); capability facets exclude codec/format/hardware tags.
  • "Open in Design view" imports the pipeline as a graph (built-in nodes; uninstalled plugin nodes error the same way the existing Import YAML does).
  • just lint, just test-ui, and cargo test -p streamkit-server are all green, including the new sample_discovery_metadata_test.
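
To make the search_terms check above concrete, here is a hedged sketch of how such a document could be assembled and matched. The server side of this PR is Rust; this is written in TypeScript only to stay consistent with the UI code in the thread. `SearchSourceSketch`, `buildSearchTermsSketch`, and `matchesSearchTermsSketch` are hypothetical names, and the field list follows the summary (name, description, category, tags, keywords, node kinds).

```typescript
// Hypothetical input shape; the real server-side struct is not shown here.
interface SearchSourceSketch {
  name: string;
  description: string;
  category: string;
  tags: string[];
  keywords: string[];
  nodeKinds: string[];
}

// Join the authored fields into one lowercased document.
function buildSearchTermsSketch(src: SearchSourceSketch): string {
  return [src.name, src.description, src.category, ...src.tags, ...src.keywords, ...src.nodeKinds]
    .map((part) => part.trim().toLowerCase())
    .filter((part) => part.length > 0)
    .join(" ");
}

// Plain substring match; an empty query matches everything.
function matchesSearchTermsSketch(searchTerms: string, query: string): boolean {
  const needle = query.trim().toLowerCase();
  return needle.length === 0 || searchTerms.includes(needle);
}
```

With this shape, a query such as `transcribe` finds an STT sample only because an author listed it under keywords, which is the behavior the checklist item describes.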

Notes

This replaces the earlier heuristic derivation (filename tokenization, substring category/tag inference, codec sniffing) that #551 had been tracking — the explicit-contract rewrite is now in this PR rather than deferred. Adding a new bundled sample now requires authoring its discovery metadata, which the validation test will demand.
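
For anyone adding a sample, the rules the validation test demands can be sketched as follows. The real test lives in Rust (apps/skit/tests/sample_discovery_metadata_test.rs); this TypeScript version is an illustration under assumptions, and `SampleMetaSketch` and `validateDiscoveryMetadataSketch` are hypothetical names.

```typescript
// Hypothetical metadata shape mirroring the authored YAML keys.
interface SampleMetaSketch {
  id: string;
  category?: string;
  tags?: string[];
  group?: string;
  variant?: string;
  canonical?: boolean;
}

// Returns a list of human-readable violations; an empty list means valid.
function validateDiscoveryMetadataSketch(samples: SampleMetaSketch[]): string[] {
  const errors: string[] = [];
  const groups: Record<string, SampleMetaSketch[]> = {};

  for (const s of samples) {
    if (!s.category || s.category.trim() === "") errors.push(`${s.id}: missing category`);
    if (!s.tags || s.tags.length === 0) errors.push(`${s.id}: missing tags`);
    if (s.group) {
      (groups[s.group] = groups[s.group] || []).push(s);
    } else if (s.canonical !== undefined || s.variant !== undefined) {
      errors.push(`${s.id}: ungrouped sample sets canonical/variant`);
    }
  }

  for (const group of Object.keys(groups)) {
    const members = groups[group];
    const canonicalCount = members.filter((m) => m.canonical === true).length;
    if (canonicalCount !== 1) {
      errors.push(`${group}: expected exactly one canonical member, found ${canonicalCount}`);
    }
    for (const m of members) {
      if (!m.variant || m.variant.trim() === "") {
        errors.push(`${m.id}: grouped sample missing variant label`);
      }
    }
  }
  return errors;
}
```

Collecting every violation rather than stopping at the first one is a design choice in this sketch; it lets an author fix a new sample in one pass.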

Verified live against the local MoQ setup:

Convert — scenario grouping with codec variant pills and faceted chips:

Convert grouped cards

Open in Design view — Customize YAML handed off to the node graph:

Design handoff

Stream — grouped MoQ colorbars card:

Stream grouped card

Link to Devin session: https://staging.itsdev.in/sessions/da773a0e70084000b42a86c2ed6664d9
Requested by: @streamer45


Devin Review

Status Commit
🕐 Outdated 1e37e36 (HEAD is 9740783)

Surface the growing sample-pipeline catalog as grouped scenario cards with a
variant selector and faceted/fuzzy search instead of a flat list of
near-duplicate templates.

Discovery metadata (group/variant/category/tags) is optional in sample YAML;
when omitted the server derives best-effort values from node kinds, the client
section, and filename patterns, so existing samples need no edits. Explicit
YAML values win per-field; derived tags union with curated ones.

The TemplateSelector (used by both Convert and Stream views) collapses a
variant family (e.g. the colorbars codec variants) into one card with radio
pills, and adds category/capability/needs-hardware facet chips alongside the
existing origin filter and search.

Signed-off-by: streamkit-devin <devin@streamkit.dev>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@staging-devin-ai-integration staging-devin-ai-integration Bot left a comment

Devin Review found 4 potential issues.

Comment thread ui/src/utils/samplePipelineOrdering.ts Outdated
Comment on lines +147 to +148
const base = variants.find((v) => !v.variant) ?? variants[0];
return { key, base, variants };

🟡 Variant-only groups choose the alphabetically first variant as the card representative

When every sample in a scenario group has a variant, groupSamplePipelinesByScenario sorts the variants and then falls back to variants[0] as the group base. The PR adds variant: Software to the previously canonical samples/pipelines/dynamic/video_moq_colorbars.yml:8, so the colorbars family no longer has any no-variant member; the group card title/description rendered from group.base in ui/src/components/converter/TemplateSelector.tsx:89-95 will come from whichever codec/hardware variant sorts first rather than the intended generic/software sample. This makes grouped sample cards display the wrong representative metadata even though the variants are still selectable.

Prompt for agents
Fix representative selection for grouped sample cards in ui/src/utils/samplePipelineOrdering.ts. The current code sorts variants before choosing base and falls back to the first sorted variant when all members have a variant. That is now triggered by samples/pipelines/dynamic/video_moq_colorbars.yml setting variant: Software on the previously canonical sample. Choose the group representative independently from variant sort order, for example by preserving/marking the intended representative before sorting or by adding an explicit group display label/base selection policy. Ensure the colorbars family still shows the generic/software card metadata while retaining the “Software” variant label in the selector.

Valid catch — this is the same tradeoff I flagged in the PR description. By giving the canonical video_moq_colorbars.yml an explicit variant: Software, the group lost its no-variant member, so base falls through to the alphabetically-first variant for the title/description (the variants themselves are all still selectable and correct).

Two clean ways to fix, both cheap:

  1. Drop variant: Software from the exemplar so the canonical sample stays the natural (no-variant) representative — keeps the override demo via category/tags only.
  2. Pick the representative independently of variant order (e.g. the member whose id has the fewest variant tokens / shortest id), so an explicit base label can coexist with a stable card.

I've left this for @streamer45 to choose since it's a UX-label call rather than a correctness bug. Happy to apply either.
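
For reference, option 2 could look roughly like the sketch below: the representative is chosen before any sorting, so variant order cannot change the card identity. `VariantSketch`, `pickRepresentativeSketch`, and `buildGroupSketch` are hypothetical names, and the "shortest id" tie-break is just one reading of the suggestion above.

```typescript
// Hypothetical member shape; only the fields needed for the selection.
interface VariantSketch {
  id: string;
  name: string;
  variant?: string;
}

function pickRepresentativeSketch(members: VariantSketch[]): VariantSketch {
  if (members.length === 0) throw new Error("empty group");
  // Prefer a member without a variant label, as the current code does.
  const unlabeled = members.find((m) => !m.variant);
  if (unlabeled) return unlabeled;
  // Otherwise take the shortest id, with ties broken by id order.
  return members.reduce((best, m) =>
    m.id.length < best.id.length || (m.id.length === best.id.length && m.id < best.id) ? m : best
  );
}

function buildGroupSketch(key: string, members: VariantSketch[]) {
  const base = pickRepresentativeSketch(members); // chosen before sorting
  const variants = [...members].sort((a, b) => (a.variant ?? "").localeCompare(b.variant ?? ""));
  return { key, base, variants };
}
```

With the colorbars family, this keeps video_moq_colorbars as the base even when it carries variant: Software and a sibling label such as AV1 sorts ahead of it.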

Comment thread ui/src/utils/samplePipelineOrdering.ts Outdated
Comment on lines +93 to +100
export function matchesSamplePipelineQuery(pipeline: SamplePipeline, query: string): boolean {
  const normalizedQuery = query.trim().toLowerCase();
  if (!normalizedQuery) return true;

  const haystack = searchableText(pipeline);
  const terms = normalizedQuery.split(/\s+/).filter(Boolean);

  return terms.every((term) => expandTerm(term).some((candidate) => haystack.includes(candidate)));

📝 Info: Search now uses AND semantics across query tokens

The new query matcher splits input on whitespace and requires every term to match some searchable field or synonym. This is an intentional behavioral change from the previous single-substring match, and the tests cover it (ui/src/utils/samplePipelineOrdering.test.ts:160-168). It means queries like video audio will only show pipelines whose combined metadata includes both terms, which is useful for faceting but may surprise users who expected broad OR-style search.

Correct, AND-across-tokens is intentional — it makes multi-word queries (e.g. nvidia av1) narrow rather than balloon, which pairs well with the facet chips. Single-token search (the common case) is unaffected. Tests at samplePipelineOrdering.test.ts:160-168 lock in the behavior.
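
A self-contained sketch of the AND-across-tokens behavior, for readers who want to try it outside the codebase. The synonym table here is a hypothetical stand-in (the actual SYNONYM_GROUPS contents are not shown in this thread), and the function takes a prebuilt haystack string instead of a pipeline object.

```typescript
// Hypothetical synonym table for illustration only.
const SYNONYMS_SKETCH: string[][] = [
  ["stt", "transcribe", "speech-to-text"],
  ["mic", "microphone"],
];

// Expand a term to its synonym group, or to itself when it has none.
function expandTermSketch(term: string): string[] {
  const group = SYNONYMS_SKETCH.find((g) => g.includes(term));
  return group ?? [term];
}

function matchesAllTermsSketch(haystack: string, query: string): boolean {
  const terms = query.trim().toLowerCase().split(/\s+/).filter(Boolean);
  // [].every(...) is true, so an empty query matches everything.
  return terms.every((term) => expandTermSketch(term).some((c) => haystack.includes(c)));
}
```

Each whitespace-separated term must match on its own (directly or through a synonym), so adding terms can only narrow the result set.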

Comment thread apps/skit/src/sample_discovery.rs Outdated
Comment on lines +262 to +264
Discovery {
group: explicit.group.or(Some(derived_group)),
variant: explicit.variant.or(derived_variant),

📝 Info: Derived groups are emitted for every sample, not only known variant families

derive always returns group: Some(derived_group) even when the filename has no variant-like token. In the current UI this is mostly harmless because groups of one render as normal cards, and accidental multi-sample collisions are mitigated by the token rules and current bundled filenames. Reviewers should be aware that any future samples with filenames that only differ by tokens in SINGLE_TOKENS or LANGUAGE_TOKENS will automatically collapse into one selector card unless they set explicit discovery fields.

Right — every sample gets a derived group, but a group of one renders as a normal flat card (see ScenarioCard for variants.length === 1), so there's no visible difference unless two filenames actually collapse. The compound-token rules (e.g. vulkan_video) guard the known families. If a future pair of filenames differs only by a SINGLE_TOKENS/LANGUAGE_TOKENS token, they'd merge into one selector — which is usually the desired behavior, and any author can override with explicit group/variant. Worth keeping in mind when adding samples.
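
To illustrate the collapse case, here is a rough TypeScript sketch of filename-token stripping. It is not the Rust implementation in sample_discovery.rs: the token lists are hypothetical stand-ins (the real SINGLE_TOKENS and LANGUAGE_TOKENS contents are not shown in this thread), compound-token rules are omitted, and the speech_translate filenames in the note are made up.

```typescript
// Hypothetical token lists, for illustration only.
const SINGLE_TOKENS_SKETCH = ["av1", "vp9", "h264", "nvidia", "vaapi"];
const LANGUAGE_TOKENS_SKETCH = ["en", "es", "fr"];

// Derive a group key by dropping variant-like tokens from the filename stem.
function deriveGroupSketch(filename: string): string {
  const stem = filename.replace(/\.ya?ml$/, "");
  return stem
    .split("_")
    .filter((t) => !SINGLE_TOKENS_SKETCH.includes(t) && !LANGUAGE_TOKENS_SKETCH.includes(t))
    .join("_");
}
```

Under these assumed lists, `speech_translate_en.yml` and `speech_translate_es.yml` both derive `speech_translate`, so they would share one selector card unless an author sets explicit group/variant.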

const [capabilityFilter, setCapabilityFilter] = React.useState<string | null>(null);
const [hardwareOnly, setHardwareOnly] = React.useState(false);

const facets = React.useMemo(() => collectSampleFacets(templates), [templates]);

📝 Info: Facet chips are built from all templates rather than current origin/search scope

collectSampleFacets(templates) computes the visible facet options from the full template list, while filtering applies origin, category, capability, hardware, and query afterward. This keeps facet choices stable as users type or toggle filters, but it also means a facet chip can remain visible even when selecting it with the current origin/search filters will yield an empty result set. That appears to be a UX tradeoff rather than a correctness bug.

Intentional — facet options are computed from the full template set so they don't flicker/reorder as you type or toggle other facets. The "selected hidden by filters" hint + Clear filters covers the empty-result case. Scoping facets to the current filter set is a possible future refinement if it proves confusing.
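
The tradeoff can be seen in a small sketch. `TemplateSketch` and `collectFacetsSketch` are hypothetical; the real collectSampleFacets may return a different shape.

```typescript
// Hypothetical template shape with just the fields the facets need.
interface TemplateSketch {
  id: string;
  origin: "system" | "user";
  category: string;
  tags: string[];
}

// Facet options come from the full list, independent of active filters.
function collectFacetsSketch(templates: TemplateSketch[]): {
  categories: string[];
  capabilities: string[];
} {
  const categories: string[] = [];
  const capabilities: string[] = [];
  for (const t of templates) {
    if (!categories.includes(t.category)) categories.push(t.category);
    for (const tag of t.tags) {
      if (!capabilities.includes(tag)) capabilities.push(tag);
    }
  }
  return { categories: categories.sort(), capabilities: capabilities.sort() };
}
```

If every "Video" template is a system sample, the "Video" chip still appears while the origin filter is set to user samples, and selecting it yields an empty list. That is the stable-but-possibly-empty behavior described above.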

The extended SamplePipeline type makes group/variant/category/tags required;
update the two inline mocks in the samples/fragments service tests to match.

Signed-off-by: streamkit-devin <devin@streamkit.dev>

codecov Bot commented May 30, 2026

Codecov Report

❌ Patch coverage is 99.40828% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.03%. Comparing base (6abe3ba) to head (9740783).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ui/src/components/converter/TemplateSelector.tsx | 97.53% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #544      +/-   ##
==========================================
+ Coverage   79.96%   80.03%   +0.06%     
==========================================
  Files         234      236       +2     
  Lines       68061    68299     +238     
  Branches     1846     1970     +124     
==========================================
+ Hits        54428    54664     +236     
- Misses      13627    13629       +2     
  Partials        6        6              

| Flag | Coverage Δ |
| --- | --- |
| backend | 79.78% <100.00%> (+0.04%) ⬆️ |
| ui | 82.52% <98.78%> (+0.27%) ⬆️ |

| Components | Coverage Δ |
| --- | --- |
| core | 85.37% <ø> (ø) |
| engine | 83.03% <ø> (ø) |
| api | 90.00% <100.00%> (+0.03%) ⬆️ |
| nodes | 76.84% <ø> (ø) |
| server | 80.42% <100.00%> (+0.09%) ⬆️ |
| plugin-native | 83.70% <ø> (ø) |
| plugin-wasm | 92.20% <ø> (ø) |
| ui-services | 84.69% <ø> (ø) |
| ui-components | 63.59% <98.26%> (+3.09%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| apps/skit/src/sample_discovery.rs | 100.00% <100.00%> (ø) |
| apps/skit/src/samples.rs | 91.39% <100.00%> (+0.48%) ⬆️ |
| crates/api/src/lib.rs | 99.22% <ø> (ø) |
| crates/api/src/yaml/compiler.rs | 96.81% <100.00%> (+0.04%) ⬆️ |
| crates/api/src/yaml/mod.rs | 100.00% <ø> (ø) |
| ...rc/components/converter/TemplateSelector.styles.ts | 100.00% <100.00%> (ø) |
| ui/src/utils/jsonSchema.ts | 72.78% <100.00%> (ø) |
| ui/src/utils/samplePipelineOrdering.ts | 100.00% <100.00%> (ø) |
| ui/src/components/converter/TemplateSelector.tsx | 97.00% <97.53%> (-1.51%) ⬇️ |

streamkit-devin and others added 3 commits May 30, 2026 16:42
Grouped scenario cards render each variant as a radio pill whose visible
label is the short variant name, so E2E specs that clicked a sample by its
full name could no longer select it (colorbars and webcam-PiP families).

Set each variant pill's accessible name to the full sample name and add a
selectPipelineTemplate() helper that selects via the radio role for grouped
samples and falls back to the name text for ungrouped cards.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
- Drop variant/category override from video_moq_colorbars so the
  software sample stays the canonical group representative and shares the
  derived Video Encoding category with its codec siblings.
- expandTerm: match synonyms only on exact or >=3-char prefix, dropping
  the reverse-substring branch that leaked short tokens (mic/cam) into
  unrelated queries (dynamic, scam).
- Give flat-card radios an explicit sample-name accessible label so the
  E2E helper selects grouped and ungrouped cards through one role lookup.
- Pass typed Input/OutputType into discovery derivation, removing the
  hand-rolled snake_case glue and dead match arms.
- De-duplicate the Steps/Dag arms in parse_pipeline_metadata.
- Drop the no-op group_tokens.dedup(); reuse labelFromKey for capability
  chips; remove redundant slice; extract ScenarioHeader.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
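
A hedged sketch of the tightened expandTerm rule from the commit above: a query term pulls in a synonym group only when it equals a synonym or is a prefix of one with at least three characters. The table contents and the names `SYNONYM_GROUPS_SKETCH` and `expandTermStrictSketch` are hypothetical.

```typescript
// Hypothetical synonym table; the short tokens mirror the mic/cam example.
const SYNONYM_GROUPS_SKETCH: string[][] = [
  ["mic", "microphone"],
  ["cam", "camera", "webcam"],
];

function expandTermStrictSketch(term: string): string[] {
  const expanded = [term];
  for (const group of SYNONYM_GROUPS_SKETCH) {
    // Exact match, or the term is a prefix (>= 3 chars) of a synonym.
    // There is deliberately no "term contains synonym" branch.
    const hit = group.some((syn) => syn === term || (term.length >= 3 && syn.startsWith(term)));
    if (!hit) continue;
    for (const syn of group) {
      if (!expanded.includes(syn)) expanded.push(syn);
    }
  }
  return expanded;
}
```

Because nothing checks whether the term contains a synonym, `dynamic` no longer expands through `mic` and `scam` no longer expands through `cam`.
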
Signed-off-by: streamkit-devin <devin@streamkit.dev>
- Label the no-variant base member 'Software' instead of the opaque 'Default'.
- Acronym-aware capability labels (MoQ/MP4/MSE/RTMP/WebM/VP9) via formatCapabilityLabel.
- Give GroupCard hover + a filled selected state matching flat cards.
- Square off filter chips so they read distinctly from the rounded variant pills.
- Add a persistent 'Clear all filters' affordance plus empty-state recovery.
- Drop capability chips already covered by a shown Category facet to remove the redundant facet rows.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
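
One way an acronym-aware label formatter could work, as a minimal sketch. The acronym spellings are taken from the commit message above; the key format (lowercase, hyphen-separated tags) and the name `formatCapabilityLabelSketch` are assumptions, and the real formatCapabilityLabel may differ.

```typescript
// Spellings from the commit message; keys are assumed to be lowercase tags.
const ACRONYMS_SKETCH: Record<string, string | undefined> = {
  moq: "MoQ",
  mp4: "MP4",
  mse: "MSE",
  rtmp: "RTMP",
  webm: "WebM",
  vp9: "VP9",
};

function formatCapabilityLabelSketch(key: string): string {
  return key
    .split("-")
    .filter((word) => word.length > 0)
    .map((word) => {
      const acronym = ACRONYMS_SKETCH[word.toLowerCase()];
      // Fall back to simple title-casing for ordinary words.
      return acronym !== undefined ? acronym : word.charAt(0).toUpperCase() + word.slice(1);
    })
    .join(" ");
}
```

Under these assumptions, `webm-muxing` renders as "WebM Muxing" rather than "Webm Muxing".
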
Add 'vad' to the capability acronym map so it renders 'VAD' not 'Vad', and
remove the explicit 'vad' tag from the vad-demo exemplar since the vad node
already derives 'voice-activity-detection' (avoids a duplicate facet chip).

Signed-off-by: streamkit-devin <devin@streamkit.dev>
- Revert the Category/Capability dedup: category is a single priority-picked
  bucket while tags are multi-valued, so dropping a capability chip whose
  category is shown removed a cross-cutting filter (e.g. compositor demos that
  also encode were unreachable from the encoding filter).
- Rename the 'Needs hardware' facet to 'Needs GPU' (the underlying tags are all
  GPU accel APIs: vaapi/nvidia/vulkan).
- Collapse the duplicate clear affordances to the single persistent
  'Clear all filters' control; drop the empty-state and hidden-hint buttons.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
- Derive variant codecs from typed VideoCodec/AudioCodec enums (single
  source) instead of string-sniffing; TS uses exhaustive Record maps so a
  new codec is a compile error until a label is added.
- Only auto-derive group for system samples so unrelated user saves with
  colliding name slugs no longer collapse into one card.
- Guard two-letter language tokens to a translation context.
- Drop codec/format/transport tags from capability facets (codec is the
  variant-pill axis); derive the no-variant base pill from its codec tag.
- Tighten expandTerm so a shared token cannot pull both video families;
  precompile per-query synonym expansion once instead of per pipeline.
- Collapse duplicated Steps/Dag metadata arms; merge the identical
  ExplicitDiscovery/Discovery structs; extract GroupSection; drop trivial
  useMemo over primitives.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
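
The exhaustive Record pattern mentioned in the commit above can be sketched as follows. The union members are hypothetical; the real VideoCodec/AudioCodec definitions are not listed in this thread.

```typescript
// Hypothetical codec union standing in for the generated VideoCodec type.
type VideoCodecSketch = "av1" | "h264" | "vp9";

// Record<Union, string> requires a key for every member, so adding a new
// member to the union makes this object literal fail to type-check until
// a label is supplied.
const VIDEO_CODEC_LABELS_SKETCH: Record<VideoCodecSketch, string> = {
  av1: "AV1",
  h264: "H.264",
  vp9: "VP9",
};

function videoCodecLabelSketch(codec: VideoCodecSketch): string {
  return VIDEO_CODEC_LABELS_SKETCH[codec];
}
```

The lookup needs no default branch, since the compiler guarantees every codec has an entry.
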
Carries the current Customize-editor YAML (with any edits) into the visual
Design editor via router state, which DesignView imports once node
definitions are loaded.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
Signed-off-by: streamkit-devin <devin@streamkit.dev>

Direction update from Claudio after architecture review:

Please make the sample discovery backend simpler and more explicit, not more heuristic.

Implementation direction:

  • Treat YAML discovery metadata as the source of truth for bundled/sample discovery UX.
  • Do not preserve runtime filename/kind fallback heuristics and do not add backward-compatibility behavior. Missing metadata for bundled samples should fail loudly via validation/tests rather than silently deriving a guess.
  • Remove the runtime heuristic path from apps/skit/src/sample_discovery.rs instead of continuing to patch token stripping / substring inference. No filename-derived grouping, no substring-derived category/tag inference, no canonical guessing.
  • Backfill bundled sample YAMLs with explicit metadata needed by the UI/search: at minimum category, tags, and when relevant group, variant, plus a canonical/default signal or neutral group label so the UI never picks variants[0] as the card identity.
  • It is okay to keep a lightweight concept of tags/category; StreamKit already has per-node categories. Existing node categories may be indexed as secondary searchable terms if useful, but should not decide card grouping/category.
  • For fuzzy search, build/index a backend search document from the YAML itself plus optionally resolved node metadata. The YAML/sample authoring layer should own the product semantics.
  • UI should receive resolved explicit fields and render/group directly; no UI-side canonical semantics beyond consuming the backend contract.
  • Update #551 ("Backfill explicit discovery metadata on sample YAMLs; thin the derivation heuristics") and the PR notes accordingly: this is not a thin fallback/backcompat follow-up; it is an explicit YAML-backed discovery contract.

Please keep the PR focused and avoid increasing backend complexity with a taxonomy engine. The desired outcome is less magic and fewer ways to misclassify samples.

…stics

Make sample YAML the source of truth for Convert/Stream discovery UX:
authored group/variant/canonical/category/tags/keywords replace the
runtime filename/node-kind derivation. The server emits these fields
as-authored plus a resolved, lowercased search_terms document; the UI
does plain substring matching and groups directly off canonical.

- Remove all heuristic derivation from sample_discovery.rs (filename
  tokenization, substring category/tag inference, codec sniffing).
- Add canonical/keywords to the YAML schema and SamplePipeline; emit
  search_terms (name + description + category + tags + keywords +
  flattened node kinds).
- Backfill all bundled dynamic/ + oneshot/ samples with explicit
  metadata; group near-duplicate families with one canonical member.
- Enforce the contract in CI: bundled samples must carry category+tags,
  grouped samples must have exactly one canonical and per-member
  variants, ungrouped samples must not set canonical/variant.
- UI consumes resolved fields and search_terms; SYNONYM_GROUPS and the
  variants[0] card-identity guess are gone.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
- Route Convert->Design handoff through guarded handleLoadSample (unsaved-work modal + measured auto-layout), deriving name/description from the edited YAML rather than the pristine sample.
- Include the sample id in backend search_terms so slug-fragment queries match again; precompute per-template search haystacks once.
- Variant pill accessible name now matches its visible label (WCAG 2.5.3); compute facet chips over the origin-filtered set so chips never match zero items.
- Tighten the metadata contract test: reject any blank tag and duplicate variant labels within a group.
- Collapse the duplicated Steps/Dag discovery fields behind a flattened PipelineMeta struct.
- Clear the Customize/editor view when the selection is hidden by active filters; smaller facet chips in a single grouped bar.

Signed-off-by: streamkit-devin <devin@streamkit.dev>
The variant pills' accessible name now equals their visible variant label (WCAG 2.5.3), so selecting a grouped pipeline by its sample name no longer matches. Scope grouped selections to the card's variant group and click by variant label.

Signed-off-by: streamkit-devin <devin@streamkit.dev>