Skip to content

fix(voice): gapless audio chunk playback#1747

Merged
threepointone merged 2 commits into
mainfrom
fix/1736-gapless-audio-playback
Jun 19, 2026
Merged

fix(voice): gapless audio chunk playback#1747
threepointone merged 2 commits into
mainfrom
fix/1736-gapless-audio-playback

Conversation

@cjol

@cjol cjol commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

This PR fixes audible clicks at audio chunk boundaries during agent speech in VoiceClient. Closes #1736.

Why

  • VoiceClient played each response chunk by starting it at currentTime and waiting for its ended event before scheduling the next. The round-trip latency (event-loop delay plus the next chunk's setup) left a few milliseconds of silence at every seam, audible as a periodic click.
  • We could have reduced the latency by overlapping decode and schedule, but the fundamental problem is that ended callbacks run on the main thread, not the audio clock, so there will always be some gap.
  • Instead, we now schedule chunks back-to-back on the audio clock itself using a playback cursor. Consecutive chunks butt together sample-tight because each starts exactly when the previous ends, regardless of main-thread latency.

Code Changes

  • packages/voice/src/voice-client.ts

    • Replaced the single #activeSource with a #scheduledSources set and a #playbackCursor. Each incoming chunk is scheduled at Math.max(currentTime, cursor), and the cursor is advanced by the chunk duration.
    • Playback now counts as active until the last scheduled source finishes, so barge-in detection (user transcript interrupt) keeps working through the scheduled tail.
    • #stopPlayback stops every scheduled source, not just the current one, so an interrupt or end-call cleanly cancels the whole pipeline.
    • Reset the cursor on end-call so a subsequent call starts fresh instead of picking up a stale offset.
  • packages/voice/src/tests/voice-client.test.ts

    • Added gapless playback tests: consecutive chunk scheduling, catch-up when the cursor falls behind currentTime, interrupt stops all scheduled chunks, barge-in works through the scheduled tail, and cursor reset on end-call.

Open in Devin Review

@changeset-bot

changeset-bot Bot commented Jun 12, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 8e4717e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@cloudflare/voice Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

@pkg-pr-new

pkg-pr-new Bot commented Jun 12, 2026

Copy link
Copy Markdown

Open in StackBlitz

agents

npm i https://pkg.pr.new/agents@1747

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1747

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1747

create-think

npm i https://pkg.pr.new/create-think@1747

hono-agents

npm i https://pkg.pr.new/hono-agents@1747

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1747

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1747

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1747

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1747

commit: 8e4717e

@cjol cjol requested a review from threepointone June 15, 2026 11:25
@threepointone

Copy link
Copy Markdown
Contributor

/bonk do a deep review, look for edge cases and things we might have missed, how's the ux/dx on this?

@ask-bonk ask-bonk Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review

Good approach overall — scheduling on the audio clock instead of waiting for ended callbacks is the right fix for this class of problem. The generation-based cancellation pattern is solid and the test coverage is thorough. A few things worth discussing:

Things that look good

  1. Core scheduling logic is correct. Math.max(ctx.currentTime, cursor) handles both the happy path (back-to-back) and the catch-up case (cursor fell behind real time) cleanly.

  2. #isScheduling / #isPlaying separation. This is a subtle but important distinction — #isPlaying stays true while scheduled sources are still audibly playing even after the scheduling loop finishes, so barge-in detection works through the "tail." Nice.

  3. Generation-based cancellation. The #playbackGeneration pattern already existed, and extending it to cover the new scheduling loop + onended callbacks is the right move. Multiple interrupt vectors (server playback_interrupt, user transcript, client-side RMS interrupt, endCall) all go through #stopPlayback, which is clean.

  4. Cursor reset on #stopPlayback. Without this, a stale cursor from a previous utterance would cause the next one to schedule far in the future. Good catch.

Edge cases and concerns

1. onended callbacks from source.stop() during #stopPlayback

When #stopPlayback iterates through [...this.#scheduledSources], it calls source.stop() on each. Per the Web Audio spec, stop() dispatches the ended event. The onended handler checks generation === this.#playbackGeneration, but #stopPlayback increments #playbackGeneration before calling stop(), so the generation check will fail and the handler is effectively a no-op. This is correct! But it's subtle — worth a brief inline comment on why the generation increment must come before the stop loop.

2. Floating-point drift in cursor accumulation

Each chunk advances the cursor by audioBuffer.duration, which is a floating-point number. Over a long agent response (say 60s of audio in 0.1s chunks = 600 additions), accumulated floating-point error could theoretically cause micro-gaps or micro-overlaps. In practice this is probably negligible (IEEE 754 double precision gives ~15 significant digits, and we're adding values around 0.1), but it's worth being aware of. If it ever becomes an issue, tracking the cursor in samples rather than seconds would eliminate it.

3. No test for #isPlaying transitioning to false after all sources end

The tests verify scheduling times and interrupt behavior, but none assert that #isPlaying becomes false after all scheduled sources finish playing (i.e., after all onended callbacks fire). This is the path that would surface bugs in the !#isScheduling && #scheduledSources.size === 0 && #playbackQueue.length === 0 condition. Consider adding a test that sends a few chunks, fires all onended callbacks, and asserts the client no longer considers playback active. You can observe this indirectly through barge-in: after all sources end, RMS above the interrupt threshold should not trigger an interrupt message.

4. Race: new audio arriving between #isScheduling = false and last onended

Consider this sequence:

  1. #processPlaybackQueue finishes, sets #isScheduling = false
  2. Sources are still playing (#scheduledSources.size > 0), so #isPlaying stays true
  3. New audio chunk arrives from the server
  4. #processPlaybackQueue is called again — #isScheduling is false, so it enters the loop
  5. Meanwhile, the last source's onended fires

This is actually fine because step 4 will set #isScheduling = true again before any await, and the onended handler checks !this.#isScheduling. Just wanted to confirm this is intentional — the interleaving is safe because JS is single-threaded and the #isScheduling = true in step 4 happens synchronously before any microtask queue processing.

5. pcm16 path creates AudioBuffer with duration from length / sampleRate

The createBuffer in the test fake returns duration: length / sampleRate, which matches the real Web Audio API. But the production code at line 896-901 creates the buffer at 16kHz while the AudioContext is at 48kHz. The browser will implicitly resample, but audioBuffer.duration is based on the buffer's own sample rate (16kHz), so the duration is correct for scheduling. Good — no issue here, just noting it's correct for a non-obvious reason.

Minor / DX notes

  • The changeset description is excellent — it explains both the symptom and the mechanism clearly. Consumers upgrading will understand what changed and why.
  • The test helper startPcm16Call() using PCM16 format is a good choice — it avoids decodeAudioData async complexity and makes the scheduling math deterministic.

Verdict

This is a solid fix. The core approach is correct, the edge cases around cancellation are handled well, and the test coverage is good. The main suggestion would be adding a test for the #isPlaying → false transition after natural playback completion (concern #3 above). The rest is minor.


#stopPlayback(): void {
const source = this.#activeSource;
this.#playbackGeneration++;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: worth a brief comment here explaining why the generation increment must come before the stop loop — source.stop() synchronously dispatches ended, and the handler checks generation === this.#playbackGeneration, so bumping first makes those callbacks no-ops. Without this ordering, the onended handler would set #isPlaying = false prematurely before we finish cleaning up.


// Without the reset this would start at the stale 5.1 cursor.
expect(sources[1].startedAt).toBe(2);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test that verifies #isPlaying transitions to false after all onended callbacks fire (natural playback completion, no interrupt). You could observe this indirectly: send a couple of PCM chunks, fire all onended callbacks on the sources, then send RMS above the interrupt threshold and assert that no { type: "interrupt" } message is sent — because the client should no longer consider playback active.

This would exercise the !this.#isScheduling && this.#scheduledSources.size === 0 && this.#playbackQueue.length === 0 condition in the onended handler.

@ask-bonk

ask-bonk Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review posted on PR #1747. Here's a summary of the findings:

Overall verdict: Solid fix. The approach of scheduling chunks on the audio clock via a playback cursor instead of waiting for ended callbacks is correct and well-implemented.

Key findings:

  1. Generation increment ordering in #stopPlayback is subtle but correct — the increment happens before source.stop() calls, which makes the synchronously-dispatched onended handlers no-ops. Posted an inline comment suggesting documenting this.

  2. Missing test for natural playback completion — the tests cover interrupts and scheduling, but none verify that #isPlaying correctly transitions to false after all sources finish playing naturally (all onended callbacks fire). This is the condition guarded by !#isScheduling && #scheduledSources.size === 0 && #playbackQueue.length === 0. Posted an inline comment with a suggested test approach.

  3. Floating-point cursor drift — over very long responses (600+ chunk additions), accumulated FP error in the cursor could theoretically cause micro-gaps. Negligible in practice but noted for awareness.

  4. Race between new audio arrival and #isScheduling flag — analyzed the interleaving between #processPlaybackQueue completing, new chunks arriving, and onended firing. It's safe due to JS single-threaded execution model.

  5. PCM16 at 16kHz in a 48kHz AudioContext — verified the duration calculation is correct because audioBuffer.duration is based on the buffer's sample rate, not the context's.

github run

@threepointone threepointone merged commit 28653b3 into main Jun 19, 2026
6 checks passed
@threepointone threepointone deleted the fix/1736-gapless-audio-playback branch June 19, 2026 10:46
@github-actions github-actions Bot mentioned this pull request Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[voice] audible click at every audio chunk seam (w/ working fix)

2 participants