fix(voice): gapless audio chunk playback by cjol · Pull Request #1747 · cloudflare/agents

cjol · 2026-06-12T11:33:09Z

This PR fixes audible clicks at audio chunk boundaries during agent speech in VoiceClient. Closes #1736.

Why

VoiceClient played each response chunk by starting it at currentTime and waiting for its ended event before scheduling the next. The round-trip latency (event-loop delay plus the next chunk's setup) left a few milliseconds of silence at every seam, audible as a periodic click.
We could have reduced the latency by overlapping decode and schedule, but the fundamental problem is that ended callbacks run on the main thread, not the audio clock, so there will always be some gap.
Instead, we now schedule chunks back-to-back on the audio clock itself using a playback cursor. Consecutive chunks butt together sample-tight because each starts exactly when the previous ends, regardless of main-thread latency.

Code Changes

packages/voice/src/voice-client.ts
- Replaced the single #activeSource with a #scheduledSources set and a #playbackCursor. Each incoming chunk is scheduled at Math.max(currentTime, cursor), and the cursor is advanced by the chunk duration.
- Playback now counts as active until the last scheduled source finishes, so barge-in detection (user transcript interrupt) keeps working through the scheduled tail.
- #stopPlayback stops every scheduled source, not just the current one, so an interrupt or end-call cleanly cancels the whole pipeline.
- Reset the cursor on end-call so a subsequent call starts fresh instead of picking up a stale offset.
packages/voice/src/tests/voice-client.test.ts
- Added gapless playback tests: consecutive chunk scheduling, catch-up when the cursor falls behind currentTime, interrupt stops all scheduled chunks, barge-in works through the scheduled tail, and cursor reset on end-call.

changeset-bot · 2026-06-12T11:33:14Z

🦋 Changeset detected

Latest commit: 8e4717e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@cloudflare/voice	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

pkg-pr-new · 2026-06-12T11:39:39Z

Open in StackBlitz

agents

npm i https://pkg.pr.new/agents@1747

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1747

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1747

create-think

npm i https://pkg.pr.new/create-think@1747

hono-agents

npm i https://pkg.pr.new/hono-agents@1747

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1747

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1747

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1747

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1747

commit: 8e4717e

threepointone · 2026-06-19T10:25:46Z

/bonk do a deep review, look for edge cases and things we might have missed, how's the ux/dx on this?

ask-bonk

Deep Review

Good approach overall — scheduling on the audio clock instead of waiting for ended callbacks is the right fix for this class of problem. The generation-based cancellation pattern is solid and the test coverage is thorough. A few things worth discussing:

Things that look good

Core scheduling logic is correct. Math.max(ctx.currentTime, cursor) handles both the happy path (back-to-back) and the catch-up case (cursor fell behind real time) cleanly.
#isScheduling / #isPlaying separation. This is a subtle but important distinction — #isPlaying stays true while scheduled sources are still audibly playing even after the scheduling loop finishes, so barge-in detection works through the "tail." Nice.
Generation-based cancellation. The #playbackGeneration pattern already existed, and extending it to cover the new scheduling loop + onended callbacks is the right move. Multiple interrupt vectors (server playback_interrupt, user transcript, client-side RMS interrupt, endCall) all go through #stopPlayback, which is clean.
Cursor reset on #stopPlayback. Without this, a stale cursor from a previous utterance would cause the next one to schedule far in the future. Good catch.

Edge cases and concerns

1. onended callbacks from source.stop() during #stopPlayback

When #stopPlayback iterates through [...this.#scheduledSources], it calls source.stop() on each. Per the Web Audio spec, stop() dispatches the ended event. The onended handler checks generation === this.#playbackGeneration, but #stopPlayback increments #playbackGeneration before calling stop(), so the generation check will fail and the handler is effectively a no-op. This is correct! But it's subtle — worth a brief inline comment on why the generation increment must come before the stop loop.

2. Floating-point drift in cursor accumulation

Each chunk advances the cursor by audioBuffer.duration, which is a floating-point number. Over a long agent response (say 60s of audio in 0.1s chunks = 600 additions), accumulated floating-point error could theoretically cause micro-gaps or micro-overlaps. In practice this is probably negligible (IEEE 754 double precision gives ~15 significant digits, and we're adding values around 0.1), but it's worth being aware of. If it ever becomes an issue, tracking the cursor in samples rather than seconds would eliminate it.

3. No test for #isPlaying transitioning to false after all sources end

The tests verify scheduling times and interrupt behavior, but none assert that #isPlaying becomes false after all scheduled sources finish playing (i.e., after all onended callbacks fire). This is the path that would surface bugs in the !#isScheduling && #scheduledSources.size === 0 && #playbackQueue.length === 0 condition. Consider adding a test that sends a few chunks, fires all onended callbacks, and asserts the client no longer considers playback active. You can observe this indirectly through barge-in: after all sources end, RMS above the interrupt threshold should not trigger an interrupt message.

4. Race: new audio arriving between #isScheduling = false and last onended

Consider this sequence:

#processPlaybackQueue finishes, sets #isScheduling = false
Sources are still playing (#scheduledSources.size > 0), so #isPlaying stays true
New audio chunk arrives from the server
#processPlaybackQueue is called again — #isScheduling is false, so it enters the loop
Meanwhile, the last source's onended fires

This is actually fine because step 4 will set #isScheduling = true again before any await, and the onended handler checks !this.#isScheduling. Just wanted to confirm this is intentional — the interleaving is safe because JS is single-threaded and the #isScheduling = true in step 4 happens synchronously before any microtask queue processing.

5. pcm16 path creates AudioBuffer with duration from length / sampleRate

The createBuffer in the test fake returns duration: length / sampleRate, which matches the real Web Audio API. But the production code at line 896-901 creates the buffer at 16kHz while the AudioContext is at 48kHz. The browser will implicitly resample, but audioBuffer.duration is based on the buffer's own sample rate (16kHz), so the duration is correct for scheduling. Good — no issue here, just noting it's correct for a non-obvious reason.

Minor / DX notes

The changeset description is excellent — it explains both the symptom and the mechanism clearly. Consumers upgrading will understand what changed and why.
The test helper startPcm16Call() using PCM16 format is a good choice — it avoids decodeAudioData async complexity and makes the scheduling math deterministic.

Verdict

This is a solid fix. The core approach is correct, the edge cases around cancellation are handled well, and the test coverage is good. The main suggestion would be adding a test for the #isPlaying → false transition after natural playback completion (concern #3 above). The rest is minor.

ask-bonk · 2026-06-19T10:28:33Z


  #stopPlayback(): void {
-    const source = this.#activeSource;
    this.#playbackGeneration++;


Nit: worth a brief comment here explaining why the generation increment must come before the stop loop — source.stop() synchronously dispatches ended, and the handler checks generation === this.#playbackGeneration, so bumping first makes those callbacks no-ops. Without this ordering, the onended handler would set #isPlaying = false prematurely before we finish cleaning up.

ask-bonk · 2026-06-19T10:28:33Z

+
+    // Without the reset this would start at the stale 5.1 cursor.
+    expect(sources[1].startedAt).toBe(2);
+  });


Consider adding a test that verifies #isPlaying transitions to false after all onended callbacks fire (natural playback completion, no interrupt). You could observe this indirectly: send a couple of PCM chunks, fire all onended callbacks on the sources, then send RMS above the interrupt threshold and assert that no { type: "interrupt" } message is sent — because the client should no longer consider playback active.

This would exercise the !this.#isScheduling && this.#scheduledSources.size === 0 && this.#playbackQueue.length === 0 condition in the onended handler.

ask-bonk · 2026-06-19T10:28:53Z

Review posted on PR #1747. Here's a summary of the findings:

Overall verdict: Solid fix. The approach of scheduling chunks on the audio clock via a playback cursor instead of waiting for ended callbacks is correct and well-implemented.

Key findings:

Generation increment ordering in #stopPlayback is subtle but correct — the increment happens before source.stop() calls, which makes the synchronously-dispatched onended handlers no-ops. Posted an inline comment suggesting documenting this.
Missing test for natural playback completion — the tests cover interrupts and scheduling, but none verify that #isPlaying correctly transitions to false after all sources finish playing naturally (all onended callbacks fire). This is the condition guarded by !#isScheduling && #scheduledSources.size === 0 && #playbackQueue.length === 0. Posted an inline comment with a suggested test approach.
Floating-point cursor drift — over very long responses (600+ chunk additions), accumulated FP error in the cursor could theoretically cause micro-gaps. Negligible in practice but noted for awareness.
Race between new audio arrival and #isScheduling flag — analyzed the interleaving between #processPlaybackQueue completing, new chunks arriving, and onended firing. It's safe due to JS single-threaded execution model.
PCM16 at 16kHz in a 48kHz AudioContext — verified the duration calculation is correct because audioBuffer.duration is based on the buffer's sample rate, not the context's.

github run

fix(voice): gapless audio chunk playback

3b14053

chore: merge main into gapless playback branch

8e4717e

devin-ai-integration Bot reviewed Jun 12, 2026

View reviewed changes

cjol requested a review from threepointone June 15, 2026 11:25

ask-bonk Bot reviewed Jun 19, 2026

View reviewed changes

threepointone merged commit 28653b3 into main Jun 19, 2026
6 checks passed

threepointone deleted the fix/1736-gapless-audio-playback branch June 19, 2026 10:46

github-actions Bot mentioned this pull request Jun 19, 2026

Version Packages #1780

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(voice): gapless audio chunk playback#1747

fix(voice): gapless audio chunk playback#1747
threepointone merged 2 commits into
mainfrom
fix/1736-gapless-audio-playback

cjol commented Jun 12, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

changeset-bot Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

pkg-pr-new Bot commented Jun 12, 2026

Uh oh!

threepointone commented Jun 19, 2026

Uh oh!

ask-bonk Bot left a comment

Uh oh!

ask-bonk Bot Jun 19, 2026

Uh oh!

ask-bonk Bot Jun 19, 2026

Uh oh!

ask-bonk Bot commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cjol commented Jun 12, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Code Changes

Uh oh!

changeset-bot Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

pkg-pr-new Bot commented Jun 12, 2026

Uh oh!

threepointone commented Jun 19, 2026

Uh oh!

ask-bonk Bot left a comment

Choose a reason for hiding this comment

Deep Review

Things that look good

Edge cases and concerns

Minor / DX notes

Verdict

Uh oh!

ask-bonk Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot commented Jun 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cjol commented Jun 12, 2026 •

edited by devin-ai-integration Bot

Loading

changeset-bot Bot commented Jun 12, 2026 •

edited

Loading