performance: Make sure all alarms are being set with allowUnconfirmed… by gabivlj · Pull Request #228 · cloudflare/containers

gabivlj · 2026-06-18T15:05:06Z

… so we do not block on storage, and all states don't reach to storage on hot-paths

Reaching storage would make container startups slower; let's give in on a more greedy approach where if storage were to fail we get an error later after a container has been potentially started.

ask-bonk · 2026-06-18T15:05:48Z

APIError: Invalid Anthropic API Key

github run

ask-bonk · 2026-06-18T15:05:50Z

@gabivlj Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

pkg-pr-new · 2026-06-18T15:06:12Z

Open in StackBlitz

npm i https://pkg.pr.new/@cloudflare/containers@228

commit: 0c4b4ed

… so we do not block on storage, and all states don't reach to storage on hot-paths Reaching storage would make container startups slower; let's give in on a more greedy approach where if storage were to fail we get an error later after a container has been potentially started.

ask-bonk · 2026-06-18T15:07:21Z

APIError: Invalid Anthropic API Key

github run

ask-bonk · 2026-06-18T15:10:01Z

APIError: Invalid Anthropic API Key

github run

ask-bonk · 2026-06-18T15:16:13Z

OK renewActivityTimeout only sets sleepAfterMs in memory. The alarm is what actually checks against this. So if the constructor runs and container isn't running, we don't need an alarm. But if container IS running, scheduleNextAlarm is needed to enforce sleepAfter expiry. Good.

But wait — the new code does call this.renewActivityTimeout() unconditionally even when container isn't running. That's a minor inefficiency (the value is set but never used). Not a bug though.

There IS one concern though: When start() is called externally and container wasn't running, after this.container.start() the path now does await this.scheduleNextAlarm() — that's correct.

Overall, the PR looks reasonable. Let me identify the actionable issues:

Potential issue with start() retries leaving stale pendingStoppedEvent — if doStartContainer loops through retries (tries > 0), capturePendingStoppedEvent could overwrite the pendingStoppedEvent on each iteration. But it only captures when !containerWasRunning. On retry iterations after handleError, the state should already be 'stopped' or 'stopped_with_code', so it'd capture stopped_with_code. This might cause duplicate onStop calls if onStop already fired for a prior monitor death. Actually, callOnStop has onStopCalled guard but it's reset after the call. Hmm.

Let me check: on retry, would there be a previous monitor that had .catch'd and set state to stopped? If so, when monitor sets state to stopped (clean), capture would NOT trigger (state is 'stopped', not 'healthy'/'running'/'stopped_with_code'). If monitor's exit code path called setStoppedWithCode, capture WOULD trigger. So onStop would fire after retry start - this could be correct (we want onStop for the prior exit). But what if onStop was already going to fire via syncPendingStoppedEvents in next alarm? Now drain fires it, and the alarm's syncPendingStoppedEvents would skip (state has been reset by setStoppedIfUnchanged).

This seems OK, but it's complex enough that I think there could be edge cases. However, I don't see a clear bug to point to.

Removed await this.ctx.storage.sync() after setAlarm: Combined with allowUnconfirmed: true, if the runtime crashes mid-write the alarm is lost. This is documented in the PR description as an explicit tradeoff.
The alarm() override on Container itself is NOT in scope of the "no override" rule for subclasses, it's the internal handler. The PR doesn't change subclass override rules.

Overall, I don't see clear bugs. The PR has a clear changeset, tests are updated, and the optimization is sound. The behavior changes (deferred onStop, conditional initial scheduleNextAlarm) are subtle but defensible.

LGTM

github run

ask-bonk

I'm Bonk, and I've done a quick review of your PR.

Defers storage/alarm writes and outbound-interception setup until after this.container.start() so cold starts stop blocking on storage round-trips, and switches the persisted-state writes to allowUnconfirmed: true.

Few things worth a look before this lands:

Persisted outbound configuration is no longer restored on DO hydration when the container is already running. The old constructor unconditionally called restoreOutboundConfiguration() and then (if this.container.running) ran applyOutboundInterception() inside blockConcurrencyWhile. The new constructor calls updateUsingInterception() without the persisted config and only calls restoreOutboundConfigurationOnce() inside doStartContainer -- and only when !containerWasRunning. So when a DO is evicted/rehydrated while its container is still running, this.outboundHandlerOverride / this.outboundByHostOverrides are never repopulated from storage. A subsequent setOutboundByHost("example.com", ...) call will then update only that one host and applyOutboundInterception will rebuild the props from the (empty) instance fields, effectively wiping every other runtime override that was previously persisted. This is a behavior regression that isn't covered by the new tests.
onStart() no longer runs under blockConcurrencyWhile. Both start() and startAndWaitForPorts() previously wrapped await this.onStart() in blockConcurrencyWhile, which queued all incoming requests until onStart resolved. After this PR, the input gate is released on every await inside user-supplied onStart() handlers, so other requests can start being served before onStart() completes. This is a public lifecycle-hook semantics change that subclasses likely rely on -- worth either preserving the old behavior or calling it out explicitly in the changeset.
onStop for the prior run now fires after onStart for the new run. Previously syncPendingStoppedEvents() ran before start() inside startAndWaitForPorts. The PR removes that call and instead drains the pending stopped event in the finally of startContainerIfNotRunning, which runs after onStart() has already fired. So a subclass that observed onStop -> onStart ordering between restarts will now see onStart -> onStop. Also worth calling out in the changeset, since the changeset only mentions storage/alarm deferral.
Tests don't cover the rehydrate-with-running-container path. The new cold start should not touch storage before container.start test is good, but there's no test for "DO wakes, this.container.running === true, persisted outbound overrides should still be honored." That's exactly the regression in #1 above. Worth adding before this ships.

No blocking objections beyond #1, which I do think is a real bug; the rest are behavior changes that should be documented in the changeset or guarded behind an opt-in.

ask-bonk · 2026-06-18T16:36:56Z

-      )
-    `;
+    this.renewActivityTimeout();
+    this.updateUsingInterception();


This constructor path no longer calls restoreOutboundConfiguration() and no longer calls applyOutboundInterception() when this.container.running is true on hydration. The old code did both inside blockConcurrencyWhile.

Concrete consequence: when the DO is evicted while the container is still running, on the next hydration the instance fields this.outboundHandlerOverride and this.outboundByHostOverrides are not repopulated from OUTBOUND_CONFIGURATION_KEY. A subsequent setOutboundByHost(host, ...) call then sees an empty outboundByHostOverrides, sets just that one host, and applyOutboundInterception rebuilds the proxy props from the (now-truncated) instance state. Every other persisted runtime override is silently dropped.

Suggested fix: restore the persisted config in the constructor (it's a sync kv.get, so it doesn't reintroduce the cold-start storage round-trip you're trying to avoid), and pass it into updateUsingInterception:

Suggested change

this.updateUsingInterception();

const persistedOutboundConfiguration = this.restoreOutboundConfigurationOnce();

this.renewActivityTimeout();

this.updateUsingInterception(persistedOutboundConfiguration);

ask-bonk · 2026-06-18T16:36:56Z

-    await this.ctx.blockConcurrencyWhile(async () => {
-      await this.onStart();
-    });
+    await this.onStart();


Dropping blockConcurrencyWhile around onStart() changes the public lifecycle contract: the input gate is now released on every await inside a user-supplied onStart handler, so other requests can begin executing before onStart resolves. Previously they were queued.

If this is intentional, please call it out in the changeset (currently it only mentions storage/alarm deferral). If not, restoring the wrapper here costs nothing -- the storage writes you're trying to avoid already happened before this point:

Suggested change

await this.onStart();

await this.ctx.blockConcurrencyWhile(async () => {

await this.onStart();

});

ask-bonk · 2026-06-18T16:36:56Z

-    });
+    // All ports are ready
+    await this.state.setHealthy();
+    await this.onStart();


Same blockConcurrencyWhile removal here as in start() -- worth keeping consistent with whatever you decide for the other call site, and worth mentioning in the changeset. Note also that this code path now fires onStart() before drainPendingStoppedEvent() (which runs in the finally of startContainerIfNotRunning), so subclasses that previously saw onStop -> onStart ordering across a restart will now see onStart -> onStop.

ask-bonk · 2026-06-18T16:36:56Z

+        }
+        await this.capturePendingStoppedEvent(containerWasRunning);
        await this.state.setRunning();
+        await this.scheduleNextAlarm();


Two related observations on this block:

restoreOutboundConfigurationOnce() is the only place the persisted outbound config is read post-PR. Because it's gated by !containerWasRunning, a hydrated DO that finds an already-running container never restores the persisted overrides into instance fields (see the constructor comment).

capturePendingStoppedEvent(containerWasRunning) is always called with false here (we're inside the if (!containerWasRunning) branch), so the parameter is effectively dead. Minor, but consider dropping the argument or inlining the check for clarity.

ask-bonk · 2026-06-18T16:37:02Z

Review posted on PR #228. Key findings:

Likely regression: persisted outbound config isn't restored when DO hydrates with an already-running container — runtime overrides like setOutboundByHost will silently wipe other persisted overrides.
Lifecycle semantics change: onStart() no longer runs under blockConcurrencyWhile, so other requests can interleave with user-supplied onStart handlers.
Hook ordering change: onStop for a previous run now fires after onStart for the new run.
Test gap: no coverage for the rehydrate-with-running-container path.

Suggestions were posted as inline GitHub suggestion blocks.

github run

…ent new trade-offs this means we need to bump to 0.4.0 as there is some breaking changes in behaviour.

ask-bonk · 2026-06-19T14:57:56Z

LGTM

github run

gabivlj requested a review from a team as a code owner June 18, 2026 15:05

gabivlj force-pushed the gv/no-storage branch from 7b40b16 to 9d40b5f Compare June 18, 2026 15:06

ci: fix bonk

b529c37

gabivlj force-pushed the gv/no-storage branch from 9e4dd08 to b529c37 Compare June 18, 2026 15:13

perf: get rid of any storage operation before calling start()

c39e533

ask-bonk Bot reviewed Jun 18, 2026

View reviewed changes

simplify outbound setups, do not use storage for outboud hooks, docum…

0c4b4ed

…ent new trade-offs this means we need to bump to 0.4.0 as there is some breaking changes in behaviour.

Conversation

gabivlj commented Jun 18, 2026

Uh oh!

ask-bonk Bot commented Jun 18, 2026

Uh oh!

ask-bonk Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkg-pr-new Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ask-bonk Bot commented Jun 18, 2026

Uh oh!

ask-bonk Bot commented Jun 18, 2026

Uh oh!

ask-bonk Bot commented Jun 18, 2026

Uh oh!

ask-bonk Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

ask-bonk Bot commented Jun 18, 2026

Uh oh!

ask-bonk Bot commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ask-bonk Bot commented Jun 18, 2026 •

edited

Loading

pkg-pr-new Bot commented Jun 18, 2026 •

edited

Loading