fix(evmrpc): cap eth_getBlockReceipts goroutine fan-out and honor ctx.Done() in eth_subscribe logs (PLT-701)#3621
Conversation
PR SummaryLow Risk Overview
Reviewed by Cursor Bugbot for commit 6c68e44. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3621 +/- ##
==========================================
- Coverage 58.99% 58.12% -0.88%
==========================================
Files 2224 2150 -74
Lines 182708 174158 -8550
==========================================
- Hits 107782 101221 -6561
+ Misses 65235 63946 -1289
+ Partials 9691 8991 -700
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
bdchatham
left a comment
There was a problem hiding this comment.
Cross-review of the two fixes. The receipts cap looks good; one blocking item on the eth_subscribe(logs) change plus a few non-blocking notes, all inline.
| // disconnects (rpcSub.Err()) or the request context is cancelled / | ||
| // its deadline expires (ctx.Done()). | ||
| select { | ||
| case <-ctx.Done(): |
There was a problem hiding this comment.
I think this one will actually break the subscription. The ctx we get here gets cancelled as soon as the eth_subscribe call returns its subscription id — the handler in the go-ethereum fork we're on (v1.15.7-sei-16) does a defer cancel() on it in handleNonBatchCall (rpc/handler.go:283) and again in startCallProc (rpc/handler.go:384). Since Logs hands back rpcSub right away, by the time we loop back to this select the ctx is already done, so the client gets the first backfill batch and then the goroutine just exits — no more logs, no error, no unsubscribe.
That's why geth's own filter subscriptions (and NewHeads right here in this file) only watch rpcSub.Err() and never the method ctx. I'd drop the ctx.Done() case and keep just rpcSub.Err() + the time.After, so it lines up with NewHeads. If we actually want a disconnect to cancel an in-flight fetch, that's a separate change — we'd need a connection-lifetime ctx, not this per-call one.
Worth noting too that the logs-subscription integration test is skipped, so CI won't catch this.
There was a problem hiding this comment.
Thanks for great feedback. I will remove the ctx.Done(). Regarding the tests passing, actually there seems to be a tests that is failing: subscribe test
which is a test that is added by @kollegian in this pr Link
Thanks to Yasin for improving the test coverage.
Summary
Part of the RPC endpoint protection workstream (Phase 0 — point fixes, PR 0d). Two independent hardening fixes in
evmrpc:eth_getBlockReceiptsgoroutine fan-out (evmrpc/block.go)eth_subscribe(logs)poll loop cancellation-aware (evmrpc/subscribe.go)Linear: PLT-701
1. Cap
eth_getBlockReceiptsgoroutine fan-outThe receipt loop previously spawned one goroutine per transaction with a bare
sync.WaitGroup, so a block containing a very large number of txs would create an unbounded number of goroutines from a single request.Replaced the manual
WaitGroup/Mutexloop with anerrgroup.Groupwhose limit is set to a hard cap:errgroup.SetLimitblocksGo()until a slot frees, so it bounds the number of live goroutines (not just concurrent work). This covers theeth_,sei_, andsei2_variants since they shareGetBlockReceipts. Underlying DB reads are already bounded by the worker-pool DB semaphore, so additional goroutines beyond this cap only add scheduler/memory pressure without improving throughput.Existing error handling is preserved (the "not found" receipt skip and first-error propagation, now via
g.Wait()); the deadif returnErr != niltail check was removed since errors flow through thereturnstatements that feed the metrics defer.2. Fix
eth_subscribe(logs)poll loop to honorctx.Done()The poll loop ended each iteration with an unconditional
time.Sleep(SleepInterval), so it kept polling and notifying long after the subscriber was gone. Replaced the sleep with a cancellation-aware select:This mirrors the existing
newHeadshandler (which selects onrpcSub.Err()) and adds thectx.Done()case so the goroutine stops promptly on disconnect or deadline expiry.Testing
go build ./evmrpc/...,go vet ./evmrpc/,gofmt -sall clean.evmrpcandevmrpc/testsSubscribe|Logs|Receipt|Blockselectors, including the multi-txTestGetBlockReceiptsintegration test that exercises the rewritten fan-out path.No new tests added: the receipts change is a refactor of an already-covered path to a stdlib-guaranteed bound, and the subscribe path has no clean unit-test seam today (
rpc.NotifierFromContextis server-internal and the existing logs-subscription integration test ist.Skip()-ed).