feat(core): implement `core/scheduler` by emlautarom1 · Pull Request #466 · NethermindEth/pluto

emlautarom1 · 2026-06-01T16:45:16Z

Closes #176

Uses an Actor model to implement the core/scheduler module, solving some concurrency hacks that the original implementation had.

Metrics have been ported using the existing Vise patterns but are untested.

- Add `Scheduler` struct (WIP) - Add `new_slot_ticker`

- Fix duplicated code

- `valcache` is required by scheduler - Fixes cyclical dependency issue

- Use `valcache` to get the list of active validators.

- Manages internal state of the scheduler - Stored in a `Mutex` for thread safety

- TODO's regarding concurrent access while resolving duties

- Try to reuse existing infrastructure

- Preserve only Actor API

…duler

- Remove generics from `DutyDefinitionSet`

emlautarom1 · 2026-06-11T22:18:35Z

@claude Re-review this PR. Check for your previous review and recent comments.

claude · 2026-06-11T22:18:54Z

Claude finished @emlautarom1's task in 2m 15s —— View job

Re-review: `core/scheduler`

Re-checked the branch against my earlier review and the discussion since. Good progress — several findings are now resolved or improved. Below is the delta, grouped by status.

✅ Addressed since last review

Lagging subscribers now log (scheduler.rs:132-139, 164-173) — the silent-drop regression is gone; a lag now emits tracing::error! before terminating. The remaining question (return a handle / don't terminate at all) is now a design choice, see below.
Reorg handler logs (handle_chain_reorg, scheduler.rs:331-343) — no longer a silent early-return; it logs "Chain reorg event handled, duties trimmed".
Ticker send is now cancellable (scheduler.rs:649-652) — the channel send is wrapped in select! against ct.cancelled(). (But see Update README #2 below — the sleep still isn't.)
Async callbacks — subscribe_slot/subscribe_duty now accept Fn -> Future, matching the discussion. Tradeoff analysis below.
Test coverage — get_duty_definition is now driven through the SchedulerHandle path (get_duty_success_then_reorg_then_get_duty_fails), and the slot-15 mid-epoch case exercises the last_in_epoch pre-resolution branch.

⚠️ Still open

1. (Major) get_duty_definition: 100 ms total timeout + network call inside the actor. Still unresolved and still the main concern.

SchedulerActor::get_duty_definition (scheduler.rs:352) calls self.client.fetch_slots_config().await? on every query, behind the handle's Duration::from_millis(100) (scheduler.rs:269).
Crucial parity detail I want to underline: in Charon the 100 * time.Millisecond is a poll interval inside for s.isResolvingEpoch(epoch) { … time.Sleep(100ms) } (scheduler.go:185-191), not a deadline — and FetchSlotsConfig is memoized in eth2wrap, so it isn't a per-call round-trip. Pluto converts a poll interval into a hard total timeout and puts a live beacon call on the hot path. Under a slow beacon node this returns TimeoutError where Charon would wait.
Fix together: cache slots_per_epoch on the actor (it's already fetched in new_slot_ticker) so the query is synchronous; then either drop the timeout or let the caller pass one.
SchedulerError::EpochNotResolved (scheduler.rs:62-66) still drops the duty field that Charon logs (scheduler.go:194-195).

Fix this →

2. new_slot_ticker inter-slot sleep still ignores cancellation (scheduler.rs:631-636). You wrapped the send but the tokio::time::sleep(wait).await before it is bare, so a shutdown mid-slot still blocks the ticker task for up to a full slot (~12 s). Wrap the sleep in the same select!:

tokio::select! {
    _ = ct.cancelled() => break,
    _ = tokio::time::sleep(wait) => {}
}

Fix this →

3. Reorg featureset gate still punted to the caller. with_chain_reorgs (scheduler.rs:181-184) keeps the // NOTE: The SSE feature check should be done by the caller comment. Charon gates on featureset.Enabled(featureset.SSEReorgDuties) inside the handler and warns when disabled (scheduler.go:142-153). pluto-featureset is already a dependency of this crate — consider matching Charon and checking it in-handler, or at minimum document the requirement loudly on the public method (a doc-comment, not an inline note).

5. handle_chain_reorg is still pub async fn on the private SchedulerActor (scheduler.rs:331). Drop pub.

6. Redundant clone in build (scheduler.rs:208) — new_slot_ticker(&client.clone(), …) still clones then borrows; new_slot_ticker takes &client. Pass &client.

7. validator_status reset is still the O(total-seen) scan (scheduler.rs:690-696) with no TODO. Non-blocking, but worth a note about the HashMap<PubKey, String> last-status optimization.

8. Nits: enum SchedulerMessage is still wedged against the impl Default block with no blank line (scheduler.rs:236-237). fast_backoff still carries the // TODO: Duplicated from crates/p2p/src/bootnode.rs — please file a follow-up to DRY it.

9. Test gap: no slow-path delay_slot_offset test — every test uses test_past_slot, so the deadline has always elapsed and the with_cancellation_token_owned race is never actually exercised with a live wait. One test with a near-future slot + cancellation would close this.

On the open design questions

backon dependency (varex83: "move it back to dev-dependencies"): it can't move — fast_backoff/default_backoff and the .retry(...) calls in wait_chain_start/wait_beacon_sync are production paths, not tests. Keeping it in [dependencies] is correct.
Callbacks vs returning the Receiver (your comment): I'd lean toward returning the broadcast::Receiver from subscribe_*. It pushes the await-each-vs-spawn-per-callback and cancellation decisions to the caller (per your own analysis), removes the lag-terminates-subscription policy from the scheduler entirely, and sidesteps the "should the actor outlive a slow callback" question. The callback form bakes one policy in for everyone; the channel form is strictly more flexible and is the idiomatic Rust shape here. If you keep callbacks, document the await-each-in-order behavior (and the lag-terminates semantics) on the public methods.
await-each vs spawn-per-callback: Charon runs subscribers without ordering guarantees, but I wouldn't treat "Charon does X" as decisive here — ordering-preserving back-pressure (current behavior) is the safer default for duty emission. Returning the Receiver makes this moot.

Summary

Solid iteration — the observability regressions (#3 logging, #4 logging) and the test/handle coverage are meaningfully better. #1 remains the blocker (hard 100 ms timeout + per-call beacon round-trip is a real behavioral divergence from Charon's memoized + poll-loop semantics). #2 (mid-slot sleep cancellation) and the pub/clone/blank-line cleanups are quick wins. Recommend addressing #1 and #2 before merge; the rest can be a cleanup pass.

Read current scheduler.rs + metrics.rs
Re-check each prior finding
Review recent changes (async callbacks, logging, tests)
Cross-reference Charon for deltas
Post updated review
• Branch: emlautarom1/core-scheduler

- Has `ValidatorCache` as a field

- Make `BeaconNodeClient` always have a valid `ValidatorCache` on construction

…duler

emlautarom1 added 30 commits May 25, 2026 13:55

Implement initial scheduler module

a620362

- Add `Scheduler` struct (WIP) - Add `new_slot_ticker`

Add wait_chain_start

8189f08

- Fix duplicated code

Add wait_beacon_sync

67c47fd

Move valcache from app to core

3d19f19

- `valcache` is required by scheduler - Fixes cyclical dependency issue

Add resolve_active_validators

e60be59

- Use `valcache` to get the list of active validators.

Implement run (WIP)

36ed2ba

Use broadcast channel for emitCoreSlot

218be9f

Add Inner

a1b62b4

- Manages internal state of the scheduler - Stored in a `Mutex` for thread safety

Add schedule_slot (WIP)

efec692

Implement schedule_slot (WIP)

55a2030

Ignore dead_code during development

ef1e932

Cleanup TODOs

6015466

Resolve Attester duties

0ccdd1e

Resolve Proposer duties

3f70aaf

Resolve Sync Committee duties

624cf49

Add get_duty_definition

b832067

- TODO's regarding concurrent access while resolving duties

Add handle_chain_reorg

0b9bffc

Cleanup parsing errors

5ecb4d1

- Try to reuse existing infrastructure

Port to Actor model (WIP)

28d0b85

Inline inner

601d8a9

Remove old Scheduler code

6117781

- Preserve only Actor API

Fix epoch trimmed logic

6f45227

Use saturating_sub to prevent underflow

012522f

Check for resolved then trimmed

8a6007a

Allow early shutdown when waiting for start

a2e8f68

Correctly wait until the slot offset

759628c

Add guard for lagging subscribers

f476267

Add metrics support

76510f3

Add original Charon timeout logic

e6158bc

Add Default to builder

a1020a5

emlautarom1 added 2 commits June 11, 2026 18:48

Merge remote-tracking branch 'origin/main' into emlautarom1/core-sche…

d802083

…duler

Fix merge conflicts

1514d71

- Remove generics from `DutyDefinitionSet`

emlautarom1 and others added 6 commits June 12, 2026 16:17

Cancellable inter-slot sleep

c6ec545

Remove timeout from get_duty_definition

f468161

Reduce visibility of handle_chain_reorg

5fc9431

Log duty on EpochNotResolved error

b58d3e7

Nits

abc0637

Test cancellation during slot offset wait

6a25109

emlautarom1-agent Bot mentioned this pull request Jun 12, 2026

Cache lifetime-constant Beacon Node API responses #481

Open

Merge branch 'main' into emlautarom1/core-scheduler

4c8ecb1

This was referenced Jun 15, 2026

Implement core/scheduler #176

Closed

[Tracker] Pluto ↔ Charon port — master tracker #394

Open

[Epic] Track A — Duty pipeline + consensus #395

Open

Implement cmd/run — main node binary #403

Open

iamquang95 reviewed Jun 15, 2026

View reviewed changes

Comment thread crates/core/src/scheduler.rs Outdated

iamquang95 reviewed Jun 15, 2026

View reviewed changes

Comment thread crates/core/src/types.rs Outdated

Comment thread crates/core/src/scheduler.rs Outdated

Comment thread crates/core/src/scheduler.rs Outdated

Comment thread crates/core/src/scheduler.rs Outdated

emlautarom1 added 7 commits June 15, 2026 08:47

Prefer HashSet over HashMap

fb2fb47

Extract channel actor buffer size constant

0ab5c6d

Fix typo

7ed543e

Prefer TryFrom over TryInto

6c88e2c

Use BeaconNodeClient instead of EthBeaconNodeApiClient

c01ca57

- Has `ValidatorCache` as a field

Fix missing ValidatorCache issues

e603a60

- Make `BeaconNodeClient` always have a valid `ValidatorCache` on construction

Merge remote-tracking branch 'origin/main' into emlautarom1/core-sche…

01f16a1

…duler

varex83 approved these changes Jun 15, 2026

View reviewed changes

emlautarom1 requested a review from iamquang95 June 15, 2026 18:31

iamquang95 approved these changes Jun 16, 2026

View reviewed changes

emlautarom1 merged commit 8b8d317 into main Jun 16, 2026
16 checks passed

emlautarom1 deleted the emlautarom1/core-scheduler branch June 16, 2026 15:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): implement `core/scheduler`#466

feat(core): implement `core/scheduler`#466
emlautarom1 merged 58 commits into
mainfrom
emlautarom1/core-scheduler

emlautarom1 commented Jun 1, 2026 •

edited

Loading

Uh oh!

emlautarom1 commented Jun 11, 2026

Uh oh!

claude Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

emlautarom1 commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emlautarom1 commented Jun 11, 2026

Uh oh!

claude Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: core/scheduler

✅ Addressed since last review

⚠️ Still open

On the open design questions

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

emlautarom1 commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 11, 2026 •

edited

Loading

Re-review: `core/scheduler`