diff --git a/docs/planning/ralph-roadmap.html b/docs/planning/ralph-roadmap.html index bbb22fde..853006b5 100644 --- a/docs/planning/ralph-roadmap.html +++ b/docs/planning/ralph-roadmap.html @@ -32,58 +32,63 @@
Canonical orchestrator backlog (maintained by Claude). Last updated 2026-05-30 06:35. Shipped + in flight + queued. Per-turn execution detail lives in the Ralph inbox.md; version-tracked copy in-repo at docs/planning/ralph-roadmap.html.
Canonical orchestrator backlog (maintained by Claude). Last updated 2026-06-01. ARM64 / Parallels focus. Refreshed at merge milestones (not per-turn β per-turn churn caused the prior revert). Per-turn execution detail lives in the Ralph inbox.md; version-tracked copy in-repo at docs/planning/ralph-roadmap.html.
Blocked-in-syscall thread mid-execv is timer-preempted; its saved kernel frame is later restored under Ring-3 selectors β err 0x15 instruction-fetch fault at the saved kernel RIP.
Turn 28 (2026-05-30): selector-only privilege-conditional restore built clean but could not be proven (GDB timed out pre-userspace; boot-stages stalled at ARP stage-22). A broader candidate (kernel-frame restore + CR3 switch) reached userspace and cleared the original 0x15 signature, but panicked on a new fault (0x100003006a8, PML4[2] process-manager data). All code reverted β no fix retained, nothing merged. New understanding: not a selector swap β a process/thread context-sync bug (how process.main_thread vs scheduler Thread diverge/reuse across blocked-in-syscall wake/restore). Proof also gated on NB-2 (ARP stage-22).
The user-stack frame-aliasing lockup fix (PR #404) reduces but does not eliminate the fault: a clean re-verify on current main faulted 1 of 2 stress boots with a post-spawn UNHANDLED_EC β PANIC on the freshly spawned child (PID 5) after exec.
Key finding: the prior "assertion-fired 3/3, 0 crashes" proof was contaminated by an SMP-serial byte-interleaving blind spot β naive FATAL/PANIC line-grep reads 0; you must de-interleave the two CPU streams to see [FATAL] bug=UNHANDLED_EC. The fault sits near the stack/spawn path β may overlap the #406 kstack-reuse area. PR #404 held, not merged. Next: de-interleave + root-cause; a gold-master file may be required β operator signoff first. beads breenix-oia family adjacent.
Investigated 2026-05-30 (code hypothesis): heartbeat's sleep path is correct (nanosleep genuinely blocks β not a busy-loop). The 189.9% is an accounting artifact.
Boot-stages hangs at [22/252] ARP reply received β net RX reply not delivered on QEMU x86_64. Blocks the E5-1 boot-stages proof; likely a net RX interrupt/descriptor issue.
ls hangs in bsh P1 ARM64ls still hangs in the bsh shell.
Overlapping windows clip in the BWM / VirGL compositor occlusion path.
Screen stays cornflower blue ~30s before first composited frame on Parallels boot. First-paint / boot-latency investigation.
VirtIO block test failed: Block device not initialized; falls back to AHCI for ext2 root.
Investigate + classify the SOFT_LOCKUP_VIRGL failure class on Parallels. beads breenix-ha9.
Remaining AHCI timeout corridor after GICR discovery; verify whether the fix is storage-driver-only (non-gold-master) or touches gic.rs β signoff. beads breenix-xk8.
bsshd should send SSH exit-status + channel close for exec requests. beads breenix-72x.
CPU0 vtimer death on Parallels + remote-wake / resched scheduling. Fixes live in frozen gold-master timer_interrupt.rs / gic.rs / context_switch.rs β need operator signoff (this cluster burned ~a week before; investigate-only without go). beads oia, 9f1, cb7, 6f4, e43, k16, eh4.
BusyBox applet faults on the ARM64 musl TLS path (errno read from a zero TPIDR_EL0). User-facing symptom already fixed (native bls as /bin/ls); the principled TLS fix touches gold-master context-switch β signoff. beads breenix-b7u.
Process/thread context-sync bug on blocked-in-syscall wake/restore mid-execv. ARM64-focus β deprioritized. beads breenix-ql2.
x86 boot-stages hang at [22/252] ARP reply received; ARM64-focus β deprioritized (x86 boot-stages is not used as a gate).
CPU0 vtimer stops ~6 ticks into boot on Parallels; registers verified correct; diagnostics built. Separate Ralph; awaiting operator decision since 2026-05-26.
The real "window clipping". PR #382 Β· re-verified clean on a fresh multi-window boot (0 VIRTGPU_FAIL).
PR #406 Β· bitmap kstack reuse + Drop-free (leaked fork-child stacks exhausted the pool at ~90 conns); 50/50 inbound SSH with real host-key verify.
PR #405 Β· AHCI slot-0 waits made uninterruptible (SIGCHLDβEINTR was abandoning in-flight DMA); F19 serialization workaround relaxed.
PR #407 Β· exec returns EAGAIN while a live CLONE_VM sibling still holds the old CR3 (scoped stopgap; full sibling-teardown later).
PR #396 Β· core-aware denominator + single-snapshot sampling (the 189% was an accounting artifact, not a real burn).
client channel #397, publickey #398, known-hosts verify #399, host-auth #403.
PR #402 Β· trace-ring crash diagnostics.
Polling-elimination gate satisfied.
Not reproducible across 3 healthy boots; opportunistic capture only.
6/6 fresh boots reached bwm create_process_with_argv ENTRY + full ELF load β harness/classification noise, not a real wedge. beads closed.
Live counters: SUBMIT_3D climbs ~154/s under Bounce while CPU full-frame composite stays flat β already GPU-composited by merged #381. beads closed.
ls hang in bsh fixedNative bls installed as /bin/ls (the BusyBox applet musl-TLS fault is tracked separately as b7u).
Resolved as the oi6 multi-window virtio-gpu fix (#382).
Legend: P0 urgent Β· P1 Β· P2 Β· arch. NB-* = surfaced by operator 2026-05-30.
+Legend: P0 Β· P1 Β· P2 Β· signoff = gold-master / operator-gated Β· arch. Gold-master frozen files (edits require operator signoff): context_switch.rs, timer_interrupt.rs, gic.rs.