Skip to content

Join worker vCPU threads before tearing down guest memory#85

Open
Max042004 wants to merge 1 commit into
sysprog21:mainfrom
Max042004:fix-worker-join-teardown
Open

Join worker vCPU threads before tearing down guest memory#85
Max042004 wants to merge 1 commit into
sysprog21:mainfrom
Max042004:fix-worker-join-teardown

Conversation

@Max042004
Copy link
Copy Markdown
Collaborator

@Max042004 Max042004 commented Jun 6, 2026

On exit_group the main thread leaves vcpu_run_loop as soon as it observes
the exit-group flag and proceeds to cleanup_main_resources(), which unmaps
the guest slab via guest_destroy(). Sibling vCPU threads may still be
mid-iteration in their own run loops (e.g. in shim_globals_recompute_attention,
which touches guest memory). A worker that reads the slab after the main
thread frees it faults at the host level and the elfuse process dies with
SIGSEGV, so a guest that requested exit_group(0) is reported as exit 139.

This was masked until now because workloads that exercise it (multi-threaded
JVMs) crashed earlier; with the fault-delivery fix javac runs to completion
and reaches the exit_group teardown, exposing the race.

Have the main thread call thread_join_workers() after vcpu_run_loop()
returns and before any teardown. It waits for the workers to wind down
(they respond to the hv_vcpus_exit() that exit_group already issued) and is
a no-op once they have. javac now exits 0.

(cherry picked from commit e2f63eab9575439bd2f900954c8e095986d321b8)


Summary by cubic

Join worker vCPU threads before tearing down guest memory to prevent host SIGSEGVs and preserve correct exit codes on exit_group. This fixes a race seen in multithreaded workloads (e.g., JVM), where javac now exits 0.

  • Bug Fixes
    • Call thread_join_workers() after vcpu_run_loop() and before guest teardown to avoid workers touching freed guest memory.
    • Add include for runtime/thread.h.

Written for commit 7570bee. Summary will update on new commits.

Review in cubic

On exit_group the main thread leaves vcpu_run_loop as soon as it observes
the exit-group flag and proceeds to cleanup_main_resources(), which unmaps
the guest slab via guest_destroy(). Sibling vCPU threads may still be
mid-iteration in their own run loops (e.g. in shim_globals_recompute_attention,
which touches guest memory). A worker that reads the slab after the main
thread frees it faults at the host level and the elfuse process dies with
SIGSEGV, so a guest that requested exit_group(0) is reported as exit 139.

This was masked until now because workloads that exercise it (multi-threaded
JVMs) crashed earlier; with the fault-delivery fix javac runs to completion
and reaches the exit_group teardown, exposing the race.

Have the main thread call thread_join_workers() after vcpu_run_loop()
returns and before any teardown. It waits for the workers to wind down
(they respond to the hv_vcpus_exit() that exit_group already issued) and is
a no-op once they have. javac now exits 0.

(cherry picked from commit e2f63eab9575439bd2f900954c8e095986d321b8)
cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv
Copy link
Copy Markdown
Contributor

jserv commented Jun 6, 2026

(cherry picked from commit e2f63eab9575439bd2f900954c8e095986d321b8)

Don't mention this as this commit never existed in public.

Comment thread src/main.c
* fault on freed guest memory and crash the host with SIGSEGV, masking the
* real exit code. thread_join_workers() is a no-op once the workers have
* already wound down (the common single-threaded case). */
thread_join_workers();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering vs gdb_stub_shutdown() at the next line leaves a residual race. gdb_stub_shutdown() (gdbstub.c:1435-1440) broadcasts resume_cond to release any GDB-paused workers so they can exit. A worker stuck in gdb_stub_handle_stop has active=1 (paused in ptrace-stop, not deactivated), so the new join here polls 100ms, sees no deactivation, and detaches. Then gdb_stub_shutdown releases the worker, it resumes execution, and main proceeds to shim_globals_counters_dump / cleanup_main_resources -- same SIGSEGV pattern the PR is fixing, just gated on a GDB session.

Move the join one line down so the GDB-paused case is also covered:

int exit_code = vcpu_run_loop(...);

/* Tear down debugger state before freeing guest/vCPU resources. */
gdb_stub_shutdown();

thread_join_workers();

if (shim_globals_stats_enabled())
    shim_globals_counters_dump(&g);

Comment thread src/main.c
* fault on freed guest memory and crash the host with SIGSEGV, masking the
* real exit code. thread_join_workers() is a no-op once the workers have
* already wound down (the common single-threaded case). */
thread_join_workers();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sc_exit_group at syscall.c:719 already calls thread_join_workers(), so this is a second join. Two consequences worth tracking:

  1. If a worker calls exit_group, that worker's thread_join_workers snapshots all others including main (slot 0, which never thread_deactivates itself) -- so it polls 100ms then detaches main. Detaching the host process's main pthread is legal but architecturally odd.
  2. After that first call returns and main eventually runs this new one, any worker that wound down in between has active=0 -- main's call then tries pthread_join on a thread that was already joined or detached. POSIX says undefined; macOS libpthread returns EINVAL in practice (return is unchecked at thread.c:354), so not catastrophic but contract drift.

Cleaner: drop the thread_join_workers() from sc_exit_group. The exit_group caller still does proc_request_exit_group + thread_for_each(thread_force_exit_cb) which forces every sibling out of hv_vcpu_run; main observes the flag, exits the loop, and runs the single authoritative join here. Removes the detach-main wart and the double-join. Can be a follow-up -- not a blocker for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants