From 7570bee90c29b4e92af3334e29184a494c5b1e7e Mon Sep 17 00:00:00 2001
From: Max042004
Date: Sat, 6 Jun 2026 21:09:52 +0800
Subject: [PATCH] Join worker vCPU threads before tearing down guest memory

On exit_group the main thread leaves vcpu_run_loop as soon as it
observes the exit-group flag and proceeds to cleanup_main_resources(),
which unmaps the guest slab via guest_destroy(). Sibling vCPU threads
may still be mid-iteration in their own run loops (e.g. in
shim_globals_recompute_attention, which touches guest memory). A worker
that reads the slab after the main thread frees it faults at the host
level and the elfuse process dies with SIGSEGV, so a guest that
requested exit_group(0) is reported as exit 139.

This was masked until now because workloads that exercise it
(multi-threaded JVMs) crashed earlier; with the fault-delivery fix javac
runs to completion and reaches the exit_group teardown, exposing the
race.

Have the main thread call thread_join_workers() after vcpu_run_loop()
returns and before any teardown. It waits for the workers to wind down
(they respond to the hv_vcpus_exit() that exit_group already issued) and
is a no-op once they have. javac now exits 0.

(cherry picked from commit e2f63eab9575439bd2f900954c8e095986d321b8)
---
 src/main.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/main.c b/src/main.c
index ea1d17e..6c63241 100644
--- a/src/main.c
+++ b/src/main.c
@@ -35,6 +35,7 @@
 #include "runtime/forkipc.h"
 #include "runtime/proctitle.h"
+#include "runtime/thread.h"
 #include "syscall/fuse.h"
 #include "syscall/path.h"
@@ -510,6 +511,16 @@ int main(int argc, char **argv)
      */
     int exit_code = vcpu_run_loop(vcpu, vexit, &g, verbose, timeout_sec);
+    /* Wait for worker vCPU threads to stop before tearing down guest memory.
+     * The main thread leaves the run loop as soon as it observes the
+     * exit_group flag, but sibling vCPU threads may still be mid-iteration in
+     * their own run loops (e.g. touching shim_globals). cleanup_main_resources
+     * unmaps the guest slab via guest_destroy, so a still-running worker would
+     * fault on freed guest memory and crash the host with SIGSEGV, masking the
+     * real exit code. thread_join_workers() is a no-op once the workers have
+     * already wound down (the common single-threaded case). */
+    thread_join_workers();
+
     /* Tear down debugger state before freeing guest/vCPU resources. */
     gdb_stub_shutdown();