Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions src/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@

#include "runtime/forkipc.h"
#include "runtime/proctitle.h"
#include "runtime/thread.h"

#include "syscall/fuse.h"
#include "syscall/path.h"
Expand Down Expand Up @@ -510,6 +511,16 @@ int main(int argc, char **argv)
*/
int exit_code = vcpu_run_loop(vcpu, vexit, &g, verbose, timeout_sec);

/* Wait for worker vCPU threads to stop before tearing down guest memory.
* The main thread leaves the run loop as soon as it observes the
* exit_group flag, but sibling vCPU threads may still be mid-iteration in
* their own run loops (e.g. touching shim_globals). cleanup_main_resources
* unmaps the guest slab via guest_destroy, so a still-running worker would
* fault on freed guest memory and crash the host with SIGSEGV, masking the
* real exit code. thread_join_workers() is a no-op once the workers have
* already wound down (the common single-threaded case). */
thread_join_workers();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering vs gdb_stub_shutdown() at the next line leaves a residual race. gdb_stub_shutdown() (gdbstub.c:1435-1440) broadcasts resume_cond to release any GDB-paused workers so they can exit. A worker stuck in gdb_stub_handle_stop has active=1 (paused in ptrace-stop, not deactivated), so the new join here polls 100ms, sees no deactivation, and detaches. Then gdb_stub_shutdown releases the worker, it resumes execution, and main proceeds to shim_globals_counters_dump / cleanup_main_resources -- same SIGSEGV pattern the PR is fixing, just gated on a GDB session.

Move the join one line down so the GDB-paused case is also covered:

int exit_code = vcpu_run_loop(...);

/* Tear down debugger state before freeing guest/vCPU resources. */
gdb_stub_shutdown();

thread_join_workers();

if (shim_globals_stats_enabled())
    shim_globals_counters_dump(&g);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sc_exit_group at syscall.c:719 already calls thread_join_workers(), so this is a second join. Two consequences worth tracking:

  1. If a worker calls exit_group, that worker's thread_join_workers snapshots all others including main (slot 0, which never thread_deactivates itself) -- so it polls 100ms then detaches main. Detaching the host process's main pthread is legal but architecturally odd.
  2. After that first call returns and main eventually runs this new one, any worker that wound down in between has active=0 -- main's call then tries pthread_join on a thread that was already joined or detached. POSIX says undefined; macOS libpthread returns EINVAL in practice (return is unchecked at thread.c:354), so not catastrophic but contract drift.

Cleaner: drop the thread_join_workers() from sc_exit_group. The exit_group caller still does proc_request_exit_group + thread_for_each(thread_force_exit_cb) which forces every sibling out of hv_vcpu_run; main observes the flag, exits the loop, and runs the single authoritative join here. Removes the detach-main wart and the double-join. Can be a follow-up -- not a blocker for this PR.


/* Tear down debugger state before freeing guest/vCPU resources. */
gdb_stub_shutdown();

Expand Down
Loading