Switch ppoll/pselect6 to host_fd_ref_t and tighten #72
Merged
sys_ppoll previously walked a hand-rolled FD lookup cache to translate guest fds to host fds, with no protection against a concurrent close(2) on another thread retiring the fd between lookup and poll(). The replacement is host_fd_ref_open_io plus a per-call host_fd_ref_t array that keeps each host fd alive until poll() returns and the refs are released. sys_pselect6 follows the same pattern.
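The PR text does not show the internals of host_fd_ref_t, so the following is only a minimal single-threaded model of the general idea, deferred close through a reference count. Every name in it (fd_slot_sketch, slot_ref_acquire, slot_guest_close, slot_ref_release) is hypothetical, and a real implementation would need atomics or a lock around the counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one fd-table slot; not the PR's data structure. */
struct fd_slot_sketch {
    int  host_fd;      /* host descriptor backing the guest fd */
    int  refs;         /* in-flight syscalls currently holding the fd */
    bool retired;      /* guest close(2) has been requested */
    bool host_closed;  /* stands in for the real close(host_fd) */
};

/* Take a reference for the duration of one syscall; fails once retired. */
bool slot_ref_acquire(struct fd_slot_sketch *s)
{
    if (s->retired)
        return false;
    s->refs++;
    return true;
}

/* Guest close(2): retire the slot, closing the host fd only if idle. */
void slot_guest_close(struct fd_slot_sketch *s)
{
    s->retired = true;
    if (s->refs == 0)
        s->host_closed = true;
}

/* Drop a reference; the last holder of a retired slot does the close. */
void slot_ref_release(struct fd_slot_sketch *s)
{
    assert(s->refs > 0);
    if (--s->refs == 0 && s->retired)
        s->host_closed = true;
}
```

In this model a poll() caller would acquire one reference per pollfd entry before the host call and release them all afterwards, so a close(2) issued in between only takes effect at release time.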
For invalid guest fds, the bad-fd slot is no longer suppressed by an early return. Force a non-blocking poll() when any slot is invalid, re-stamp POLLNVAL on those entries after the call (POSIX poll resets revents to 0 for fd < 0), and credit them to the return count. Linux reports POLLNVAL on bad fds alongside revents on the good ones in the same call; the prior early-return dropped ready events on valid fds.
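The bad-fd handling described above can be sketched roughly as follows. The helper name poll_with_bad_slots and the bad[] side array are assumptions made for illustration; the PR's own bookkeeping may differ, and an emulator would presumably translate a poll() failure into -errno rather than return it raw.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: poll a host pollfd array in which some slots stand
 * for invalid guest fds. bad[i] marks those slots. */
int poll_with_bad_slots(struct pollfd *pfds, const bool *bad,
                        nfds_t n, int timeout_ms)
{
    assert(pfds != NULL && bad != NULL);

    nfds_t nbad = 0;
    for (nfds_t i = 0; i < n; i++) {
        if (bad[i]) {
            pfds[i].fd = -1;   /* the host poll() ignores negative fds */
            nbad++;
        }
    }

    /* A bad fd is reportable immediately, so the call must not block. */
    if (nbad > 0)
        timeout_ms = 0;

    int ready = poll(pfds, n, timeout_ms);
    if (ready < 0)
        return ready;          /* errno was set by poll() */

    /* The host zeroed revents for fd < 0: re-stamp POLLNVAL and count it. */
    for (nfds_t i = 0; i < n; i++) {
        if (bad[i]) {
            pfds[i].revents = POLLNVAL;
            ready++;
        }
    }
    return ready;
}
```

Valid slots keep whatever revents the host reported, so ready events and POLLNVAL entries come back from the same call.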
ppoll and pselect6 also stop silently dropping a bad sigmask pointer: guest_read_small failures now return -EFAULT, and pselect6 returns -EINVAL when ss_len does not match sizeof(sigset_t).
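A rough sketch of the pselect6 sigmask checks, assuming a 64-bit Linux guest where the sixth argument points at a {pointer, length} pair and the kernel sigset is 8 bytes. The flat guest-memory model and all names here (guest_mem_sketch, guest_read_sketch, read_pselect_sigmask) are hypothetical stand-ins, not the emulator's real guest_read_small.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define GUEST_SIGSET_SIZE 8u   /* assumption: 64-bit Linux kernel sigset */

/* Hypothetical flat guest memory, addressed from 0. */
struct guest_mem_sketch { const uint8_t *base; uint64_t size; };

/* Bounded copy out of guest memory; -1 when the range is not mapped. */
int guest_read_sketch(const struct guest_mem_sketch *m, uint64_t addr,
                      void *dst, uint64_t len)
{
    if (addr > m->size || len > m->size - addr)
        return -1;
    memcpy(dst, m->base + addr, len);
    return 0;
}

/* Returns 0 on success (with *have_mask telling whether a mask was
 * supplied), -EFAULT on an unreadable pointer, -EINVAL on a bad length. */
int read_pselect_sigmask(const struct guest_mem_sketch *m, uint64_t arg_ptr,
                         uint64_t *mask, int *have_mask)
{
    assert(mask != NULL && have_mask != NULL);
    *have_mask = 0;
    if (arg_ptr == 0)
        return 0;                       /* no sixth argument at all */

    uint64_t pair[2];                   /* { ss pointer, ss_len } */
    if (guest_read_sketch(m, arg_ptr, pair, sizeof pair) < 0)
        return -EFAULT;
    if (pair[0] == 0)
        return 0;                       /* NULL mask: leave signals alone */
    if (pair[1] != GUEST_SIGSET_SIZE)
        return -EINVAL;
    if (guest_read_sketch(m, pair[0], mask, sizeof *mask) < 0)
        return -EFAULT;

    *have_mask = 1;
    return 0;
}
```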
pselect6 grows a poll(2) fallback for cases where a host fd or the wakeup pipe exceeds FD_SETSIZE. select cannot represent those fds, so the fallback drives a struct pollfd array and maps poll revents back to read/write/except fd_sets (POLLIN|POLLHUP|POLLERR for read, POLLOUT|POLLHUP|POLLERR for write, POLLPRI for except).
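The revents mapping in the fallback might look roughly like this. It assumes each pollfd's events field records which sets the guest asked about (POLLIN for read, POLLOUT for write, POLLPRI for except) and that a parallel guest_fds[] array holds the guest fd for each entry; both are illustrative choices, as is the name revents_to_fdsets.

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>

/* Fold poll() results back into select()-style sets. Returns the number
 * of bits set across the three sets, which is what select() reports. */
int revents_to_fdsets(const struct pollfd *pfds, const int *guest_fds, int n,
                      fd_set *rd, fd_set *wr, fd_set *ex)
{
    int bits = 0;

    FD_ZERO(rd);
    FD_ZERO(wr);
    FD_ZERO(ex);

    for (int i = 0; i < n; i++) {
        short ev = pfds[i].events;
        short re = pfds[i].revents;
        int g = guest_fds[i];

        /* Only the guest fd has to fit in an fd_set; the host fd need not. */
        assert(g >= 0 && g < FD_SETSIZE);

        if ((ev & POLLIN) && (re & (POLLIN | POLLHUP | POLLERR))) {
            FD_SET(g, rd);
            bits++;
        }
        if ((ev & POLLOUT) && (re & (POLLOUT | POLLHUP | POLLERR))) {
            FD_SET(g, wr);
            bits++;
        }
        if ((ev & POLLPRI) && (re & POLLPRI)) {
            FD_SET(g, ex);
            bits++;
        }
    }
    return bits;
}
```

Because the sets are indexed by guest fd, a host fd or wakeup pipe above FD_SETSIZE never has to be stored in an fd_set at all.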
Add the Linux IP small-int sockopts to socket_opt_uses_small_int: IP_TOS, IP_TTL, IP_HDRINCL, IP_PKTINFO, IP_RECVTTL, IP_RECVTOS. macOS rejects setsockopt with optlen < sizeof(int) for these, so the host call always forwards a zero-extended int regardless of guest optlen. getsockopt mirrors Linux ip_sockglue copyval: when the caller buffer is shorter than int and the value fits in a byte, report and write a single byte. Factor ip_copyval_clamp to share the check between the cached fast path and the post-host path.
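For illustration, the getsockopt clamp could be sketched as below. The function name and signature are hypothetical (the PR's ip_copyval_clamp is not shown here), and the behaviour for values that do not fit in a byte, truncating the copy to the shorter of the buffer and sizeof(int), follows my understanding of Linux rather than anything stated in the PR text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy an int option value into a caller buffer of
 * buflen bytes and return the length to report back through optlen. */
size_t ip_copyval_sketch(int val, void *out, size_t buflen)
{
    assert(out != NULL);

    /* Short buffer and the value fits in a byte: write exactly one byte. */
    if (buflen > 0 && buflen < sizeof(int) && val >= 0 && val <= 255) {
        unsigned char b = (unsigned char)val;
        memcpy(out, &b, 1);
        return 1;
    }

    /* Otherwise copy at most sizeof(int) bytes of the native int. */
    size_t len = buflen < sizeof(int) ? buflen : sizeof(int);
    memcpy(out, &val, len);
    return len;
}
```

With a 2-byte buffer and a TTL of 64 this reports a length of 1 and leaves the second byte untouched, which is why the same check has to run on both the cached path and the post-host path.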
Normalize SO_PASSCRED and the four IP boolean toggles (IP_HDRINCL, IP_PKTINFO, IP_RECVTTL, IP_RECVTOS) to 0 or 1 in socket_small_int_normalize so setsockopt(IP_PKTINFO, 5) caches 1 and getsockopt returns 1, matching the kernel's !!val convention. SO_PASSCRED also joins the small-int set so its cached round-trip survives the gate change from "guest_optlen <= sizeof(int)" to socket_opt_uses_small_int(level, optname).
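A minimal sketch of the normalization, using Linux guest-side option numbers spelled out as LX_ constants so it does not depend on host headers. The constant names and the function small_int_normalize_sketch are hypothetical; only the 0-or-1 behaviour comes from the description above.

```c
#include <stdbool.h>

/* Linux guest option numbers (generic ABI); the LX_ names are made up
 * for this sketch. */
enum {
    LX_SOL_IP      = 0,
    LX_SOL_SOCKET  = 1,
    LX_SO_PASSCRED = 16,
    LX_IP_TOS      = 1,
    LX_IP_TTL      = 2,
    LX_IP_HDRINCL  = 3,
    LX_IP_PKTINFO  = 8,
    LX_IP_RECVTTL  = 12,
    LX_IP_RECVTOS  = 13
};

/* Collapse boolean toggles to 0 or 1; leave value-carrying options such
 * as IP_TOS and IP_TTL unchanged. */
int small_int_normalize_sketch(int level, int optname, int val)
{
    bool is_toggle = false;

    if (level == LX_SOL_SOCKET) {
        is_toggle = (optname == LX_SO_PASSCRED);
    } else if (level == LX_SOL_IP) {
        switch (optname) {
        case LX_IP_HDRINCL:
        case LX_IP_PKTINFO:
        case LX_IP_RECVTTL:
        case LX_IP_RECVTOS:
            is_toggle = true;
            break;
        default:
            break;
        }
    }
    return is_toggle ? (val != 0) : val;
}
```

Caching the normalized value is what lets a later getsockopt served from the cache agree with what the kernel would have returned.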
rt_sigreturn switches its rt_sigframe read to guest_read_small to pick up the bounded direct-mapping fast path; guest_read_small falls back to guest_read when the bounded mapping fails so the frame size stays correct.
Summary by cubic
Make ppoll/pselect6 race-free and Linux-correct, and align IP and SO_PASSCRED sockopts with Linux semantics on macOS. Adds a poll fallback for oversized FDs and tightens small-int handling.
Bug Fixes
- Treat IP_TOS, IP_TTL, IP_HDRINCL, IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, and SO_PASSCRED as small-int; require optlen > 0; normalize booleans to 0/1; always pass sizeof(int) to the host; clamp IP getsockopt to 1 byte when the buffer is short and the value fits; accept() inherits SO_PASSCRED from the listener (including while accept blocks); only inject SCM_CREDENTIALS on AF_UNIX sockets.
Refactors
- Use host_fd_ref_t to keep host FDs alive during the call and avoid close(2) races.
Written for commit d369caf.