Fix double handoff for unloaded small messages#84
Open
breakertt wants to merge 1 commit into
Resolved issue #77: we move the ACK block to after `homa_add_packet` + `homa_rpc_handoff`. The unlock is still there (still needed for `homa_rpc_acked`'s lock ordering), but now the skb is on the queue before the unlock window opens. Anyone who grabs the lock during the window finds data, so no clearing happens. The unit test was modified correspondingly.

Impact on a CloudLab xl170 pair (25 GbE, Linux 6.17.8): server-side `handoff_count / requests_received` at unloaded 64 B drops from 1.148 to 1.000 (5/5 trials, race closed). Loaded throughput across w1..w5 is unchanged within noise (Δ kops swings -4.4% to +3.5% with no consistent sign across workloads). More details below.

Root cause

`homa_rpc_alloc_server` sets `RPC_PKTS_READY` and fires `homa_rpc_handoff` for the first packet of a new server RPC before anyone has actually put the skb on `msgin.packets`. `homa_data_pkt` drops the bucket lock to call `homa_rpc_acked()` for any piggy-backed ACK, and that happens before `homa_add_packet`. The unlock is mandatory: `homa_rpc_acked` needs to grab other RPCs' locks, so we can't hold this one.

More details on fix measurement
Two CloudLab xl170 nodes (E5-2640 v4 @ 2.40 GHz, 20 logical cores, 25 GbE Mellanox), small-lan profile, both on Linux 6.17.8 mainline (the version the upstream README says works).
For each branch (`fix-handoff-twice-reproduce` for baseline metrics; `fix-handoff-twice` with the metric overlay for the fix):

- Patch `cloudlab/bin/config`'s VLAN regex `inet 10\.0\.1\.` -> `inet 10\.10\.1\.` (current small-lan uses the latter).
- `make all && cd util && make cp_node`.
- Unloaded runs: client `--ports 1 --port-receivers 0 --client-max 1 --workload 64`; server `--ports 1 --port-threads 1`. 5 x ~10 s.
- Loaded runs: client `--ports 3 --port-receivers 3 --client-max 200 --gbps 0`; server `--ports 3 --port-threads 3`. 5 x 30 s.
- Reload `homa.ko` and run `cloudlab/bin/config homa <ko> nic power rps` between every trial. Without that, server-side state accumulates and contaminates the loaded numbers (tested: the variance is wild without a per-trial reset).
- Snapshot `/proc/net/homa_metrics` after each trial, divide.

The probe is one `INC_METRIC` call at the top of `homa_rpc_handoff` plus a `u64` field: ~16 LoC across 3 files plus a 10-line shell helper. See `fix-handoff-twice-reproduce`. For single-packet messages each RPC should see exactly one handoff, so a ratio > 1 is the race signal. Loaded ratios aren't reported below: for messages larger than one MTU, each packet that lands after the receiver has drained the queue legitimately fires its own wake-up, so the metric stops measuring the race.
Unloaded 64 B
Race closed. Latency / throughput delta is within the per-trial jitter (~0.5 µs, ~3% kops trial-to-trial).
Loaded, cperf 25 Gbps defaults
(P50/P99 in µs unless marked.) Δ swings from -4.4% to +3.5% with no consistent sign across workloads; that's noise. Within-variant variance is comparable: baseline-w1's 5 trials span 385.93-413.60 (7%), fix-w1 spans 370.52-411.32 (11%). The cross-variant deltas are smaller than that.