spock_failover_slots: drop EXCLUSIVE locks around slot xmin compute #485
Conversation
synchronize_one_slot held ReplicationSlotControlLock LW_EXCLUSIVE and ProcArrayLock LW_EXCLUSIVE across the slot xmin seed, then later in the same function ReplicationSlotRelease() / ReplicationSlotDropPtr() would call ReplicationSlotsComputeRequiredXmin(false) which needs the SHARED ReplicationSlotControlLock. On PG17+ standbys this contention wedged the spock_failover_slots worker inside LWLockAcquire and queued every backend trying to initialise behind it. Take ProcArrayLock LW_SHARED for GetOldestSafeDecodingTransactionId, update the slot fields under the slot's SpinLock, then let ReplicationSlotsComputeRequiredXmin(false) acquire its own short SHARED RSCL the way every PG core call site does. (cherry picked from commit fb8c2de)
qport() now sets PGCONNECT_TIMEOUT=10 so a hung standby fails in 10 s instead of stalling prove until the harness kills it. Add qport_status() which returns (output, psql_rc) so callers can assert connect succeeded within the timeout, and use it in the read-only-standby regression assertions for explicit "psql rc=N" failure messages.
📝 WalkthroughWalkthroughThe PR refactors slot synchronization locking in ChangesSlot synchronization lock pattern and test coverage
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/tap/t/018_failover_slots.pl`:
- Around line 439-474: The write-rejection check still invokes psql directly
without a connection timeout, so a wedged standby can hang the test; change that
write-probe to use the same qport_status(...) wrapper used for the reads (or, if
keeping the direct psql call, set PGCONNECT_TIMEOUT to the same timeout value)
so the write probe fails fast; update the code that performs the
"write-rejection" psql invocation to either call qport_status($pg_bin, $host,
$standby_port, $dbname, $db_user, "<write SQL>") or export/prepend
PGCONNECT_TIMEOUT=10 for that psql command to match the other probes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a0f282cc-6f41-4b10-80cb-dd3be1a609b3
📒 Files selected for processing (2)
src/spock_failover_slots.ctests/tap/t/018_failover_slots.pl
| # | ||
| # Use qport_status so a hung backend (initializing / futex_wait_queue) is | ||
| # reported as an explicit assertion failure with a clear message, instead | ||
| # of stalling the whole test until prove kills it. | ||
| my ($still_in_recovery, $rc_recov) = qport_status($pg_bin, $host, $standby_port, | ||
| $dbname, $db_user, "SELECT pg_is_in_recovery()"); | ||
| $still_in_recovery =~ s/\s+//g; | ||
| ok($rc_recov == 0, | ||
| "Standby answered pg_is_in_recovery() within 10s (psql rc=$rc_recov)"); | ||
| is($still_in_recovery, 't', | ||
| 'Read-only standby is still in recovery (hot_standby mode)'); | ||
|
|
||
| # 1) User-table SELECT against the standby returns the committed row. | ||
| my $val_on_standby = qport($pg_bin, $host, $standby_port, | ||
| my ($val_on_standby, $rc_user) = qport_status($pg_bin, $host, $standby_port, | ||
| $dbname, $db_user, "SELECT val FROM failover_test WHERE id = 1"); | ||
| $val_on_standby =~ s/\s+//g; | ||
| ok($rc_user == 0, | ||
| "Standby answered user SELECT within 10s (psql rc=$rc_user)"); | ||
| is($val_on_standby, 'before_failover', | ||
| 'Read-only standby returns committed user data (SELECT works)'); | ||
|
|
||
| # 2) Spock catalog SELECT against the standby — the original customer | ||
| # failure mode was that spock.* reads errored out on a recovery backend. | ||
| my $standby_node_count = qport($pg_bin, $host, $standby_port, | ||
| # 2) Spock catalog SELECT against the standby — guards against the | ||
| # failure mode where backends hang in "initializing" state on the | ||
| # standby with futex_wait_queue. An explicit timeout on the connect | ||
| # makes this a fast assertion failure, not a hang. | ||
| my ($standby_node_count, $rc_node) = qport_status($pg_bin, $host, $standby_port, | ||
| $dbname, $db_user, "SELECT count(*) FROM spock.node"); | ||
| $standby_node_count =~ s/\s+//g; | ||
| ok($rc_node == 0, | ||
| "Standby answered spock.node SELECT within 10s (psql rc=$rc_node)"); | ||
| ok(($standby_node_count =~ /^\d+$/) && $standby_node_count >= 1, | ||
| "Read-only standby returns spock.node ($standby_node_count rows)"); | ||
|
|
||
| # 3) The synced logical slot is visible on the standby. | ||
| my $standby_slot_count = qport($pg_bin, $host, $standby_port, | ||
| my ($standby_slot_count, $rc_slot) = qport_status($pg_bin, $host, $standby_port, | ||
| $dbname, $db_user, | ||
| "SELECT count(*) FROM pg_replication_slots WHERE slot_name = '$slot_name'"); | ||
| $standby_slot_count =~ s/\s+//g; | ||
| ok($rc_slot == 0, | ||
| "Standby answered pg_replication_slots SELECT within 10s (psql rc=$rc_slot)"); |
There was a problem hiding this comment.
The regression still leaves one standby probe unbounded.
These reads now fail fast, but the later write-rejection check still shells out to psql without PGCONNECT_TIMEOUT. If the standby wedges in the same initialization path, the suite can still hang after these assertions start failing. Please route that write probe through qport_status() too, or add the same timeout there.
Suggested follow-up
-my $write_rc = system(
- "$pg_bin/psql -X -h $host -p $standby_port -d $dbname -U $db_user "
- . "-v ON_ERROR_STOP=1 "
- . "-c \"INSERT INTO failover_test VALUES (999, 'must_fail')\" "
- . ">/dev/null 2>&1");
-isnt($write_rc, 0,
+my (undef, $write_rc) = qport_status($pg_bin, $host, $standby_port,
+ $dbname, $db_user,
+ "INSERT INTO failover_test VALUES (999, 'must_fail')");
+ok($write_rc != 0,
'Write against read-only standby is rejected (read-only enforced)');🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/tap/t/018_failover_slots.pl` around lines 439 - 474, The
write-rejection check still invokes psql directly without a connection timeout,
so a wedged standby can hang the test; change that write-probe to use the same
qport_status(...) wrapper used for the reads (or, if keeping the direct psql
call, set PGCONNECT_TIMEOUT to the same timeout value) so the write probe fails
fast; update the code that performs the "write-rejection" psql invocation to
either call qport_status($pg_bin, $host, $standby_port, $dbname, $db_user,
"<write SQL>") or export/prepend PGCONNECT_TIMEOUT=10 for that psql command to
match the other probes.
synchronize_one_slot held ReplicationSlotControlLock LW_EXCLUSIVE and
ProcArrayLock LW_EXCLUSIVE across the slot xmin seed, then later in the
same function ReplicationSlotRelease() / ReplicationSlotDropPtr() would
call ReplicationSlotsComputeRequiredXmin(false) which needs the SHARED
ReplicationSlotControlLock. On PG17+ standbys this contention wedged
the spock_failover_slots worker inside LWLockAcquire and queued every
backend trying to initialise behind it.
Take ProcArrayLock LW_SHARED for GetOldestSafeDecodingTransactionId,
update the slot fields under the slot's SpinLock, then let
ReplicationSlotsComputeRequiredXmin(false) acquire its own short
SHARED RSCL the way every PG core call site does.