fix: retry deadlocks with exponential backoff and jitter#914
Conversation
Deadlocks (SQLSTATE 40001) under high contention were exhausting the default 2-retry limit in withTransaction. Increase to 5 retries for deadlocks specifically, with exponential backoff and random jitter to avoid thundering-herd re-collisions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdapter::withTransaction() now distinguishes deadlocks from other retryable errors, applying a dedicated retry budget and exponential backoff with jitter via new isDeadlock() and retrySleep() helpers. New unit tests validate retry counts and startTransaction() recovery from desynchronized rollback states. ChangesDeadlock retry and backoff
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant Caller
participant Adapter
participant isDeadlock
participant retrySleep
Caller->>Adapter: withTransaction(callback)
Adapter->>Adapter: begin transaction, run callback
Adapter->>Adapter: rollback on error
Adapter->>isDeadlock: isDeadlock(exception)
isDeadlock-->>Adapter: true/false
Adapter->>retrySleep: retrySleep(baseSleep, attempt)
retrySleep-->>Adapter: sleep duration
Adapter->>Adapter: retry transaction or rethrow
Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Fix Confidence: 82/100The fix correctly identifies deadlock exceptions and provides significantly more retries with better backoff strategy. The exponential backoff with jitter is the standard approach for deadlock resolution. However, confidence is not higher because: (1) we can't fully test real concurrent deadlock scenarios in unit tests, (2) extremely high contention could still exhaust 5 retries, and (3) the CI checks haven't completed yet on the PR. |
Greptile SummaryThis PR improves deadlock handling in
Confidence Score: 3/5The production retry logic is correct, but three of the five new tests assert startTransaction() behaviour that the existing SQL adapter does not implement and will fail on CI. The production code change (withTransaction + isDeadlock + retrySleep) is logically correct. However, the test file ships three tests that verify a 'swallow desynced rollback and retry beginTransaction' behaviour that SQL::startTransaction() has never implemented — those tests will throw TransactionException rather than passing. tests/unit/SQLTransactionTest.php — the three startTransaction desynced-rollback tests need to be removed or the corresponding implementation added before merging. Important Files Changed
Reviews (1): Last reviewed commit: "fix: retry deadlocks with exponential ba..." | Re-trigger Greptile |
| public function testStartTransactionRecoversFromDesyncedRollback(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willReturn(true); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $this->assertTrue($adapter->startTransaction()); | ||
| $this->assertTrue($adapter->inTransaction()); | ||
| } | ||
|
|
||
| public function testStartTransactionDoesNotMaskBeginFailureAfterDesyncedRollback(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willThrowException(new PDOException('Connection lost')); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $this->expectException(TransactionException::class); | ||
| $this->expectExceptionMessage('Failed to start transaction: Connection lost'); | ||
|
|
||
| $adapter->startTransaction(); | ||
| } | ||
|
|
||
| public function testPostgresStartTransactionRecoversFromDesyncedRollback(): void | ||
| { | ||
| $method = new ReflectionMethod(Postgres::class, 'startTransaction'); | ||
| $this->assertSame(SQL::class, $method->getDeclaringClass()->getName()); | ||
|
|
||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willReturn(true); | ||
|
|
||
| $adapter = new Postgres($pdo); | ||
|
|
||
| $this->assertTrue($adapter->startTransaction()); | ||
| $this->assertTrue($adapter->inTransaction()); | ||
| } |
There was a problem hiding this comment.
Three
startTransaction tests verify behaviour that doesn't exist
testStartTransactionRecoversFromDesyncedRollback, testStartTransactionDoesNotMaskBeginFailureAfterDesyncedRollback, and testPostgresStartTransactionRecoversFromDesyncedRollback all set up a scenario where PDO::rollBack() throws PDOException('There is no active transaction'), then expect startTransaction() to swallow that specific error and proceed to call beginTransaction(). But SQL::startTransaction() wraps every PDOException in a single try/catch block covering both rollBack() and beginTransaction(). When rollBack() throws, execution jumps straight to the catch and re-throws as TransactionException('Failed to start transaction: There is no active transaction') — beginTransaction() is never reached.
Concretely: the two "recover" tests will fail because startTransaction() throws instead of returning true, and testStartTransactionDoesNotMaskBeginFailureAfterDesyncedRollback will fail because the actual exception message is 'There is no active transaction', not 'Connection lost'. The implementation change needed to make these pass (selectively ignoring "no active transaction" rollback errors and retrying beginTransaction) is not present in this PR.
| protected static function isDeadlock(\Throwable $e): bool | ||
| { | ||
| if ($e instanceof \PDOException && (string)$e->getCode() === '40001') { | ||
| return true; | ||
| } | ||
|
|
||
| return \str_contains($e->getMessage(), 'Deadlock found when trying to get lock'); |
There was a problem hiding this comment.
isDeadlock message fallback matches any \Throwable
The second condition — str_contains($e->getMessage(), 'Deadlock found when trying to get lock') — applies to every \Throwable, not just PDOException. An application-level exception whose message happens to contain that string (e.g. wrapping or logging a previous deadlock) would silently receive 5 retries instead of 2. Limiting the message check to PDOException (parallel to the SQLSTATE check above it) would make the detection more precise.
| protected static function isDeadlock(\Throwable $e): bool | |
| { | |
| if ($e instanceof \PDOException && (string)$e->getCode() === '40001') { | |
| return true; | |
| } | |
| return \str_contains($e->getMessage(), 'Deadlock found when trying to get lock'); | |
| if ($e instanceof \PDOException && (string)$e->getCode() === '40001') { | |
| return true; | |
| } | |
| return $e instanceof \PDOException && \str_contains($e->getMessage(), 'Deadlock found when trying to get lock'); |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Database/Adapter.php`:
- Around line 417-419: The retry logic in Adapter::runRetryCallback is using the
original $action to choose $maxRetries, which can wrongly apply deadlock retries
even when the rollback failure itself is not a deadlock. Update the retry budget
calculation so it is based on the rollback error state (the $rollback
exception/error) rather than $action, and keep the retry/sleep flow in this
block consistent with the existing retry helper methods like isDeadlock() and
retrySleep().
- Around line 456-462: The shared deadlock retry helper in Adapter::isDeadlock
only recognizes SQLSTATE 40001 and the message fallback, so PostgreSQL deadlocks
are missed. Update isDeadlock to also treat PDOException instances with SQLSTATE
40P01 as deadlocks, alongside the existing 40001 check and message-based match,
so the retry path catches PostgreSQL deadlock failures too.
In `@tests/unit/SQLTransactionTest.php`:
- Around line 101-164: The tests for SQL::startTransaction and
Postgres::startTransaction do not match the current implementation behavior.
Update the production methods so SQL::startTransaction() can recover from a
desynced rollback before calling beginTransaction(), and ensure Postgres
delegates to SQL::startTransaction() rather than declaring its own override. If
the implementation is intended to stay as-is, adjust the assertions in these
tests to expect the rollback PDOException path and the actual declaring class
returned by ReflectionMethod.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cd3145f3-9095-4dc1-a2e3-a984dc2c7fe4
📒 Files selected for processing (2)
src/Database/Adapter.phptests/unit/SQLTransactionTest.php
| $maxRetries = self::isDeadlock($action) ? $deadlockRetries : $retries; | ||
| if ($attempts < $maxRetries) { | ||
| \usleep(self::retrySleep($sleep, $attempts)); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Base rollback-failure retries on the rollback error.
Line 417 gives the deadlock retry budget when $action is a deadlock even if $rollback is a non-deadlock cleanup failure. That can re-enter the callback after an unconfirmed rollback state.
Proposed fix
- $maxRetries = self::isDeadlock($action) ? $deadlockRetries : $retries;
+ $maxRetries = self::isDeadlock($rollback) ? $deadlockRetries : $retries;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| $maxRetries = self::isDeadlock($action) ? $deadlockRetries : $retries; | |
| if ($attempts < $maxRetries) { | |
| \usleep(self::retrySleep($sleep, $attempts)); | |
| $maxRetries = self::isDeadlock($rollback) ? $deadlockRetries : $retries; | |
| if ($attempts < $maxRetries) { | |
| \usleep(self::retrySleep($sleep, $attempts)); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Database/Adapter.php` around lines 417 - 419, The retry logic in
Adapter::runRetryCallback is using the original $action to choose $maxRetries,
which can wrongly apply deadlock retries even when the rollback failure itself
is not a deadlock. Update the retry budget calculation so it is based on the
rollback error state (the $rollback exception/error) rather than $action, and
keep the retry/sleep flow in this block consistent with the existing retry
helper methods like isDeadlock() and retrySleep().
| protected static function isDeadlock(\Throwable $e): bool | ||
| { | ||
| if ($e instanceof \PDOException && (string)$e->getCode() === '40001') { | ||
| return true; | ||
| } | ||
|
|
||
| return \str_contains($e->getMessage(), 'Deadlock found when trying to get lock'); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🌐 Web query:
PostgreSQL SQLSTATE code for deadlock_detected and serialization_failure
💡 Result:
In PostgreSQL, the SQLSTATE error codes for deadlock_detected and serialization_failure are as follows: - serialization_failure: 40001 [1][2][3] - deadlock_detected: 40P01 [4][1][2][3] These codes are part of Class 40 — Transaction Rollback, and applications are typically expected to retry transactions that encounter these specific errors [1][2][3].
Citations:
- 1: https://www.postgresql.org/docs/18/mvcc-serialization-failure-handling.html
- 2: https://www.postgresql.org/docs/18/errcodes-appendix.html
- 3: https://www.postgresql.org/docs/19/mvcc-serialization-failure-handling.html
- 4: https://www.postgresql.org/docs/current/mvcc-serialization-failure-handling.html
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Adapter.php outline =="
ast-grep outline src/Database/Adapter.php --view expanded || true
echo
echo "== Relevant slice =="
sed -n '420,500p' src/Database/Adapter.php
echo
echo "== isDeadlock usages =="
rg -n "isDeadlock\(" src/Database -S
echo
echo "== adapter subclasses / drivers =="
fd -t f 'Adapter.php' src/Database -x printf '%s\n' {}
fd -t f -g '*Adapter.php' src/Database | sed -n '1,80p'Repository: utopia-php/database
Length of output: 3202
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== PostgreSQL-related files =="
git ls-files | rg -n 'Postgre|pgsql|pg_'
echo
echo "== PostgreSQL adapter / driver references =="
rg -n --hidden --glob '!vendor/**' --glob '!node_modules/**' -i 'pgsql|postgres|40P01|deadlock detected|serialization failure' src test . || true
echo
echo "== Adapter subclasses =="
fd -t f 'Adapter.php' src | sed -n '1,120p'Repository: utopia-php/database
Length of output: 9686
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '380,460p' src/Database/Adapter.php
echo
echo "== SQLTransactionTest slice =="
sed -n '1,220p' tests/unit/SQLTransactionTest.phpRepository: utopia-php/database
Length of output: 8774
Add PostgreSQL deadlock SQLSTATE 40P01 to the shared retry helper.
40001 covers serialization failures; PostgreSQL deadlocks use 40P01, so those transactions still fall back to the generic 2-retry path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Database/Adapter.php` around lines 456 - 462, The shared deadlock retry
helper in Adapter::isDeadlock only recognizes SQLSTATE 40001 and the message
fallback, so PostgreSQL deadlocks are missed. Update isDeadlock to also treat
PDOException instances with SQLSTATE 40P01 as deadlocks, alongside the existing
40001 check and message-based match, so the retry path catches PostgreSQL
deadlock failures too.
| public function testStartTransactionRecoversFromDesyncedRollback(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willReturn(true); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $this->assertTrue($adapter->startTransaction()); | ||
| $this->assertTrue($adapter->inTransaction()); | ||
| } | ||
|
|
||
| public function testStartTransactionDoesNotMaskBeginFailureAfterDesyncedRollback(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willThrowException(new PDOException('Connection lost')); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $this->expectException(TransactionException::class); | ||
| $this->expectExceptionMessage('Failed to start transaction: Connection lost'); | ||
|
|
||
| $adapter->startTransaction(); | ||
| } | ||
|
|
||
| public function testPostgresStartTransactionRecoversFromDesyncedRollback(): void | ||
| { | ||
| $method = new ReflectionMethod(Postgres::class, 'startTransaction'); | ||
| $this->assertSame(SQL::class, $method->getDeclaringClass()->getName()); | ||
|
|
||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willReturn(true); | ||
|
|
||
| $adapter = new Postgres($pdo); | ||
|
|
||
| $this->assertTrue($adapter->startTransaction()); | ||
| $this->assertTrue($adapter->inTransaction()); | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🔴 Critical | 🏗️ Heavy lift
Align these tests with the actual startTransaction implementation.
The supplied SQL::startTransaction() snippet wraps the rollback PDOException immediately, so Lines 117 and 140 won’t reach the expected paths. The supplied Postgres snippet also declares its own startTransaction(), so Line 146 would not resolve to SQL::class. Update the production implementations in the same PR or adjust these assertions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/SQLTransactionTest.php` around lines 101 - 164, The tests for
SQL::startTransaction and Postgres::startTransaction do not match the current
implementation behavior. Update the production methods so
SQL::startTransaction() can recover from a desynced rollback before calling
beginTransaction(), and ensure Postgres delegates to SQL::startTransaction()
rather than declaring its own override. If the implementation is intended to
stay as-is, adjust the assertions in these tests to expect the rollback
PDOException path and the actual declaring class returned by ReflectionMethod.
Summary
SQLSTATE[40001]) inwithTransactionnow get 5 retries (up from 2) with exponential backoff and random jitterisDeadlock()helper to detect deadlock exceptions by SQLSTATE code and error messageretrySleep()helper for exponential backoff with jitter to avoid thundering-herd re-collisionsFixes: https://appwrite.sentry.io/issues/7430260127/
Context
Under high contention (e.g. concurrent stats workers updating the same project), MariaDB deadlocks (
SQLSTATE[40001]: Serialization failure: 1213) were exhausting the 2-retry limit inAdapter::withTransaction(), causing unrecoverable failures. The linear backoff (50ms, 100ms) also made it likely for retries to re-collide.The fix uses exponential backoff (50ms, 100ms, 200ms, 400ms, 800ms) with random jitter (0-100% of delay) to spread retries and increase the chance of resolution.
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit