-
Notifications
You must be signed in to change notification settings - Fork 55
fix: retry deadlocks with exponential backoff and jitter #914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -402,8 +402,9 @@ public function withTransaction(callable $callback): mixed | |||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||
| $sleep = 50_000; // 50 milliseconds | ||||||||||||||||||||||||||
| $retries = 2; | ||||||||||||||||||||||||||
| $deadlockRetries = 5; | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| for ($attempts = 0; $attempts <= $retries; $attempts++) { | ||||||||||||||||||||||||||
| for ($attempts = 0; $attempts <= $deadlockRetries; $attempts++) { | ||||||||||||||||||||||||||
| try { | ||||||||||||||||||||||||||
| $this->startTransaction(); | ||||||||||||||||||||||||||
| $result = $callback(); | ||||||||||||||||||||||||||
|
|
@@ -413,8 +414,9 @@ public function withTransaction(callable $callback): mixed | |||||||||||||||||||||||||
| try { | ||||||||||||||||||||||||||
| $this->rollbackTransaction(); | ||||||||||||||||||||||||||
| } catch (\Throwable $rollback) { | ||||||||||||||||||||||||||
| if ($attempts < $retries) { | ||||||||||||||||||||||||||
| \usleep($sleep * ($attempts + 1)); | ||||||||||||||||||||||||||
| $maxRetries = self::isDeadlock($action) ? $deadlockRetries : $retries; | ||||||||||||||||||||||||||
| if ($attempts < $maxRetries) { | ||||||||||||||||||||||||||
| \usleep(self::retrySleep($sleep, $attempts)); | ||||||||||||||||||||||||||
| continue; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
@@ -434,8 +436,9 @@ public function withTransaction(callable $callback): mixed | |||||||||||||||||||||||||
| throw $action; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| if ($attempts < $retries) { | ||||||||||||||||||||||||||
| \usleep($sleep * ($attempts + 1)); | ||||||||||||||||||||||||||
| $maxRetries = self::isDeadlock($action) ? $deadlockRetries : $retries; | ||||||||||||||||||||||||||
| if ($attempts < $maxRetries) { | ||||||||||||||||||||||||||
| \usleep(self::retrySleep($sleep, $attempts)); | ||||||||||||||||||||||||||
| continue; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
@@ -447,6 +450,27 @@ public function withTransaction(callable $callback): mixed | |||||||||||||||||||||||||
| throw new TransactionException('Failed to execute transaction'); | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Check if a throwable is a database deadlock error. | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| protected static function isDeadlock(\Throwable $e): bool | ||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||
| if ($e instanceof \PDOException && (string)$e->getCode() === '40001') { | ||||||||||||||||||||||||||
| return true; | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| return \str_contains($e->getMessage(), 'Deadlock found when trying to get lock'); | ||||||||||||||||||||||||||
|
Comment on lines
+456
to
+462
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The second condition —
Suggested change
Comment on lines
+456
to
+462
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win 🧩 Analysis chain🌐 Web query:
💡 Result: In PostgreSQL, the SQLSTATE error codes for deadlock_detected and serialization_failure are as follows: - serialization_failure: 40001 [1][2][3] - deadlock_detected: 40P01 [4][1][2][3] These codes are part of Class 40 — Transaction Rollback, and applications are typically expected to retry transactions that encounter these specific errors [1][2][3]. Citations:
🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "== Adapter.php outline =="
ast-grep outline src/Database/Adapter.php --view expanded || true
echo
echo "== Relevant slice =="
sed -n '420,500p' src/Database/Adapter.php
echo
echo "== isDeadlock usages =="
rg -n "isDeadlock\(" src/Database -S
echo
echo "== adapter subclasses / drivers =="
fd -t f 'Adapter.php' src/Database -x printf '%s\n' {}
fd -t f -g '*Adapter.php' src/Database | sed -n '1,80p'Repository: utopia-php/database Length of output: 3202 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "== PostgreSQL-related files =="
git ls-files | rg -n 'Postgre|pgsql|pg_'
echo
echo "== PostgreSQL adapter / driver references =="
rg -n --hidden --glob '!vendor/**' --glob '!node_modules/**' -i 'pgsql|postgres|40P01|deadlock detected|serialization failure' src test . || true
echo
echo "== Adapter subclasses =="
fd -t f 'Adapter.php' src | sed -n '1,120p'Repository: utopia-php/database Length of output: 9686 🏁 Script executed: #!/bin/bash
set -euo pipefail
sed -n '380,460p' src/Database/Adapter.php
echo
echo "== SQLTransactionTest slice =="
sed -n '1,220p' tests/unit/SQLTransactionTest.phpRepository: utopia-php/database Length of output: 8774 Add PostgreSQL deadlock SQLSTATE 40P01 to the shared retry helper. 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Calculate retry sleep with exponential backoff and jitter. | ||||||||||||||||||||||||||
| */ | ||||||||||||||||||||||||||
| protected static function retrySleep(int $baseSleep, int $attempt): int | ||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||
| $delay = $baseSleep * (2 ** $attempt); | ||||||||||||||||||||||||||
| return $delay + \random_int(0, $delay); | ||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| /** | ||||||||||||||||||||||||||
| * Apply a transformation to a query before an event occurs | ||||||||||||||||||||||||||
| * | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,165 @@ | ||
| <?php | ||
|
|
||
| namespace Tests\Unit; | ||
|
|
||
| use PDOException; | ||
| use PHPUnit\Framework\TestCase; | ||
| use ReflectionMethod; | ||
| use Utopia\Database\Adapter\MySQL; | ||
| use Utopia\Database\Adapter\Postgres; | ||
| use Utopia\Database\Adapter\SQL; | ||
| use Utopia\Database\Exception\Transaction as TransactionException; | ||
|
|
||
| final class SQLTransactionTest extends TestCase | ||
| { | ||
| /** | ||
| * A deadlock (SQLSTATE 40001) inside withTransaction should be retried | ||
| * more aggressively than a generic exception (which gets only 2 retries). | ||
| */ | ||
| public function testWithTransactionRetriesDeadlockMoreThanGenericError(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('beginTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willReturn(true); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| // First, verify a generic RuntimeException gets only 3 attempts (initial + 2 retries) | ||
| $genericAttempts = 0; | ||
| try { | ||
| $adapter->withTransaction(function () use (&$genericAttempts) { | ||
| $genericAttempts++; | ||
| throw new \RuntimeException('Generic failure'); | ||
| }); | ||
| } catch (\RuntimeException) { | ||
| } | ||
| $this->assertEquals(3, $genericAttempts, 'Generic exception should get 3 attempts (initial + 2 retries)'); | ||
|
|
||
| // Now verify a deadlock PDOException gets more attempts | ||
| $deadlockAttempts = 0; | ||
| try { | ||
| $adapter->withTransaction(function () use (&$deadlockAttempts) { | ||
| $deadlockAttempts++; | ||
| throw new PDOException( | ||
| 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction', | ||
| '40001' | ||
| ); | ||
| }); | ||
| } catch (PDOException) { | ||
| } | ||
| $this->assertGreaterThan( | ||
| $genericAttempts, | ||
| $deadlockAttempts, | ||
| 'Deadlock should get more retry attempts than a generic exception' | ||
| ); | ||
| $this->assertEquals(6, $deadlockAttempts, 'Deadlock should get 6 attempts (initial + 5 retries)'); | ||
| } | ||
|
|
||
| /** | ||
| * When a deadlock resolves on a retry, the transaction should succeed. | ||
| */ | ||
| public function testWithTransactionSucceedsAfterTransientDeadlock(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('beginTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willReturn(true); | ||
| $pdo->method('commit')->willReturn(true); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $attempts = 0; | ||
| $result = $adapter->withTransaction(function () use (&$attempts) { | ||
| $attempts++; | ||
| if ($attempts < 3) { | ||
| throw new PDOException( | ||
| 'SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction', | ||
| '40001' | ||
| ); | ||
| } | ||
| return 'success'; | ||
| }); | ||
|
|
||
| $this->assertEquals('success', $result); | ||
| $this->assertEquals(3, $attempts, 'Should succeed on the 3rd attempt after 2 deadlocks'); | ||
| } | ||
|
|
||
| /** | ||
| * A pooled connection (e.g. Swoole PDOProxy) can keep its own transaction | ||
| * counter that survives a reconnect, so it reports an open transaction the | ||
| * underlying connection no longer holds. The cleanup rollBack() then throws | ||
| * "There is no active transaction". startTransaction() must swallow that and | ||
| * still begin a fresh transaction instead of surfacing it as a failure. | ||
| */ | ||
| public function testStartTransactionRecoversFromDesyncedRollback(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willReturn(true); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $this->assertTrue($adapter->startTransaction()); | ||
| $this->assertTrue($adapter->inTransaction()); | ||
| } | ||
|
|
||
| public function testStartTransactionDoesNotMaskBeginFailureAfterDesyncedRollback(): void | ||
| { | ||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willThrowException(new PDOException('Connection lost')); | ||
|
|
||
| $adapter = new MySQL($pdo); | ||
|
|
||
| $this->expectException(TransactionException::class); | ||
| $this->expectExceptionMessage('Failed to start transaction: Connection lost'); | ||
|
|
||
| $adapter->startTransaction(); | ||
| } | ||
|
|
||
| public function testPostgresStartTransactionRecoversFromDesyncedRollback(): void | ||
| { | ||
| $method = new ReflectionMethod(Postgres::class, 'startTransaction'); | ||
| $this->assertSame(SQL::class, $method->getDeclaringClass()->getName()); | ||
|
|
||
| $pdo = $this->getMockBuilder(\PDO::class) | ||
| ->disableOriginalConstructor() | ||
| ->getMock(); | ||
|
|
||
| $pdo->method('inTransaction')->willReturn(true); | ||
| $pdo->method('rollBack')->willThrowException( | ||
| new PDOException('There is no active transaction') | ||
| ); | ||
| $pdo->expects($this->once()) | ||
| ->method('beginTransaction') | ||
| ->willReturn(true); | ||
|
|
||
| $adapter = new Postgres($pdo); | ||
|
|
||
| $this->assertTrue($adapter->startTransaction()); | ||
| $this->assertTrue($adapter->inTransaction()); | ||
| } | ||
|
Comment on lines
+101
to
+164
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Concretely: the two "recover" tests will fail because
Comment on lines
+101
to
+164
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🎯 Functional Correctness | 🔴 Critical | 🏗️ Heavy lift Align these tests with the actual startTransaction implementation. The supplied 🤖 Prompt for AI Agents |
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Base rollback-failure retries on the rollback error.
Line 417 gives the deadlock retry budget when
$actionis a deadlock even if$rollbackis a non-deadlock cleanup failure. That can re-enter the callback after an unconfirmed rollback state.Proposed fix
📝 Committable suggestion
🤖 Prompt for AI Agents