ci: add retry/backoff to GHCR docker pull in integration-test workflow (PLT-753) #3622
amir-deris wants to merge 2 commits into
PR Summary: Low Risk. Overview: Image tagging to … Reviewed by Cursor Bugbot for commit 5a8fe49.
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

| Coverage Diff | main | #3622 | +/- |
|---|---:|---:|---:|
| Coverage | 58.65% | 58.12% | -0.54% |
| Files | 2225 | 2150 | -75 |
| Lines | 183467 | 174156 | -9311 |
| Hits | 107606 | 101221 | -6385 |
| Misses | 66144 | 63945 | -2199 |
| Partials | 9717 | 8990 | -727 |
Flags with carried forward coverage won't be shown.
Problem
The `Integration Test` matrix jobs intermittently fail at the "Load prebuilt seid and pull Docker images" step with transient GHCR errors, before any test runs.

Root cause
PR #3582 switched image distribution from a 1 GB artifact download to a GHCR `docker pull`. That pull step used a bare `docker pull` with no retry wrapper. The docker client automatically retries only layer blob downloads. It does NOT retry the initial auth-token fetch or the manifest HEAD request, which is exactly where the failures occur. When ~40 matrix jobs start simultaneously and hammer `ghcr.io/token`, a briefly slow auth response times out and `docker pull` exits 1. With no retry loop, the whole step (and therefore the job) fails.

Fix
Wrap the pulls in a retry-with-backoff loop so that the token/manifest request is retried as well. The loop makes 5 attempts with linear backoff, waiting 5, 10, 15 and 20 s between them.
Tagging logic is unchanged.
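The following is a minimal sketch of such a wrapper, assuming the step runs under bash. It is not the PR's actual step. The function name `retry` and the `MAX_ATTEMPTS`/`BASE_DELAY` variables are hypothetical, and their defaults mirror the 5 attempts and 5/10/15/20 s delays described above. Because the sketch re-runs the entire command, a timeout on the token or manifest request is retried along with everything else.

```bash
#!/usr/bin/env bash
# Sketch (not the PR's code): retry a whole command with linear backoff.
# Re-running the full `docker pull` also retries the ghcr.io/token fetch and
# the manifest request, which the docker client does not retry on its own.

# Hypothetical knobs; defaults mirror the PR: 5 attempts, 5/10/15/20s waits.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"
BASE_DELAY="${BASE_DELAY:-5}"

# retry CMD [ARGS...]: run CMD until it succeeds or MAX_ATTEMPTS is reached.
# The command runs as the `until` condition, so a failure does not trip
# `set -e`; only the final `return 1` fails the step.
retry() {
  local attempt=1
  until "$@"; do
    if (( attempt >= MAX_ATTEMPTS )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    # Linear backoff: BASE_DELAY * attempt seconds (5, 10, 15, 20 by default).
    local delay=$(( BASE_DELAY * attempt ))
    echo "attempt ${attempt} failed; retrying in ${delay}s: $*" >&2
    sleep "${delay}"
    attempt=$(( attempt + 1 ))
  done
}

# Example use in the workflow step (image references are placeholders):
# for image in ghcr.io/OWNER/IMAGE_A:TAG ghcr.io/OWNER/IMAGE_B:TAG; do
#   retry docker pull "$image"
# done
```

With the defaults, a pull that never succeeds sleeps 5 + 10 + 15 + 20 = 50 s in total before the step fails. This bounds the extra time that a persistently failing registry can add to each job.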