DAOS-18976 rebuild: migration fetch/enumerate should retry for network error#18387
Open
gnailzenh wants to merge 1 commit into
Ticket title: 'Aurora rebuild failing with DER_HG / DER_SHUTDOWN'
- There is a clear asymmetry between the scan (push) side and the pull (fetch) side:
. Scan side (rebuild_objects_send_ult): already retries on ALL daos_crt_network_error()
errors, so it properly handles transient network errors when pushing OID lists to pullers.
. Pull side (mrone_obj_fetch_internal): does NOT retry on network errors when fetching
data from the source.
. This patch makes the two sides consistent: both now always retry on network errors.
- Rebuild IV refreshes can arrive out of order; make sure a stale refresh does not revert
the global done flag.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
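The two changes above can be sketched in plain C as follows. This is a minimal, self-contained illustration under stated assumptions, not the patch itself: every name here (`is_network_error`, `fetch_with_retry`, `struct rebuild_status`, `apply_iv_refresh`, the `ERR_*` codes) is hypothetical. `is_network_error()` only plays the role that `daos_crt_network_error()` plays in DAOS, and its classification of error codes is assumed. The bounded retry count and the version/done fields of the status are also assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error codes standing in for DAOS DER_* values. */
enum {
	ERR_OK       = 0,
	ERR_NET_HG   = -1, /* transport-level failure (stand-in for DER_HG) */
	ERR_UNREACH  = -2, /* peer unreachable */
	ERR_NONEXIST = -3, /* non-network error: must not be retried */
};

/* Hypothetical stand-in for daos_crt_network_error(); the real set of codes may differ. */
static bool
is_network_error(int rc)
{
	return rc == ERR_NET_HG || rc == ERR_UNREACH;
}

typedef int (*fetch_fn_t)(void *arg);

/*
 * Pull side: retry the fetch while it fails with a network error.
 * Assumption: retries are bounded by max_retries extra attempts.
 */
static int
fetch_with_retry(fetch_fn_t fetch, void *arg, int max_retries)
{
	int rc;
	int attempt = 0;

	assert(max_retries >= 0);
	for (;;) {
		rc = fetch(arg);
		if (rc == ERR_OK || !is_network_error(rc) || attempt >= max_retries)
			return rc;
		attempt++;
		/* A real puller would presumably back off or yield here. */
	}
}

/* Hypothetical global rebuild status carried by an IV refresh. */
struct rebuild_status {
	uint32_t version; /* rebuild version this status belongs to */
	bool     done;    /* global "rebuild done" flag */
};

/* Apply a refresh that may arrive out of order. */
static void
apply_iv_refresh(struct rebuild_status *local, const struct rebuild_status *in)
{
	if (in->version < local->version)
		return;              /* stale refresh for an older rebuild: ignore */
	if (in->version > local->version) {
		*local = *in;        /* newer rebuild: adopt its state */
		return;
	}
	/* Same version: done may only go false -> true, never back. */
	if (in->done)
		local->done = true;
}

/* Test double: fails with fail_rc for the first `failures` calls, then succeeds. */
struct fake_fetch {
	int calls;
	int failures;
	int fail_rc;
};

static int
fake_fetch_fn(void *arg)
{
	struct fake_fetch *f = arg;

	f->calls++;
	return f->calls <= f->failures ? f->fail_rc : ERR_OK;
}
```

The key point of the second function is that, within one rebuild version, the done flag is sticky: a delayed refresh still carrying `done = false` cannot undo a refresh that already reported completion. The retry bound in `fetch_with_retry()` is only a choice for the sketch; the patch description says network errors are always retried, so in practice the loop would presumably be terminated by rebuild abort rather than a fixed count.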
86b2454 to eb3f0d2
liuxuezhao approved these changes (Jun 1, 2026)
wangshilong approved these changes (Jun 1, 2026)