[core] Fix orphan files clean deleting data files during concurrent snapshot expiration #7715
heye1005 wants to merge 1 commit into apache:master
Hi @heye1005, it would be better to fix this by reading the full set of referenced files for the latest snapshot.
a91ce0d to 5d525d4
Thanks for the review! Updated the PR based on your suggestion. The root cause here is that This PR has two changes:
This doesn't eliminate the race entirely: if expiration happens to delete the latest snapshot's manifest-list during the pre-deletion check, it would still fail. But that window is only milliseconds wide, so it is very unlikely in practice. To fully solve this, I think we'd need either some form of locking between orphan clean and expiration, or a delay before manifest-list deletion (similar to how orphan clean already requires data files to be older than
… snapshot expiration
Use latestSnapshot() + ID range to read all snapshots instead of safelyGetAllSnapshots(), which lists and then reads, leaving a race window.
This closes apache#7710.
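The idea in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the code in the PR: SnapshotRangeSketch, its in-memory store, tryRead, and usedFiles are all hypothetical stand-ins, and the choice to tolerate missing older snapshots while failing on a missing latest snapshot is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical in-memory stand-in for a snapshot directory; not Paimon's SnapshotManager. */
class SnapshotRangeSketch {
    // snapshot id -> files referenced by that snapshot (assumed layout for this sketch)
    final ConcurrentSkipListMap<Long, List<String>> store = new ConcurrentSkipListMap<>();

    /** Returns the highest snapshot id currently present, if any. */
    Optional<Long> latestSnapshotId() {
        Map.Entry<Long, List<String>> last = store.lastEntry();
        return last == null ? Optional.empty() : Optional.of(last.getKey());
    }

    /** Reads one snapshot; empty if it has already been expired. */
    Optional<List<String>> tryRead(long id) {
        return Optional.ofNullable(store.get(id));
    }

    /**
     * Walks ids from the latest snapshot down to earliestId instead of listing first
     * and reading later. Throws if the latest snapshot cannot be read, so the caller
     * never continues with an empty used-file set.
     */
    List<String> usedFiles(long earliestId) {
        long latest = latestSnapshotId()
                .orElseThrow(() -> new IllegalStateException("no snapshot found"));
        List<String> latestFiles = tryRead(latest)
                .orElseThrow(() -> new IllegalStateException("latest snapshot vanished"));
        List<String> used = new ArrayList<>(latestFiles);
        for (long id = latest - 1; id >= earliestId; id--) {
            // Assumption: an older snapshot that expired concurrently is simply skipped.
            tryRead(id).ifPresent(used::addAll);
        }
        return used;
    }
}
```

The point of the sketch is that an unreadable latest snapshot surfaces as an error rather than as an empty result that looks like "nothing is in use".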
5d525d4 to 1d66c8e
Purpose
Fix #7710.
LocalOrphanFilesClean.clean() collects used files by reading all snapshots via safelyGetAllSnapshots(). However, this method silently catches FileNotFoundException: if concurrent snapshot expiration deletes all snapshots between listing and reading, usedFiles ends up empty, and all candidate data files get deleted as orphans.
This patch skips orphan file deletion when usedFiles is empty to prevent accidental data loss.
Tests
Added LocalOrphanFilesCleanTest#testSkipCleaningWhenAllSnapshotsDeleted.
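The guard described under Purpose could look something like the sketch below. It is only an illustration under assumptions: OrphanCleanGuardSketch and filesToDelete are hypothetical names, file paths are modeled as plain strings, and the real LocalOrphanFilesClean logic is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating the empty-usedFiles guard; not the actual Paimon code. */
class OrphanCleanGuardSketch {

    /**
     * Returns the candidates that may be deleted as orphans.
     * If usedFiles is empty, nothing is returned: an empty set more likely means the
     * snapshots could not be read than that every candidate is truly unreferenced.
     */
    static List<String> filesToDelete(List<String> candidates, Set<String> usedFiles) {
        if (usedFiles.isEmpty()) {
            return new ArrayList<>(); // skip cleaning entirely
        }
        List<String> orphans = new ArrayList<>();
        for (String candidate : candidates) {
            if (!usedFiles.contains(candidate)) {
                orphans.add(candidate);
            }
        }
        return orphans;
    }
}
```

With this shape, a run that races with expiration and reads no snapshots deletes nothing, while a normal run still removes only the unreferenced candidates.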