Fix Skip File Validation When Offline with Existing Datasets #772
shaohuasong-fang wants to merge 1 commit into
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: shaohuasong-fang. It needs approval from an approver in each of the affected files.
@shaohuasong-fang Thanks for the PR. I don't think we should merge this approach. The current file-size validation is intentional. If a dataset was downloaded from the official remote source, we should keep validating that the local file still matches the remote metadata. Skipping validation whenever a local file exists can accept a corrupted or half-downloaded file, for example if a previous download was interrupted. I agree there is a valid offline/custom-dataset use case, but it should be explicit rather than changing the default behavior for official datasets.
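The explicit opt-in suggested in the review could look roughly like the sketch below. It is a minimal illustration, not the project's code: `needs_download`, the `offline` flag, and the `fetch_remote_size` callback (a stand-in for the OSS or S3 metadata lookup) are all hypothetical names. By default the local file size is compared against the remote size, so a truncated download is caught. Only when the caller explicitly passes `offline=True` is an existing local file accepted without contacting the remote store.

```python
import os
import tempfile
from typing import Callable


def needs_download(
    local_path: str,
    fetch_remote_size: Callable[[], int],
    offline: bool = False,
) -> bool:
    """Return True if the dataset file must be (re)downloaded.

    Hypothetical helper: `fetch_remote_size` stands in for a call to the
    remote object store (OSS or S3) that returns the expected file size.
    """
    if not os.path.exists(local_path):
        if offline:
            # Offline mode cannot fetch a missing file, so fail loudly
            raise FileNotFoundError(f"offline mode: {local_path} is missing")
        return True
    if offline:
        # Explicit opt-in: trust the existing local file, skip the remote check
        return False
    # Default path: the local size must match the remote metadata,
    # which catches half-written files from interrupted downloads
    return os.path.getsize(local_path) != fetch_remote_size()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.parquet")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"\0" * 100)

        def unreachable() -> int:
            # Stand-in for a remote metadata call that fails without network access
            raise ConnectionError("remote object store unreachable")

        print(needs_download(path, lambda: 100))                # False: sizes match
        print(needs_download(path, unreachable, offline=True))  # False: remote never contacted
```

Keeping `offline` as a keyword argument that defaults to `False` leaves the behavior for official datasets unchanged, which is the property the review asks to preserve.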
@XuanYang-cn |
Description
When internet access is unavailable and a dataset has already been downloaded locally, the log output shows that the tool still attempts to validate the files against the remote servers, which causes unnecessary failures.
Result
With this change, the Alibaba Cloud (OSS) and Amazon S3 file-size validation checks are skipped when the corresponding datasets are already available locally.
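To make the trade-off concrete, the sketch below contrasts the behavior described here (accept any existing local file) with a size check against expected metadata. Both helpers, the file name, and the expected size are hypothetical and only illustrate the idea; they are not taken from the project. The demo simulates an interrupted download by writing half of the expected bytes. The presence-only check accepts the file, while the size check rejects it.

```python
import os
import tempfile


def accept_if_present(local_path: str) -> bool:
    """Behavior described in this PR: any existing local file is accepted."""
    return os.path.exists(local_path)


def accept_if_size_matches(local_path: str, expected_size: int) -> bool:
    """Default behavior: the local file must match the expected (remote) size."""
    return os.path.exists(local_path) and os.path.getsize(local_path) == expected_size


if __name__ == "__main__":
    expected_size = 1024  # hypothetical size taken from remote metadata
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.parquet")  # hypothetical file name
        # Simulate an interrupted download: only half of the bytes were written
        with open(path, "wb") as f:
            f.write(b"\0" * (expected_size // 2))
        print("presence check:", accept_if_present(path))                  # True
        print("size check:", accept_if_size_matches(path, expected_size))  # False
```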