Fix Skip File Validation When Offline with Existing Datasets #772

Open
shaohuasong-fang wants to merge 1 commit into zilliztech:main from shaohuasong-fang:patch-2

@shaohuasong-fang
Contributor

@shaohuasong-fang shaohuasong-fang commented May 1, 2026

Description

When internet access is unavailable and a dataset has already been downloaded locally, the log output shows that the tool still attempts file validation against remote servers, causing unnecessary failures.

Result

Successfully bypassed Alibaba Cloud (OSS) and Amazon S3 file size validation checks when corresponding datasets are already available locally.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaohuasong-fang
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@XuanYang-cn
Collaborator

@shaohuasong-fang Thanks for the PR. I don’t think we should merge this approach.

The current file-size validation is intentional. If a dataset was downloaded from the official remote source, we should keep validating that the local file still matches the remote metadata.

Skipping validation whenever a local file exists can accept a corrupted or half-downloaded file, for example if a previous download was interrupted.

I agree there is a valid offline/custom-dataset use case, but it should be explicit rather than changing the default behavior for official datasets.
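
For illustration only, the explicit opt-in described above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: `is_local_copy_usable`, `expected_size`, and `skip_validation` are hypothetical names, and it assumes the remote file size is fetched elsewhere and passed in as `None` when the remote source cannot be reached.

```python
from pathlib import Path
from typing import Optional


def is_local_copy_usable(
    local_path: Path,
    expected_size: Optional[int],
    *,
    skip_validation: bool = False,
) -> bool:
    """Decide whether a local dataset file can be used without re-downloading.

    All names here are hypothetical. `expected_size` is the size reported by
    the remote source, or None if that metadata could not be fetched
    (for example, when offline).
    """
    if not local_path.is_file():
        # Nothing on disk, so there is nothing to reuse.
        return False

    if skip_validation:
        # Explicit opt-in (offline or custom dataset): trust the local file.
        return True

    if expected_size is None:
        # Default behavior: without remote metadata the file cannot be
        # verified, so it is not silently accepted.
        return False

    # Default behavior: a truncated or half-downloaded file fails this check.
    return local_path.stat().st_size == expected_size
```

With this shape, official datasets keep the size check by default, and a caller has to pass `skip_validation=True` deliberately to reuse an unverified local file.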

@shaohuasong-fang
Contributor Author

@XuanYang-cn
Got it, thanks for explaining!
