
ci(diffusion): remove local_dir and post process directly on cache#2182

Open
thomasdhc wants to merge 1 commit into main from donghyukc/diffusion_dataset_cache


@thomasdhc
Contributor

What does this PR do?

  • diffusion_finetune_launcher.sh was calling snapshot_download(..., local_dir=$DATA_DIR/raw), which bypasses HF_HUB_CACHE: modern huggingface_hub versions write the files only to local_dir when it is set. The dataset therefore never landed in the shared cache, so offline reruns failed with LocalEntryNotFoundError.
  • Drop local_dir, capture the snapshot path that snapshot_download returns (a directory inside HF_HUB_CACHE), and point preprocessing at that path directly. The cache is filled once, and subsequent offline runs are served from it.
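A minimal Python sketch of why the offline rerun failed, assuming the standard huggingface_hub cache layout (`<cache>/datasets--<org>--<name>/refs/<revision>` holding a commit hash, and `snapshots/<commit>/` holding the files). The helpers `repo_cache_folder` and `resolve_cached_snapshot` are hypothetical illustrations, not part of huggingface_hub; they only mimic the offline lookup. Files written to local_dir never create this tree, so the lookup has nothing to find. In the launcher itself the fix is just to keep the path returned by `snapshot_download(repo_id=..., repo_type="dataset")` and hand it to preprocessing.

```python
from pathlib import Path


def repo_cache_folder(cache_dir: Path, repo_id: str, repo_type: str = "dataset") -> Path:
    """Hypothetical helper: the per-repo folder inside HF_HUB_CACHE.

    For example, dataset "org/name" maps to <cache_dir>/datasets--org--name.
    """
    return cache_dir / f"{repo_type}s--{repo_id.replace('/', '--')}"


def resolve_cached_snapshot(cache_dir: Path, repo_id: str, revision: str = "main",
                            repo_type: str = "dataset") -> Path:
    """Hypothetical offline lookup of a cached snapshot, with no network access.

    Assumes the layout snapshot_download() writes when local_dir is NOT set:
      <repo folder>/refs/<revision>           text file containing a commit hash
      <repo folder>/snapshots/<commit hash>/  the files of that revision
    Raises FileNotFoundError if the repo was never cached, which is the
    situation the launcher hit after downloading into local_dir instead.
    """
    repo_dir = repo_cache_folder(cache_dir, repo_id, repo_type)
    ref_file = repo_dir / "refs" / revision
    if ref_file.is_file():
        # Branch or tag name: follow the ref to its commit hash.
        commit = ref_file.read_text().strip()
    else:
        # Otherwise treat the revision itself as a commit hash.
        commit = revision
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(f"{repo_id}@{revision} is not in {cache_dir}")
    return snapshot
```

The lookup only succeeds when the download went through the cache, which is why dropping local_dir is enough for offline reruns to resolve the dataset.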

Changelog

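  • diffusion_finetune_launcher.sh: remove local_dir=$DATA_DIR/raw from the snapshot_download call, capture the returned snapshot path, and run preprocessing on that path.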

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>