Cpjump1 example training data access refactor by wli51 · Pull Request #29 · WayScience/virtual_stain_flow

wli51 · 2026-05-27T20:38:24Z

Changes how the example CPJUMP1 data is accessed: it now uses the existing manifest and metadata files from WayScience/JUMP-single-cell.

Addresses issue #26

Note that this PR only adds an additional 0.*.ipynb for example data download. It does not yet replace the old data access notebook and subsequent training, which I decided to save for a separate PR to keep the size in check.

MattsonCam

LGTM @wli51, good job!

MattsonCam · 2026-05-28T20:28:30Z

+def download_wide_manifest_channels(
+    wide_manifest,
+    dest_dir,
+    channel_columns=None,
+    overwrite=False,
+):
+    """
+    Download S3 TIFFs for each channel and write a local file_index.csv with paths.
+    """


I think you can just pip install this repo from PyPI to accomplish this instead:
https://github.com/WayScience/jump_image_data_downloader

If you decide to use this repo, then I think it would be useful to pin the version. Also, it uses download parallelization.

Alternatively, Dave recently found a repo that downloads the JUMP data and includes more datasets:
https://github.com/broadinstitute/monorepo/tree/main/libs/jump_portrait
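
For reference, here is a minimal sketch of what download parallelization could look like if the helper stays in this repo. It is not the API of either linked package. The names `download_all` and `fetch` are hypothetical, and `fetch` is assumed to be any callable that copies one remote object to a local path (for instance a thin wrapper around an S3 client).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def download_all(
    uris: Iterable[str],
    dest_dir: Path,
    fetch: Callable[[str, Path], None],  # hypothetical: copies one URI to a local path
    max_workers: int = 8,
    overwrite: bool = False,
) -> dict[str, Path]:
    """Download every URI into dest_dir using a thread pool.

    Returns a mapping from each URI to its local path.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Local file name is the last path component of the URI.
    targets = {uri: dest_dir / uri.rsplit("/", 1)[-1] for uri in uris}
    # Skip files that already exist unless overwrite is requested.
    pending = {u: p for u, p in targets.items() if overwrite or not p.exists()}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, uri, path): uri for uri, path in pending.items()}
        for future in as_completed(futures):
            future.result()  # re-raise any error from a failed download

    return targets
```

Threads suit this workload because the time is spent waiting on network I/O rather than on computation.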

MattsonCam · 2026-05-28T20:32:22Z

+# Split plates into train (75%) and test (25%) with seed
+train_plates, test_plates = train_test_split(
+    unique_plates, 
+    test_size=0.25, 
+    random_state=42
+)


I think this works, but you could also use hash splitting, which would ensure samples remain in their respective splits even if data is added or removed (such as with QC). Here is an example if interested:
https://github.com/WayScience/nuclear_speckles_analysis/blob/main/splitters/HashSplitter.py
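
To illustrate the idea, here is a minimal sketch of hash-based splitting at the plate level. It is not the linked HashSplitter implementation. The function name `assign_split`, the `salt` value, and the plate IDs are all hypothetical.

```python
import hashlib


def assign_split(plate_id: str, test_fraction: float = 0.25, salt: str = "cpjump1") -> str:
    """Deterministically map a plate ID to 'train' or 'test'.

    The assignment depends only on the ID and the salt, so adding or
    removing other plates never moves this plate to a different split.
    """
    digest = hashlib.sha256(f"{salt}:{plate_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as an integer and scale it to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return "test" if bucket < test_fraction else "train"


# Hypothetical plate IDs, for illustration only.
plates = ["plate_A", "plate_B", "plate_C", "plate_D"]
splits = {plate: assign_split(plate) for plate in plates}

# Dropping a plate (for example after QC) leaves the other assignments unchanged.
after_qc = {plate: assign_split(plate) for plate in plates if plate != "plate_B"}
assert all(splits[plate] == after_qc[plate] for plate in after_qc)
```

One trade-off compared with `train_test_split`: the 75/25 proportion is only approximate, and it can deviate noticeably when the number of plates is small.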

MattsonCam · 2026-05-28T20:45:48Z

+
+def main(argv: Optional[list[str]] = None) -> int:
+    """
+    Command-line interface to building and ouputting the CPJUMP1 manifest.


Suggested change

Command-line interface to building and ouputting the CPJUMP1 manifest.

Command-line interface to building and outputting the CPJUMP1 manifest.

MattsonCam · 2026-05-28T20:47:34Z

+    Command-line interface to building and ouputting the CPJUMP1 manifest.
+    By default, it prints a summary and preview of the manifest. 
+    Use --output or --stdout to write the full manifest to a file or stdout.
+    May or may not be useful. 


I think I would remove this last line. Alternatively, you could explain the use case and let the user decide whether it is useful to them.
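
As a rough sketch of the behavior the docstring describes (summary and preview by default, full manifest with `--output` or `--stdout`), something like the following would make the use case explicit. Only the `main` signature and the two flag names come from the diff. `build_manifest_rows`, the row fields, and the choice to make the flags mutually exclusive are assumptions for illustration.

```python
import argparse
import csv
import sys
from typing import Optional, TextIO


def build_manifest_rows() -> list[dict[str, str]]:
    """Hypothetical stand-in for the real manifest builder."""
    return [
        {"plate": "plate_A", "well": "A01", "channel": "DNA"},
        {"plate": "plate_A", "well": "A01", "channel": "ER"},
        {"plate": "plate_B", "well": "B02", "channel": "DNA"},
    ]


def write_csv(rows: list[dict[str, str]], handle: TextIO) -> None:
    """Write all rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Build the CPJUMP1 manifest and print a summary, "
        "or write the full manifest to a file or stdout."
    )
    # Assumption: the two output modes are mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--output", help="path of the CSV file to write")
    group.add_argument("--stdout", action="store_true", help="write the full CSV to stdout")
    args = parser.parse_args(argv)

    rows = build_manifest_rows()
    if args.output:
        with open(args.output, "w", newline="") as handle:
            write_csv(rows, handle)
    elif args.stdout:
        write_csv(rows, sys.stdout)
    else:
        # Default: a short summary plus a preview of the first rows.
        print(f"Manifest rows: {len(rows)}")
        for row in rows[:2]:
            print(row)
    return 0
```

Stating the default behavior in the `description` string also gives users the same information through `--help`.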

MattsonCam · 2026-05-28T20:50:22Z

+negcon_u2os_24_manifest.head()
+
+
+# ## Arrange as wide to be in anticipated format dor virtual stain flow datasets and also the format the download helper expects


Suggested change

# ## Arrange as wide to be in anticipated format dor virtual stain flow datasets and also the format the download helper expects

# ## Arrange as wide is the anticipated format in virtual stain flow datasets and also the format the download helper expects this format

Not sure whether this is the intended message.

MattsonCam · 2026-05-28T20:54:28Z

+print(f"Train samples: {len(train_manifest_wide)}, Test samples: {len(test_manifest_wide)}")
+
+
+# ## Write final splitted download manifest with metadata and download all needed data


Consider making this more concise

wli51 added 3 commits May 27, 2026 14:30

Update package dependency to support S3 access of data

e4ce43b

Add CPJUMP1 dataset manifest building functionality for more principled example dataset acquisition + utilities for converting manifest as formats required by virtual stain flow datasets

5c7e1a9

Add script to download JUMP pilot plate data from S3 for training examples

a0095d8

wli51 requested a review from MattsonCam May 27, 2026 21:52

MattsonCam approved these changes May 28, 2026

wli51 wants to merge 3 commits into WayScience:main from wli51:cpjump1-data-access-refactor
