| 1 | +# Scientific Programming Project: Neuroimaging Data Standards |
| 2 | + |
| 3 | +- Project name: `scientific_programming_neuroimaging` |
| 4 | +- Research question: __In NSD subject 1, session 1, is the mean beta response different in V1 and hV4?__ |
| 5 | +- Optional extension: __If the core pipeline works, repeat the same ROI comparison for a few additional sessions or revisit the V1 session-drift question.__ |
| 6 | +- Programming language: `R` suggested for the class version (`RNifti`, `dplyr`, `tidyr`, `ggplot2`). Python remains optional for students who already know `nibabel` or `nilearn`. |
| 7 | +- Expert contact: TBD, Ben Harvey? |
| 8 | + |
| 9 | +> **Canonical course conventions live in [project_guidelines.md](../project_guidelines.md).** That file is the source of truth for the four required workflow files (`week1_explore.qmd`, `week2_operationalize_clean.qmd`, `week3_model.qmd`, `week4_storytelling.qmd`), the `data/model_data.rds` -> `data/model_results.rds` pipeline, the raw-data policy, quality-check requirements, decision logs, and contribution tracking. Read it before starting and treat anything below as project-specific guidance on top of those conventions. |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +*Posterior view of an NSD visual ROI mask: grey points are valid brain voxels outside this visual ROI set; colored points show V1, V2, V3, and hV4 subdivisions. The practical task is to connect voxel coordinates, ROI labels, trials, and beta values into one small analysis table.* |
| 14 | + |
| 15 | +## Tutorial framing |
| 16 | + |
| 17 | +This project is about scientific data standards and binary scientific arrays. The |
| 18 | +raw object is not one tidy table. It is a small set of files that only make sense |
| 19 | +together: BIDS metadata, JSON sidecars, event TSV files, NIfTI beta maps, ROI |
| 20 | +masks, and a dataset manual. |
| 21 | + |
| 22 | +The project should stay deliberately small. Students should not try to do a full |
| 23 | +fMRI study, fit a complex neural encoding model, download all NSD sessions, or |
| 24 | +solve image-category labeling. The class version uses one subject, one session, |
| 25 | +one single-trial beta file, and one visual ROI mask. |
| 26 | + |
| 27 | +Students should learn three things: |
| 28 | + |
| 29 | +1. How a scientific repository uses standards and metadata to make a large |
| 30 | + dataset understandable. |
| 31 | +2. How a NIfTI beta file and a NIfTI ROI mask encode different but aligned |
| 32 | + arrays. |
| 33 | +3. How to reduce voxelwise trial data to a small trial-by-ROI table that answers |
| 34 | + one simple question. |
| 35 | + |
| 36 | +The core research question is: |
| 37 | + |
| 38 | +> Are mean trial-level beta responses different in V1 and hV4 for one NSD |
| 39 | +> subject-session? |
| 40 | + |
| 41 | +This is not meant to be a new neuroscience contribution. It is a defensible |
| 42 | +mini-question that forces students to work with real neuroimaging files without |
| 43 | +drowning in the full NSD dataset. |
| 44 | + |
| 45 | +## Peer-teaching checklist |
| 46 | + |
| 47 | +| Dimension | This project teaches | |
| 48 | +|---|---| |
| 49 | +| Data structure | Participant/session folders, event metadata, 4D beta images, 3D ROI masks, voxel coordinates, trial index, and ROI labels. | |
| 50 | +| Storage system | Scientific repository on AWS organized through BIDS-style and NSD-specific conventions. | |
| 51 | +| File formats | NIfTI `.nii.gz`, TSV, JSON, CSV, MATLAB design files if needed, and RDS/CSV outputs created by students. | |
| 52 | +| Encoding | Text metadata, JSON sidecars, tabular event files, binary scientific image arrays, and integer-coded ROI masks. | |
| 53 | +| Model | A paired trial-level ROI comparison, such as a one-sample t-test or a bootstrap interval on the per-trial `hV4 - V1` differences. | |
| 54 | +| Key aspects to explain | Data standards, provenance, voxel-to-ROI mapping, NIfTI dimensions, trial indexing, ROI aggregation, file sizes, and what is lost when voxel maps become ROI means. | |
| 55 | + |
| 56 | +## Resources |
| 57 | +### Data source |
| 58 | + |
| 59 | +The practical uses the **Natural Scenes Dataset (NSD)**. European fMRI datasets |
| 60 | +are difficult to share publicly because detailed brain images are often treated |
| 61 | +as individually identifiable under the GDPR. NSD is an American dataset that is |
| 62 | +available through AWS Open Data after you accept the NSD data access agreement. |
| 63 | + |
| 64 | +- Dataset and documentation: https://naturalscenesdataset.org/ |
| 65 | +- Main reference paper: Allen et al. (2022), Nature Neuroscience. https://doi.org/10.1038/s41593-021-00962-x |
| 66 | +- Optional extension reference for session drift: https://doi.org/10.1038/s41467-023-40144-w |
| 67 | + |
| 68 | +Minimum NSD files for the class version: |
| 69 | + |
| 70 | +- BIDS metadata: |
| 71 | + `nsddata_rawdata/dataset_description.json`, |
| 72 | + `nsddata_rawdata/participants.tsv`, and |
| 73 | + `nsddata_rawdata/task-nsdcore_bold.json`. |
| 74 | +- One example event file: |
| 75 | + `nsddata_rawdata/sub-01/ses-nsd01/func/sub-01_ses-nsd01_task-nsdcore_run-01_events.tsv`. |
| 76 | +- Subject 1 visual ROI mask: |
| 77 | + `nsddata/ppdata/subj01/func1pt8mm/roi/prf-visualrois.nii.gz`. |
| 78 | +- One single-trial beta file: |
| 79 | + `nsddata_betas/ppdata/subj01/func1pt8mm/betas_fithrf_GLMdenoise_RR/betas_session01.nii.gz`. |
| 80 | +- Stimulus metadata only for inspection, not for the core model: |
| 81 | + `nsddata/experiments/nsd/nsd_stim_info_merged.csv`. |
| 82 | +- Optional image examples: |
| 83 | + a few files from `nsddata_stimuli/stimuli/nsd/shared1000/`. |
| 84 | + |
| 85 | +Students should not download all raw BOLD volumes, all subjects, all sessions of |
| 86 | +single-trial betas, or the full 37 GB `nsd_stimuli.hdf5` file. If laptop storage |
| 87 | +or memory is a problem, the instructor can provide a pre-cropped subset or let |
| 88 | +students use `meanbeta_session01.nii.gz` for Week 1 exploration only. The main |
| 89 | +Week 2 table should still be built from a documented NIfTI beta file plus the ROI |
| 90 | +mask. |
| 91 | + |
| 92 | +### ROI codes |
| 93 | + |
| 94 | +For the core question, combine: |
| 95 | + |
| 96 | +- V1 = ROI codes `1` and `2` (`V1v`, `V1d`) |
| 97 | +- hV4 = ROI code `7` |
| 98 | + |
| 99 | +Students do not need to analyze every ROI. They should understand that the ROI |
| 100 | +mask is an integer-coded spatial map whose dimensions must align with the beta |
| 101 | +file's spatial dimensions. |
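To make "integer-coded spatial map" concrete, here is a minimal base-R sketch that decodes a tiny synthetic mask with a lookup table. The array size and the random voxel assignment are invented for illustration. Only the mapping of codes `1`, `2`, and `7` comes from the list above, and every other code is lumped into a hypothetical `"other"` label.

```r
# Synthetic stand-in for the ROI mask: a small 3D integer array.
# In the real project this array would come from RNifti::readNifti().
set.seed(1)
mask <- array(
  sample(c(0L, 1L, 2L, 7L), size = 4 * 5 * 3, replace = TRUE,
         prob = c(0.7, 0.1, 0.1, 0.1)),
  dim = c(4, 5, 3)
)

# Lookup from the ROI codes used in the core question to analysis labels.
# V1v (1) and V1d (2) are merged into a single V1 label.
roi_lookup <- c("1" = "V1", "2" = "V1", "7" = "hV4")

# Decode every voxel; codes not in the lookup become "other".
labels <- unname(roi_lookup[as.character(mask)])
labels[is.na(labels)] <- "other"
dim(labels) <- dim(mask)  # restore the spatial layout of the mask

# Voxel counts per label, and the 3D coordinates of the hV4 voxels.
roi_counts <- table(labels)
hv4_coords <- which(labels == "hV4", arr.ind = TRUE)
print(roi_counts)
head(hv4_coords)
```

Because `labels` keeps the same dimensions as `mask`, any voxel coordinate can be looked up in both arrays. The same property has to hold between the mask and the beta file.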
| 102 | + |
| 103 | +### Knowledge sources |
| 104 | + |
| 105 | +- BIDS documentation for neuroimaging data organization and metadata. |
| 106 | +- Basic introductions to NIfTI, JSON sidecars, events files, and participant |
| 107 | + metadata. |
| 108 | +- NSD documentation, the main paper, and the dataset manual. |
| 109 | +- R packages: `RNifti`, `dplyr`, `tidyr`, `ggplot2`, `readr`. |
| 110 | +- Optional Python equivalents: `nibabel`, `numpy`, `pandas`, `nilearn`. |
| 111 | + |
| 112 | +## Week-by-week |
| 113 | +### Week 1 |
| 114 | + |
| 115 | +Start from the raw scientific repository, identify the files that belong |
| 116 | +together, and make a written manifest before downloading large files. |
| 117 | + |
| 118 | +Week 1 exact data checklist: |
| 119 | + |
| 120 | +- Read and accept the NSD data terms. |
| 121 | +- Download only the BIDS metadata files listed above. |
| 122 | +- Download one event TSV for `sub-01`, `ses-nsd01`, `run-01`. |
| 123 | +- Download the subject-1 visual ROI mask. |
| 124 | +- Download one selected beta file, preferably `betas_session01.nii.gz`. |
| 125 | +- Optionally download a few `shared1000` images so the group can see what kind |
| 126 | + of stimuli were used. |
| 127 | +- Save a local manifest with file paths, file sizes, and the reason each file is |
| 128 | + needed. |
| 129 | +- In R, load only the headers first and write down the dimensions of the ROI mask |
| 130 | + and beta file. |
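The last two checklist items could be sketched roughly as follows. The helper `build_manifest()`, the `data/raw` folder, and the flat file layout are hypothetical choices for illustration, not course requirements. The header read uses `RNifti::niftiHeader()` and is skipped if the package or the file is not available.

```r
# Build a small manifest: one row per downloaded file.
# `paths` and `reasons` are supplied by the student (hypothetical helper).
build_manifest <- function(paths, reasons) {
  stopifnot(length(paths) == length(reasons))
  data.frame(
    path = paths,
    exists = file.exists(paths),
    size_mb = round(file.size(paths) / 1024^2, 2),  # NA if the file is missing
    reason = reasons,
    stringsAsFactors = FALSE
  )
}

# Hypothetical local layout: adjust `data_dir` to wherever the files were saved.
data_dir <- "data/raw"
manifest <- build_manifest(
  paths = file.path(data_dir, c("prf-visualrois.nii.gz", "betas_session01.nii.gz")),
  reasons = c("ROI labels for V1 and hV4", "single-trial betas for session 1")
)
print(manifest)
# write.csv(manifest, "data/manifest.csv", row.names = FALSE)

# Header-only read: metadata without loading the voxel data.
# Skipped quietly if RNifti or the file is not available.
beta_path <- manifest$path[2]
if (requireNamespace("RNifti", quietly = TRUE) && file.exists(beta_path)) {
  hdr <- RNifti::niftiHeader(beta_path)
  print(hdr$dim)  # raw NIfTI dim field: number of dimensions, then their sizes
}
```

Writing the manifest before loading any array makes file sizes visible early, which helps decide whether the full beta file fits in laptop memory.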
| 131 | + |
| 132 | +Skip in Week 1: |
| 133 | + |
| 134 | +- all raw BOLD fMRI volumes; |
| 135 | +- all subjects; |
| 136 | +- all beta sessions; |
| 137 | +- the full `nsd_stimuli.hdf5`; |
| 138 | +- animate/inanimate labeling; |
| 139 | +- session-drift modeling. |
| 140 | + |
| 141 | +Week 1 questions: |
| 142 | + |
| 143 | +- What is BIDS, what is NIfTI, and what is a JSON sidecar? |
| 144 | +- What is a voxel? |
| 145 | +- What does each dimension of the beta file represent? |
| 146 | +- What is an ROI mask, and why are most voxels outside this visual ROI set? |
| 147 | +- Which files are raw measurements, which are metadata, and which are derived |
| 148 | + analysis files? |
| 149 | +- Why is the data-use agreement part of the scientific data structure? |
| 150 | + |
| 151 | +Prepare for roundtable in week 2: |
| 152 | + |
| 153 | +- Explain why scientific data standards exist. |
| 154 | +- Explain the difference between an event TSV, a beta NIfTI file, and an ROI |
| 155 | + mask. |
| 156 | +- Explain why a binary array is not self-explanatory without metadata. |
| 157 | +- Explain what can go wrong if the beta file and ROI mask dimensions do not |
| 158 | + match. |
| 159 | + |
| 160 | +### Week 2 |
| 161 | + |
| 162 | +Build the smallest useful analysis table from the raw files. |
| 163 | + |
| 164 | +- Load the ROI mask with `RNifti::readNifti()`. |
| 165 | +- Load the beta file with `RNifti::readNifti()`. |
| 166 | +- Confirm that the first three beta dimensions match the ROI mask dimensions. |
| 167 | +- Create a voxel table only for V1 and hV4 voxels. |
| 168 | +- For each trial, compute the mean beta in V1 and the mean beta in hV4. |
| 169 | +- Save `data/model_data.rds` with columns such as: |
| 170 | + `subject`, `session`, `trial`, `roi`, `n_voxels`, and `mean_beta`. |
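The steps above can be sketched in base R on synthetic arrays, so the logic can be tested before the real files are loaded. The array sizes and random values are invented. In the actual project `mask` and `beta` would come from `RNifti::readNifti()`, and the subject and session labels here are placeholders.

```r
# Synthetic stand-ins so the sketch runs without any NSD files.
set.seed(42)
dims3 <- c(6, 5, 4)
n_trials <- 10
mask <- array(sample(c(0L, 1L, 2L, 7L), prod(dims3), replace = TRUE), dim = dims3)
beta <- array(rnorm(prod(dims3) * n_trials), dim = c(dims3, n_trials))

# Quality check: spatial dimensions of the beta array must match the mask.
stopifnot(identical(dim(beta)[1:3], dim(mask)))

# Flatten space so each row is a voxel and each column is a trial.
# Column-major order keeps row i aligned with linear voxel index i in `mask`.
beta_mat <- matrix(beta, nrow = prod(dims3), ncol = n_trials)

# Voxel selections for the two ROIs (V1 merges codes 1 and 2).
roi_voxels <- list(V1 = which(mask %in% c(1L, 2L)), hV4 = which(mask == 7L))
stopifnot(all(lengths(roi_voxels) > 0))  # quality check: no empty ROI

# One row per trial and ROI: the mean beta over that ROI's voxels.
model_data <- do.call(rbind, lapply(names(roi_voxels), function(roi) {
  idx <- roi_voxels[[roi]]
  data.frame(
    subject = "subj01", session = 1L, trial = seq_len(n_trials),
    roi = roi, n_voxels = length(idx),
    mean_beta = colMeans(beta_mat[idx, , drop = FALSE], na.rm = TRUE)
  )
}))

str(model_data)
# saveRDS(model_data, "data/model_data.rds")
```

The explicit dimension check matters because R recycles a logical index that is shorter than the array it subsets, so a mismatched mask can select the wrong voxels without raising an error.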
| 171 | + |
| 172 | +The Week 2 output should be small. Students should not save a copy of the full |
| 173 | +NIfTI array as an RDS file. |
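A back-of-the-envelope calculation shows why. The dimensions below are assumed for illustration only and should be replaced with the values read from the beta file's header. The estimate also assumes the array is held in memory as 8-byte doubles, which depends on how the file is stored and loaded.

```r
# Assumed dimensions for illustration only; replace with the header values.
beta_dims <- c(x = 81, y = 104, z = 83, trial = 750)

# Size if every value is held in memory as an 8-byte double.
n_values <- prod(beta_dims)
array_gb <- n_values * 8 / 1e9

# The trial-by-ROI table has one row per trial and ROI (two ROIs).
table_rows <- unname(beta_dims["trial"]) * 2

cat(sprintf("values in 4D array: %.0f\n", n_values))
cat(sprintf("in-memory size as doubles: about %.1f GB\n", array_gb))
cat(sprintf("rows in model_data: %d\n", as.integer(table_rows)))
```

Under these assumptions the array holds several hundred million values while the analysis table has a few thousand rows at most, so only the table belongs in `data/model_data.rds`.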
| 174 | + |
| 175 | +Prepare for roundtable in week 3: |
| 176 | + |
| 177 | +- Explain how voxelwise maps became a trial-by-ROI table. |
| 178 | +- Explain what was gained and lost by averaging over voxels. |
| 179 | +- Explain why V1 was built from `V1v` and `V1d`. |
| 180 | +- Explain one quality check: dimensions match, non-empty ROIs, plausible number |
| 181 | + of trials, or no all-missing beta summaries. |
| 182 | + |
| 183 | +### Week 3 |
| 184 | + |
| 185 | +Fit a very small model on the Week 2 table. |
| 186 | + |
| 187 | +Recommended analysis: |
| 188 | + |
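```r
# Load the Week 2 table (columns: subject, session, trial, roi, n_voxels, mean_beta).
model_data <- readRDS("data/model_data.rds")

# One row per trial with a V1 and an hV4 column. id_cols is needed because
# n_voxels differs between ROIs and would otherwise split each trial into two rows.
wide <- tidyr::pivot_wider(
  model_data,
  id_cols = c(subject, session, trial),
  names_from = roi,
  values_from = mean_beta
)

# Paired comparison: one-sample t-test on the per-trial differences.
wide$hV4_minus_V1 <- wide$hV4 - wide$V1
t.test(wide$hV4_minus_V1)
```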
| 199 | + |
| 200 | +Equivalent model: |
| 201 | + |
```r
# Intercept-only model on the per-trial differences computed above.
lm(hV4_minus_V1 ~ 1, data = wide)
```
| 205 | + |
| 206 | +The intercept is the average trial-level difference between hV4 and V1. This is |
| 207 | +simple enough to explain and still depends on the real scientific-programming |
| 208 | +work: students had to read NIfTI files, decode the ROI mask, align dimensions, |
| 209 | +and aggregate a 4D array into a table. |
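The equivalence can be checked on synthetic data. This is a small base-R sketch in which the trial count, means, and standard deviations are invented. It shows only that the one-sample t-test, the intercept-only model, and the paired t-test agree.

```r
# Synthetic per-trial ROI means; the effect size here is invented.
set.seed(7)
n_trials <- 200
wide <- data.frame(trial = seq_len(n_trials), V1 = rnorm(n_trials, 1.0, 0.5))
wide$hV4 <- wide$V1 + rnorm(n_trials, mean = -0.3, sd = 0.4)
wide$hV4_minus_V1 <- wide$hV4 - wide$V1

tt <- t.test(wide$hV4_minus_V1)
fit <- lm(hV4_minus_V1 ~ 1, data = wide)

# Same point estimate, same 95% interval, same t statistic.
c(t_test = unname(tt$estimate), lm = unname(coef(fit)))
rbind(t_test = as.numeric(tt$conf.int), lm = as.numeric(confint(fit)))
c(t_test = unname(tt$statistic), lm = summary(fit)$coefficients[1, "t value"])

# The paired t-test on the two columns is the same test again.
paired <- t.test(wide$hV4, wide$V1, paired = TRUE)
all.equal(unname(paired$statistic), unname(tt$statistic))
```

The `lm()` form is useful mainly because it extends naturally, for example by adding a session term in the optional multi-session extension.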
| 210 | + |
| 211 | +Sensitivity check: |
| 212 | + |
| 213 | +- Repeat the comparison with `V1v` and `V1d` separately, or |
| 214 | +- repeat after removing trials with extreme beta values, or |
| 215 | +- repeat with `meanbeta_session01.nii.gz` as a descriptive check only. |
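One possible version of the second option is sketched below on synthetic differences. The outlier rule (median plus or minus `k` scaled MADs, with `k = 3`) is an assumption chosen for illustration rather than a course requirement. For brevity it is applied to the per-trial differences, not to the ROI means.

```r
# Synthetic differences with a few injected outliers (values are invented).
set.seed(11)
diffs <- c(rnorm(300, mean = -0.3, sd = 0.4), 6, -7, 8)

# Flag trials further than `k` scaled MADs from the median.
# k = 3 is a common but arbitrary choice; record it in the decision log.
k <- 3
keep <- abs(diffs - median(diffs)) <= k * mad(diffs)

full <- t.test(diffs)
trimmed <- t.test(diffs[keep])

# Side-by-side summary of the two analyses.
sensitivity <- data.frame(
  analysis = c("all trials", "extremes removed"),
  n = c(length(diffs), sum(keep)),
  estimate = unname(c(full$estimate, trimmed$estimate)),
  ci_low = c(full$conf.int[1], trimmed$conf.int[1]),
  ci_high = c(full$conf.int[2], trimmed$conf.int[2])
)
print(sensitivity)
```

If the estimate and interval barely move, the conclusion does not hinge on a handful of trials. If they move a lot, that belongs in the limitations.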
| 216 | + |
| 217 | +Prepare for roundtable in week 4: |
| 218 | + |
| 219 | +- Explain which parameter answers the research question. |
| 220 | +- Explain why the model is small but the data processing was not trivial. |
| 221 | +- Explain why trial-level values are not the same as raw BOLD time series. |
| 222 | +- Explain why session drift and image-category questions are extensions, not the |
| 223 | + core project. |
| 224 | + |
| 225 | +### Week 4 |
| 226 | + |
| 227 | +Visualize and tell a story about the raw-data-to-table pipeline. |
| 228 | + |
| 229 | +- Show the ROI mask image or 3D ROI plot. |
| 230 | +- Show the distribution of trial-level mean beta values for V1 and hV4. |
| 231 | +- Show paired trial differences or a confidence interval for `hV4 - V1`. |
| 232 | +- Show the local file manifest and the final `model_data.rds` structure. |
| 233 | +- Make the limitations explicit: one subject, one session, two ROIs, ROI means |
| 234 | + rather than voxelwise modeling, and no claim about all visual cortex or all |
| 235 | + people. |
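For the confidence interval, a percentile bootstrap is one option alongside the t-based interval from Week 3. The sketch below uses synthetic differences with invented values. In the project, `diffs` would be `wide$hV4_minus_V1`. Resampling trials treats them as independent, which is a simplification worth stating in the limitations.

```r
# Synthetic per-trial differences; replace with wide$hV4_minus_V1 in practice.
set.seed(123)
diffs <- rnorm(250, mean = -0.3, sd = 0.4)

# Percentile bootstrap: resample trials with replacement, recompute the mean.
n_boot <- 2000
boot_means <- replicate(n_boot, mean(sample(diffs, replace = TRUE)))
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# Compare with the t-based interval from Week 3.
t_ci <- t.test(diffs)$conf.int

rbind(bootstrap = unname(boot_ci), t_based = as.numeric(t_ci))
```

With a few hundred trials the two intervals should be close. A large gap between them would suggest skewed or heavy-tailed differences that are worth plotting.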
| 236 | + |
| 237 | +The final story should make a course-level argument: |
| 238 | + |
| 239 | +> Scientific programming is not just fitting a model. It is knowing how raw |
| 240 | +> domain files, metadata, binary arrays, ROI labels, and data-use agreements |
| 241 | +> become a defensible analysis table. |