Restructuring proposal: top-level layout + workflow/paper/scratch division #188

Closed
cailmdaley wants to merge 31 commits into develop from restructuring/proposal

@cailmdaley cailmdaley commented Jun 2, 2026

Restructuring sp_validation

Status — 2026-06-05 · autonomous WIP checkpoint (draft, not yet for review).
This PR is now both the proposal and an early implementation. Cail's call was: don't wait for
Sacha's foundation merge; fold his branch into our cleanup branch and put back-pressure
guards in place before moving any file. This checkpoint covers Phases 0–1:

  • Foundation folded. sachaguer:develop (PR #192 head, 120 files) merged in; the
    cosmology.py KeyError: 'mnu' blocker is fixed (one line), and test_cosmology.py is 26/26 green.
    Sacha's broad .gitignore bans (*.png *.sh *.fits) were rejected, not adopted.
  • Back-pressure guard #1 green. tests/test_imports.py imports every library module and
    resolves every first-party import named by the standalone scripts/ (42 passed, 1 xfailed;
    the xfail is a real dead-import find in plot_leakage.py).
  • Clean base. Generated apidoc stubs and uv.lock are now gitignored, and the ignore patterns
    for ~700 generated glass-mock configs, dropped by a merge regression, are restored.

Branch note: restructuring/proposal was fast-forwarded to the implementation so that #188
could be reopened; it is now identical to the cleanup/restructuring branch. The remaining
top-level moves (Phase 2) proceed in small git mv steps behind green guards and were
deliberately not pushed through unattended. Guards ②–⑥ (DAG identity, config paths,
output-schema, dangling-path grep) are specified and pending.

— Claude on behalf of Cail


One organizing principle — the things you run live at the top — a clean three-way
split between analysis, papers, and scratch, and a modular workflow built for more than one
person.


The shape

Today cosmo_val is buried inside notebooks/ while cosmo_inference/ is top-level, so
you constantly hunt for where each one lives. The fix: the things a person actually runs
sit side by side at the top, sharing library code in src/ underneath.

sp_validation/
├── src/sp_validation/   library code (+ glass_mock core)
├── cosmo_val/           validation: code + config        (promoted from notebooks/)
├── cosmo_inference/     inference: code + config         (cosmosis / cosmocov)
├── workflow/            ALL analysis — modular Snakemake, multi-person → results/
├── papers/             final-figure assembly only (PDF, colour, layout)
├── scripts/            real reduction scripts (catalog builders, masking)
├── scratch/            per-person — ad hoc work + personal workflows (tracked)
├── notebooks/          curated to official demos / tutorials
├── results/            analysis products + diagnostic plots (contents gitignored)
└── docs/  tests/  config/

Division of labor

The boundary is the inputs to a paper figure: everything up to that point is analysis;
the figure itself is presentation.

  • workflow/ — all analysis. Generic, reusable, modular, organized for multiple people.
    Produces analysis products and diagnostic plots (sp_validation makes many — they go to
    results/). The bulk of the work lives here.
  • papers/<paper>/ — final-figure assembly only. The figure PDF, colours, layout,
    recombining data for presentation. Tied to one paper, and may never touch Snakemake.
  • scratch/<person>/ — personal and ad hoc. Experiments and one-off custom workflows.
    Tracked, because seeing each other's scratch is useful.

How the workflow scales — modular, not monolithic

Nothing in this analysis is computed once: the catalog changed ~20× in the first release
suite, and every paper varies the data vector, covariance, and inference. So the workflow
is parameterized — the rules are shared, the config changes each time. Snakemake's
module directive imports the rules under your own config and an output prefix, and lets
you override any single rule:

module analysis:
    snakefile: "../../workflow/Snakefile"
    config:    config              # this run's catalog, cuts, blind
    prefix:    "results/bmodes"    # products land here — no clobbering

use rule * from analysis
# swap is per-rule: redefine just the data-vector rule to override it

One top-level results/; each run namespaces under results/<name>/ via the prefix, so
people don't clobber each other. A --dry-run on each composition is the safety net that
lets the structure grow without silent breakage.
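
To make the no-clobbering claim concrete, here is a minimal Python sketch of what the prefix buys. It only illustrates the effect on output paths and is not how Snakemake implements `prefix`; the product names and the `tomography` run name are hypothetical.

```python
from pathlib import PurePosixPath


def namespaced(prefix, outputs):
    """Prepend a run's prefix to each of its relative output paths."""
    return [str(PurePosixPath(prefix) / out) for out in outputs]


def collisions(runs):
    """Map each output path claimed by more than one run to those runs.

    `runs` maps a run name to the list of output paths it writes.
    """
    owners = {}
    for run, paths in runs.items():
        for path in paths:
            owners.setdefault(path, []).append(run)
    return {path: names for path, names in owners.items() if len(names) > 1}


# Hypothetical product names; every run shares them because the rules are shared.
shared_outputs = ["data_vector.fits", "covariance.npy"]

# Without a prefix, two runs of the same rules write to the same paths.
without_prefix = {"bmodes": shared_outputs, "tomography": shared_outputs}

# With a per-run prefix, each run lands under its own results/<name>/.
with_prefix = {
    "bmodes": namespaced("results/bmodes", shared_outputs),
    "tomography": namespaced("results/tomography", shared_outputs),
}
```

Here `collisions(without_prefix)` reports both products as contested, while `collisions(with_prefix)` is empty. A check of this kind could sit next to the `--dry-run` on each composition, though that pairing is a suggestion rather than part of the proposal.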


Cleanup

  • Delete defunct/ (quarantined since 2024) and the exploratory 2021–22 notebooks — it
    all stays in git history.
  • Curate notebooks/ to official demos and tutorials; personal scratchy ones move to
    scratch/.
  • Discipline via tooling, not bans: nbstripout strips notebook outputs on commit (the
    repo's weight today is committed notebook outputs), plus a pre-commit size hook.
  • Path translation — collecting the paper dirs breaks ~35 hardcoded absolute paths; a
    mechanical sweep rewrites them (scripts included) to the single repo-relative results/.
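
A rough sketch of what the path-translation sweep could look like, assuming the hardcoded paths share a common absolute root. `OLD_ROOT` is a made-up placeholder (the real ~35 paths are not listed here), and the function names are illustrative only.

```python
import re

# Hypothetical absolute root; the real hardcoded paths are not listed in this PR.
OLD_ROOT = "/home/user/papers"
# The single repo-relative target named in the proposal.
NEW_ROOT = "results"

# Match OLD_ROOT only as a whole path prefix: the next character must not
# continue the directory name, so "/home/user/papers_old" is left alone.
_PATTERN = re.compile(re.escape(OLD_ROOT) + r"(?![\w.-])")


def translate(text):
    """Rewrite every occurrence of OLD_ROOT in `text`.

    Returns a (new_text, number_of_replacements) tuple, so a sweep can report
    how many paths it touched in each file.
    """
    return _PATTERN.subn(NEW_ROOT, text)


source = 'cat = "/home/user/papers/bmodes/cat.fits"\nold = "/home/user/papers_old/x"\n'
rewritten, count = translate(source)
```

Returning the replacement count makes it easy to compare the sweep's total against the expected number of hardcoded paths and to spot files it missed.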

The milestone

A suite of PRs, in sequence:

  1. Foundation — merge pending local code into develop. (Sacha)
  2. Restructuring — this proposal. (this PR — draft, no implementation yet)
  3. Glass mocks → tomography.
  4. Input pipeline → tomography.

This PR is the proposal only. Implementation follows once the foundation merge lands and
the shape is agreed.


— Claude on behalf of Cail

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cailmdaley cailmdaley closed this Jun 3, 2026
@cailmdaley cailmdaley force-pushed the restructuring/proposal branch from 55d608a to d4a64e4, June 3, 2026 08:20
sachaguer and others added 6 commits June 4, 2026 08:30
Fold Sacha's pending foundation (PR #192 head, sachaguer:develop @ c22f075)
onto current develop so the restructuring builds on his foundation without
racing his merge gesture (Cail's direction, 2026-06-05).

.gitignore conflict resolved in favour of develop: kept the .felt tracking
block, rejected sacha's broad cluster bans (*.png *.sh *.fits *.out *.err) —
those get narrowed during the restructuring gitignore pass, not adopted
wholesale. cosmo_val.py / cat_config.yaml auto-merged cleanly (origin's
docstring-RST polish + sacha's functional changes did not collide).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cosmology.py get_cosmo read planck_defaults["mnu"] but the dict never
defined the key, so every bare get_cosmo() call (no ccl_params, no mnu
arg) raised KeyError: 'mnu'. Add "mnu": PLANCK18["m_nu"] (0.06 eV).

Verified: test_cosmology.py 26/26 pass (was immediate KeyError before).
This is the one blocker that kept Sacha's foundation from running clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
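
The failure mode and the one-line fix described in this commit can be sketched with stand-in dictionaries. This is an illustration under assumptions, not the actual cosmology.py: the `h` entry and the simplified `get_cosmo` signature are placeholders, and only the names `planck_defaults`, `PLANCK18`, `mnu`, `m_nu` and the 0.06 eV value come from the commit message.

```python
# Stand-in for the shared Planck 2018 parameter table. Only m_nu = 0.06 eV is
# taken from the commit message; the "h" entry is a placeholder.
PLANCK18 = {"h": 0.67, "m_nu": 0.06}

# Before the fix: the defaults dict never defines "mnu".
planck_defaults_before = {"h": PLANCK18["h"]}

# After the fix: one added line.
planck_defaults_after = {"h": PLANCK18["h"], "mnu": PLANCK18["m_nu"]}


def get_cosmo(planck_defaults, mnu=None):
    """Simplified stand-in: return the parameters a cosmology would be built from."""
    if mnu is None:
        # This lookup is what raised KeyError: 'mnu' on a bare call.
        mnu = planck_defaults["mnu"]
    return {"h": planck_defaults["h"], "mnu": mnu}


try:
    get_cosmo(planck_defaults_before)
    bare_call_failed = False
except KeyError:
    bare_call_failed = True

fixed = get_cosmo(planck_defaults_after)
```

Passing `mnu` explicitly never reached the missing key, which would explain how the bug survived until something made a bare `get_cosmo()` call.
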
docs/source/sp_validation.*.rst are regenerated on every docs build by
sphinx-apidoc (deploy-docs.yml: `sphinx-apidoc -feTMo docs/source
src/sp_validation`), matching the already-ignored fortuna.*/scripts.*
stubs — they should never be committed.

uv.lock: the container is the canonical runtime (CLAUDE.md), the lockfile
has never been tracked, so ignore it rather than make an unowned
pinned-dep commitment. One-line flip to track if we decide to pin.

Establishes a clean base for the restructuring branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Sacha's branch removed the cosmosis_pipeline_glass_mock_0*.ini and
_v0*.ini ignore patterns, which un-ignored ~700 generated glass-mock
pipeline configs in cosmo_inference/cosmosis_config/. Restore the two
specific patterns (not broad bans) so the tree returns to develop's
clean state. These are generated artifacts, never tracked.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

First guard in the restructuring invariant suite (sp-validation-restructuring
fiber). A pure file-move must not break imports; this is the first thing to go
red on a bad git mv. Two halves, because scripts live OUTSIDE the package and a
package-only walk misses them (the gap the ngmix review's test_scripts_import
flagged):

- Package modules (src/sp_validation/*.py): real import — library code is
  import-safe, so a broken cross-module reference fails immediately.
- Standalone scripts (scripts/*.py): NOT executed (several do work at module
  level; one isn't a valid module name). Parsed with ast — which also asserts
  syntactic validity — and every first-party (sp_validation.*) import target
  resolved via importlib.util.find_spec.

Green baseline: 42 passed, 1 xfailed. The xfail is a real find — plot_leakage.py
imports `from sp_validation.correlation import *`, a module that never existed
(dead LF-leakage script). Marked xfail(strict=True) + KNOWN_BROKEN_SCRIPTS so
the baseline is honest and the strict-xfail flips the moment it's fixed/deleted;
triage belongs to the scripts/ curation pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
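
The script half of this guard can be sketched as follows: parse without executing, collect first-party import targets, and resolve each one. This is a minimal sketch of the technique the commit describes, not the contents of tests/test_imports.py; the helper names `first_party_imports` and `unresolved` are hypothetical.

```python
import ast
import importlib.util

FIRST_PARTY = "sp_validation"  # top-level package name, per the commit message


def first_party_imports(source, package=FIRST_PARTY):
    """Return the sorted module names under `package` that `source` imports.

    ast.parse raises SyntaxError on invalid code, so parsing doubles as a
    syntax check. Nothing in the script is executed.
    """
    targets = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            if name == package or name.startswith(package + "."):
                targets.add(name)
    return sorted(targets)


def unresolved(targets):
    """Return the targets that importlib cannot locate.

    Note that find_spec imports the parent package of a dotted name, so the
    package's own __init__ does run.
    """
    missing = []
    for name in targets:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package is itself missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# The dead import found in plot_leakage.py, as quoted in the commit message.
script = "from sp_validation.correlation import *\nimport numpy as np\n"
targets = first_party_imports(script)
```

In a test, each script would presumably become one parametrized case asserting that `unresolved(first_party_imports(text))` is empty, with known-broken scripts marked `xfail(strict=True)` as described above.
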
@cailmdaley cailmdaley reopened this Jun 5, 2026
@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB


@cailmdaley
Collaborator Author

Continued as #197 — this PR was closed as a side-effect of a branch rename during cleanup. The work is fully intact and now lives on a single, cleanly-named branch. — Claude on behalf of Cail
