
Add --fast-build dev-iteration build mode (prototype) #1528

Draft: sbryngelson wants to merge 2 commits into master from feature-fast-build

@sbryngelson
Member

Draft / prototype. Opening for visibility and discussion. The NVHPC path is implemented and measured; the AMD/LLVMFlang path is documented but not yet implemented or validated.

What this adds

A --fast-build dev-iteration build mode for fast edit → rebuild → run loops (e.g. GPU print-debugging), where the usual optimization machinery just gets in the way.

It introduces a new CMake build type, Fast, that deliberately matches none of the existing conditional flag blocks:

  • not Release → no IPO/LTO, no -march=native
  • not Debug/RelDebug → no MFC_DEBUG, no -gpu=debug

…then adds a light -O1 via add_compile_options. Because MFC_DEBUG is off, device routines carry no host-only debug aborts, so the binary compiles cleanly without IPO. On NVHPC GPU builds it also autodetects the build node's single compute capability (nvidia-smi) and overrides the multi-arch MFC_CUDA_CC (escape hatch: MFC_FAST_ARCH=<cc>).
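The description does not show how the autodetect is implemented, so the following is only a minimal Python sketch of the logic as described: honor MFC_FAST_ARCH if set, otherwise ask nvidia-smi for the compute capability and accept it only when the node reports a single one. The function names are hypothetical, and the query assumes a driver recent enough to support the `compute_cap` field.

```python
import os
import shutil
import subprocess
from typing import Optional


def parse_compute_cap(raw: str) -> Optional[str]:
    """Turn nvidia-smi output such as '7.5' into a cc string such as '75'."""
    caps = {line.strip().replace(".", "") for line in raw.splitlines() if line.strip()}
    # Accept only an unambiguous answer: one distinct capability, digits only.
    if len(caps) == 1:
        cap = caps.pop()
        if cap.isdigit():
            return cap
    return None


def detect_fast_arch(env=None) -> Optional[str]:
    """Return the single compute capability to build for, or None to keep the default."""
    env = os.environ if env is None else env
    override = env.get("MFC_FAST_ARCH")  # escape hatch, e.g. on a GPU-less login node
    if override:
        return override
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_compute_cap(out)
```

Returning None on any ambiguity (no GPU, mixed GPUs, unparsable output) lets the caller fall back to the existing multi-arch MFC_CUDA_CC rather than guess.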

./mfc.sh build -t simulation --gpu acc --fast-build -j 8

fast_build is a new MFCConfig field, so it auto-generates --fast-build/--no-fast-build and gets its own build slug (does not clobber Release/Debug trees). The lock-file version is bumped for the new config field (one-time build/lock.yaml regen for existing checkouts).
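To illustrate why a new config field yields both a flag pair and a separate build tree, here is a rough sketch. It is not the actual MFCConfig code: the `Config` class, its fields, and the hashing scheme are assumptions made for the example, and it needs Python 3.9+ for `argparse.BooleanOptionalAction`.

```python
import argparse
import dataclasses
import hashlib


@dataclasses.dataclass(frozen=True)
class Config:  # hypothetical stand-in for MFCConfig
    gpu: str = "no"
    debug: bool = False
    fast_build: bool = False

    def slug(self) -> str:
        # Hash every field so configs that differ in any field get distinct build dirs.
        text = ",".join(f"{k}={v}" for k, v in sorted(dataclasses.asdict(self).items()))
        return hashlib.sha256(text.encode()).hexdigest()[:10]


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build")
    for field in dataclasses.fields(Config):
        flag = "--" + field.name.replace("_", "-")
        if field.type is bool:
            # One declaration produces both --fast-build and --no-fast-build.
            parser.add_argument(flag, action=argparse.BooleanOptionalAction,
                                default=field.default)
        else:
            parser.add_argument(flag, type=field.type, default=field.default)
    return parser


def parse_config(argv) -> Config:
    return Config(**vars(make_parser().parse_args(argv)))
```

Because the slug is derived from all fields, `parse_config(["--fast-build"])` and `parse_config([])` land in different directories, which is the property that keeps Fast builds from clobbering Release or Debug trees.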

Measured results (NVHPC 24.5, RTX 6000 cc75, generic simulation, 8 cores)

| Scenario | Release (fat 5-arch) | --fast-build (single-arch) |
|---|---|---|
| Clean full build | 641 s | 170 s (3.8x) |
| Hot-module incremental (m_riemann_solvers) | 385 s | 79 s (4.9x) |
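The speedup factors are simply the Release time divided by the --fast-build time. A short check of that arithmetic, using only the timings reported above:

```python
# Wall-clock seconds from the measurements above: (Release, --fast-build).
timings = {
    "clean full build": (641, 170),
    "hot-module incremental": (385, 79),
}


def speedup(release_s: float, fast_s: float) -> float:
    """Ratio of Release build time to --fast-build time."""
    return release_s / fast_s


for name, (release_s, fast_s) in timings.items():
    print(f"{name}: {speedup(release_s, fast_s):.1f}x, saves {release_s - fast_s} s")
# clean full build: 3.8x, saves 471 s
# hot-module incremental: 4.9x, saves 306 s
```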

Verified: no IPO (-Mextract absent), no MFC_DEBUG, single -gpu=cc75, -O1 applied; the resulting binary runs a 1D case on the GPU to exit 0 with finite output. ./mfc.sh precheck and format pass.
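A check like the one described could be automated against a captured compile command. The sketch below is an assumption-laden illustration, not part of the PR: the function name is hypothetical, and it assumes MFC_DEBUG would appear as a `-DMFC_DEBUG` define on the command line.

```python
import re
import shlex


def check_fast_flags(command: str) -> list:
    """Return a list of problems found in one compile command (empty if it looks like Fast)."""
    args = shlex.split(command)
    problems = []
    if any(a.startswith("-Mextract") for a in args):
        problems.append("IPO flag -Mextract present")
    if any(a.startswith("-DMFC_DEBUG") for a in args):  # assumed spelling of the define
        problems.append("MFC_DEBUG defined")
    # Collect every ccNN target named in any -gpu=... argument.
    archs = set()
    for a in args:
        if a.startswith("-gpu="):
            archs.update(re.findall(r"\bcc(\d+)\b", a))
    if len(archs) != 1:
        problems.append(f"expected one compute capability, found {sorted(archs)}")
    if "-O1" not in args:
        problems.append("-O1 missing")
    return problems
```

For example, `check_fast_flags("nvfortran -O1 -gpu=cc75 -c m.f90")` returns an empty list, while a command carrying `-gpu=cc70,cc75` is reported as multi-arch.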

AMD / LLVMFlang (documented, not yet implemented)

docs/documentation/fast_build.md diagnoses the AMD link-time problem (whole-program device LTO via -flto-partitions, which re-runs every build — 20+ min) and proposes the Fast path: -fopenmp-target-jit -O1, dropping -flto-partitions, plus a zero-change "build with high -j" lever (partitions = jobs). This is analysis only — LLVMFlang was not available on the dev machine — and includes steps to validate on an AMD GPU + AMD-compiler node.

Not in this PR yet

  • LLVMFlang Fast branch (the AMD JIT path) — pending hardware validation
  • --gpu-arch CLI flag (only MFC_FAST_ARCH env exists today)
  • Cray-on-AMD check, --help/key-option polish

Notes for reviewers

  • CPU and existing GPU build types are unaffected: Fast is additive and gated on its own build type; the autodetect only acts when MFC_CUDA_CC is already set (NVHPC) and --fast-build is passed.
  • The CMAKE_*_FLAGS_FAST cache vars are placeholders; the real -O1 is injected via add_compile_options, matching how Debug/RelDebug inject their flags.

New 'Fast' build type for fast edit-rebuild-run iteration (e.g. GPU print
debugging). It matches none of the Release-only (IPO, -march=native) or
Debug/RelDebug-only (MFC_DEBUG, -gpu=debug) conditional blocks, so it inherits
none of them; adds a light -O1. On NVHPC GPU builds it autodetects the node's
single compute capability (nvidia-smi) and overrides the multi-arch MFC_CUDA_CC,
with MFC_FAST_ARCH as a login-node escape hatch.

Measured (NVHPC 24.5, RTX 6000 cc75, generic simulation, 8 cores):
  clean build  641s (Release fat 5-arch) -> 170s  (3.8x)
  hot-module   385s (Release fat 5-arch) ->  79s  (4.9x)
Verified: builds with no IPO/MFC_DEBUG, runs a 1D case on GPU to exit 0.

Adds fast_build to MFCConfig (auto --fast-build/--no-fast-build, own slug);
bumps lock version to 9 for the new config field.
Documents the --fast-build dev-iteration mode: motivation, usage, the new Fast
build type, measured NVHPC results (clean 3.8x, hot-module 4.9x), and the
proposed AMD/LLVMFlang path (device-LTO diagnosis, -fopenmp-target-jit + -O1,
high-j partitions lever) with steps to validate on an AMD GPU + AMD compiler
node. The AMD path is analysis only and unverified (no LLVMFlang on the dev box).
@github-actions

github-actions Bot commented Jun 2, 2026

Claude Code Review

Head SHA: f1ad000

Files changed: 5

  • CMakeLists.txt
  • docs/documentation/fast_build.md
  • toolchain/mfc/build.py
  • toolchain/mfc/lock.py
  • toolchain/mfc/state.py

Findings

Redundant -O1 for Fortran in Fast build type; code comment is incorrect

CMakeLists.txt

The new Fast build type sets -O1 for Fortran through two independent paths:

  1. set(CMAKE_Fortran_FLAGS_FAST "-O1" CACHE STRING "") — CMake's per-build-type flag variable, which does inject the flag into Fortran compilations for CMAKE_BUILD_TYPE=Fast.
  2. add_compile_options($<$<COMPILE_LANGUAGE:Fortran>:-O1>) inside the if (CMAKE_BUILD_TYPE STREQUAL "Fast") block.

Fortran files therefore receive -O1 -O1 while C and C++ files receive -O1 once (only from CMAKE_C_FLAGS_FAST / CMAKE_CXX_FLAGS_FAST). The duplicate flag has no effect on the generated code (the compiler accepts it, and the last occurrence wins), but the accompanying code comment is incorrect:

# Fast: light optimization for dev iteration. Like Debug/RelDebug, the real opt
# flag is injected here (the CMAKE_*_FLAGS_FAST cache vars are placeholders).

The CMAKE_*_FLAGS_<BUILDTYPE> cache variables are not placeholders — CMake appends them to the compiler command line for the matching build type, just as it does for CMAKE_Fortran_FLAGS_RELDEBUG "-g" added a few lines above. The add_compile_options block is redundant for Fortran and adds no coverage for C/C++.
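The duplication can be made concrete with a toy model of how the two mechanisms combine. This is a simplified Python illustration of the behavior the finding describes, not CMake itself: the `compile_flags` helper is hypothetical, and the ordering (per-build-type cache flags first, then compile options) is a simplification.

```python
def compile_flags(language, build_type, cache_flags, compile_options):
    """Model: per-build-type cache flags first, then add_compile_options entries."""
    flags = list(cache_flags.get((language, build_type), []))
    for only_language, option in compile_options:
        # None means the option applies to every language; otherwise it is
        # gated the way $<$<COMPILE_LANGUAGE:...>:...> gates it.
        if only_language in (None, language):
            flags.append(option)
    return flags


# CMAKE_<LANG>_FLAGS_FAST as set by the PR, per the finding above.
cache_flags = {
    ("Fortran", "Fast"): ["-O1"],
    ("C", "Fast"): ["-O1"],
    ("CXX", "Fast"): ["-O1"],
}
# add_compile_options($<$<COMPILE_LANGUAGE:Fortran>:-O1>)
compile_options = [("Fortran", "-O1")]

for lang in ("Fortran", "C", "CXX"):
    print(lang, compile_flags(lang, "Fast", cache_flags, compile_options))
# Fortran ['-O1', '-O1']
# C ['-O1']
# CXX ['-O1']
```

Dropping the `compile_options` entry leaves every language with a single -O1 in this model, which is the outcome the suggested fix aims for.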

Impact: A future maintainer who wants to change the optimization level (e.g., -O0 for the AMD JIT path discussed in fast_build.md) would need to update both locations and might miss one, silently leaving the other in effect.

Fix: Remove the add_compile_options block (the CMAKE_*_FLAGS_FAST cache vars are sufficient for all three languages) and correct the comment to match the RelDebug precedent, which already relies solely on the cache variables.
