Add --fast-build dev-iteration build mode (prototype) #1528
New 'Fast' build type for fast edit-rebuild-run iteration (e.g. GPU print debugging). It matches none of the Release-only (IPO, -march=native) or Debug/RelDebug-only (MFC_DEBUG, -gpu=debug) conditional blocks, so it inherits none of their flags, and it adds a light -O1. On NVHPC GPU builds it autodetects the node's single compute capability (nvidia-smi) and overrides the multi-arch MFC_CUDA_CC, with MFC_FAST_ARCH as a login-node escape hatch.

Measured (NVHPC 24.5, RTX 6000 cc75, generic simulation, 8 cores):
  clean build: 641s (Release fat 5-arch) -> 170s (3.8x)
  hot-module:  385s (Release fat 5-arch) -> 79s (4.9x)

Verified: builds with no IPO/MFC_DEBUG, runs a 1D case on GPU to exit 0.

Adds fast_build to MFCConfig (auto --fast-build/--no-fast-build, own slug); bumps lock version to 9 for the new config field.
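To illustrate how a boolean config field can yield a paired `--fast-build`/`--no-fast-build` flag and a distinct build slug, here is a minimal sketch. `SketchConfig`, `add_config_flags`, `parse_config`, the `gpu` and `debug` fields, and the slug format are all hypothetical stand-ins; the real `MFCConfig` and its slug scheme are not shown in this PR text and may differ.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MFCConfig; only fast_build comes from the PR."""
    gpu: bool = False
    debug: bool = False
    fast_build: bool = False

    def slug(self) -> str:
        # Assumed slug scheme: join the enabled field names, so each distinct
        # configuration maps to its own build directory name.
        enabled = [f.name.replace("_", "-") for f in fields(self) if getattr(self, f.name)]
        return "-".join(enabled) or "default"


def add_config_flags(parser: argparse.ArgumentParser) -> None:
    # One --name / --no-name pair per boolean field (Python 3.9+).
    for f in fields(SketchConfig):
        parser.add_argument(
            "--" + f.name.replace("_", "-"),
            action=argparse.BooleanOptionalAction,
            default=f.default,
            dest=f.name,
        )


def parse_config(argv: list) -> SketchConfig:
    parser = argparse.ArgumentParser(prog="sketch")
    add_config_flags(parser)
    return SketchConfig(**vars(parser.parse_args(argv)))
```

Under these assumptions, `parse_config(["--fast-build"]).slug()` returns `"fast-build"` while `parse_config([]).slug()` returns `"default"`, which is the property that keeps a Fast tree from clobbering other build trees.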
Documents the --fast-build dev-iteration mode: motivation, usage, the new Fast build type, measured NVHPC results (clean 3.8x, hot-module 4.9x), and the proposed AMD/LLVMFlang path (device-LTO diagnosis, -fopenmp-target-jit + -O1, high -j partitions lever) with steps to validate on an AMD GPU + AMD compiler node. The AMD path is analysis only and unverified (no LLVMFlang on the dev box).
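The quoted speedups follow directly from the measured wall-clock times (641s -> 170s and 385s -> 79s). A small sketch of that arithmetic, where the `speedup` helper and the `timings` dictionary are illustrative names rather than anything in the toolchain:

```python
def speedup(baseline_s: float, fast_s: float) -> float:
    """Ratio of baseline build time to fast-build time."""
    return baseline_s / fast_s


# (Release fat 5-arch seconds, --fast-build seconds), taken from the PR.
timings = {
    "clean build": (641, 170),
    "hot-module rebuild": (385, 79),
}

for name, (release_s, fast_s) in timings.items():
    ratio = speedup(release_s, fast_s)
    print(f"{name}: {release_s}s -> {fast_s}s = {ratio:.1f}x, saves {release_s - fast_s}s")
# clean build: 641s -> 170s = 3.8x, saves 471s
# hot-module rebuild: 385s -> 79s = 4.9x, saves 306s
```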
Claude Code Review
Head SHA: f1ad000
Files changed:
What this adds
A `--fast-build` dev-iteration build mode for fast edit → rebuild → run loops (e.g. GPU print-debugging), where the usual optimization machinery just gets in the way.

It introduces a new CMake build type, `Fast`, that deliberately matches none of the existing conditional flag blocks:

- `Release` → no IPO/LTO, no `-march=native`
- `Debug`/`RelDebug` → no `MFC_DEBUG`, no `-gpu=debug`

…then adds a light `-O1` via `add_compile_options`. Because `MFC_DEBUG` is off, device routines carry no host-only debug aborts, so the binary compiles cleanly without IPO. On NVHPC GPU builds it also autodetects the build node's single compute capability (`nvidia-smi`) and overrides the multi-arch `MFC_CUDA_CC` (escape hatch: `MFC_FAST_ARCH=<cc>`).

`fast_build` is a new `MFCConfig` field, so it auto-generates `--fast-build`/`--no-fast-build` and gets its own build slug (does not clobber Release/Debug trees). The lock-file version is bumped for the new config field (one-time `build/lock.yaml` regen for existing checkouts).

Measured results (NVHPC 24.5, RTX 6000 cc75, generic `simulation`, 8 cores)

| Scenario | Release (fat 5-arch) | `--fast-build` (single-arch) | Speedup |
| --- | --- | --- | --- |
| Clean build | 641s | 170s | 3.8x |
| Hot-module rebuild (`m_riemann_solvers`) | 385s | 79s | 4.9x |

Verified: no IPO (`-Mextract` absent), no `MFC_DEBUG`, single `-gpu=cc75`, `-O1` applied; the resulting binary runs a 1D case on the GPU to exit 0 with finite output. `./mfc.sh precheck` and `format` pass.

AMD / LLVMFlang (documented, not yet implemented)
`docs/documentation/fast_build.md` diagnoses the AMD link-time problem (whole-program device LTO via `-flto-partitions`, which re-runs on every build and takes 20+ min) and proposes the `Fast` path: `-fopenmp-target-jit -O1`, dropping `-flto-partitions`, plus a zero-change "build with high `-j`" lever (partitions = jobs). This is analysis only (LLVMFlang was not available on the dev machine) and includes steps to validate on an AMD GPU + AMD-compiler node.

Not in this PR yet
- `Fast` branch (the AMD JIT path): pending hardware validation
- `--gpu-arch` CLI flag (only `MFC_FAST_ARCH` env exists today)
- `--help`/key-option polish

Notes for reviewers
- `Fast` is additive and gated on its own build type; the autodetect only acts when `MFC_CUDA_CC` is already set (NVHPC) and `--fast-build` is passed.
- `CMAKE_*_FLAGS_FAST` cache vars are placeholders; the real `-O1` is injected via `add_compile_options`, matching how Debug/RelDebug inject their flags.
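The gating described in the first note can be sketched as follows. This is not the PR's implementation: `resolve_cuda_cc`, `parse_compute_caps`, and `query_nvidia_smi` are hypothetical names, the string format of `MFC_CUDA_CC` and `MFC_FAST_ARCH` is assumed, and falling back to the multi-arch value when detection fails or finds more than one capability is an assumption. The `nvidia-smi --query-gpu=compute_cap` query is available on recent drivers; older ones may not support that field.

```python
import os
import subprocess


def parse_compute_caps(smi_output: str) -> list:
    """Turn lines such as '7.5' into unique cc-style strings such as '75'."""
    caps = set()
    for line in smi_output.splitlines():
        line = line.strip()
        if line:
            caps.add(line.replace(".", ""))
    return sorted(caps)


def query_nvidia_smi() -> str:
    """Return raw nvidia-smi output, or '' if the tool is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return ""
    return result.stdout


def resolve_cuda_cc(fast_build, cuda_cc, env=None, query=query_nvidia_smi):
    """Pick the compute capability to build for.

    Only acts when fast_build is on and a multi-arch cuda_cc is already set;
    otherwise the incoming value is returned untouched.
    """
    env = os.environ if env is None else env
    if not fast_build or not cuda_cc:
        return cuda_cc
    override = env.get("MFC_FAST_ARCH")  # login-node escape hatch
    if override:
        return override
    caps = parse_compute_caps(query())
    if len(caps) == 1:
        return caps[0]
    return cuda_cc  # assumed fallback: keep the multi-arch list
```

Passing `env` and `query` as parameters keeps the decision logic testable on machines without a GPU, for example `resolve_cuda_cc(True, "70,80", env={}, query=lambda: "7.5\n")` returns `"75"` (the `"70,80"` list is a made-up placeholder).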