Add --fast-build dev-iteration build mode (prototype) by sbryngelson · Pull Request #1528 · MFlowCode/MFC

sbryngelson · 2026-06-02T01:26:54Z

Draft / prototype. Opening for visibility and discussion. The NVHPC path is implemented and measured; the AMD/LLVMFlang path is documented but not yet implemented or validated.

What this adds

A --fast-build dev-iteration build mode for fast edit → rebuild → run loops (e.g. GPU print-debugging), where the usual optimization machinery just gets in the way.

It introduces a new CMake build type, Fast, that deliberately matches none of the existing conditional flag blocks:

not Release → no IPO/LTO, no -march=native
not Debug/RelDebug → no MFC_DEBUG, no -gpu=debug

…then adds a light -O1 via add_compile_options. Because MFC_DEBUG is off, device routines carry no host-only debug aborts, so the binary compiles cleanly without IPO. On NVHPC GPU builds it also autodetects the build node's single compute capability (nvidia-smi) and overrides the multi-arch MFC_CUDA_CC (escape hatch: MFC_FAST_ARCH=<cc>).

./mfc.sh build -t simulation --gpu acc --fast-build -j 8
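To make the architecture selection concrete, here is a minimal sketch of the precedence described above: an explicit MFC_FAST_ARCH wins, otherwise the node's compute capability is read from nvidia-smi. The function names are hypothetical and this is not the PR's actual code, which may live in CMake rather than Python. Returning None when the node reports more than one distinct capability is also an assumption.

```python
import shutil
import subprocess
from typing import Optional


def parse_compute_caps(smi_output: str) -> list[str]:
    """Turn nvidia-smi lines such as '7.5' into sorted unique tokens such as '75'."""
    caps = set()
    for line in smi_output.splitlines():
        token = line.strip().replace(".", "")
        if token.isdigit():
            caps.add(token)
    return sorted(caps)


def pick_fast_arch(env: dict, smi_output: Optional[str]) -> Optional[str]:
    """Return the single compute capability to build for, or None to keep the default."""
    # The explicit override wins, e.g. on a login node without a GPU.
    override = env.get("MFC_FAST_ARCH", "").strip()
    if override:
        return override
    caps = parse_compute_caps(smi_output or "")
    # Assumption: only narrow the build when exactly one capability is reported.
    return caps[0] if len(caps) == 1 else None


def query_nvidia_smi() -> Optional[str]:
    """Ask nvidia-smi for each GPU's compute capability; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout if result.returncode == 0 else None
```

With an RTX 6000 reporting 7.5, `pick_fast_arch({}, "7.5\n")` yields "75", which would correspond to the single -gpu=cc75 seen in the measurements.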

fast_build is a new MFCConfig field, so it auto-generates --fast-build/--no-fast-build and gets its own build slug (does not clobber Release/Debug trees). The lock-file version is bumped for the new config field (one-time build/lock.yaml regen for existing checkouts).
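As a rough illustration of why a new config field yields both a flag pair and a separate build tree, the sketch below derives --fast-build/--no-fast-build from a boolean dataclass field and hashes all fields into a slug. BuildConfig, make_parser, and the hashing scheme are hypothetical stand-ins; the real MFCConfig and slug logic may differ.

```python
import argparse
import dataclasses
import hashlib


@dataclasses.dataclass(frozen=True)
class BuildConfig:  # hypothetical stand-in for MFCConfig
    gpu: str = "no"
    debug: bool = False
    fast_build: bool = False

    def slug(self) -> str:
        # Hash every field so each configuration maps to its own build directory.
        text = ",".join(
            f"{f.name}={getattr(self, f.name)}" for f in dataclasses.fields(self)
        )
        return hashlib.sha256(text.encode()).hexdigest()[:10]


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="build")
    for field in dataclasses.fields(BuildConfig):
        flag = "--" + field.name.replace("_", "-")
        if isinstance(field.default, bool):
            # BooleanOptionalAction (Python 3.9+) creates --flag and --no-flag.
            parser.add_argument(
                flag, action=argparse.BooleanOptionalAction, default=field.default
            )
        else:
            parser.add_argument(flag, default=field.default)
    return parser


def config_from_args(argv: list[str]) -> BuildConfig:
    return BuildConfig(**vars(make_parser().parse_args(argv)))
```

Because the slug covers fast_build, a Fast build lands in a different directory from the Release or Debug build of the same target, which is the "does not clobber" behavior described above.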

Measured results (NVHPC 24.5, RTX 6000 cc75, generic `simulation`, 8 cores)

| Scenario | Release (fat 5-arch) | `--fast-build` (single-arch) |
| --- | --- | --- |
| Clean full build | 641 s | 170 s (3.8x) |
| Hot-module incremental (`m_riemann_solvers`) | 385 s | 79 s (4.9x) |
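The quoted speedups follow directly from the wall-clock times in the table. A small check, using only the numbers reported above (the helper name is illustrative):

```python
# (Release seconds, --fast-build seconds), taken from the measurements above.
timings = {
    "clean full build": (641, 170),
    "hot-module incremental": (385, 79),
}


def speedup(release_s: float, fast_s: float) -> float:
    """Ratio of Release time to Fast time, rounded to one decimal place."""
    return round(release_s / fast_s, 1)


for scenario, (release_s, fast_s) in timings.items():
    saved = release_s - fast_s
    print(f"{scenario}: {speedup(release_s, fast_s)}x, {saved} s saved per build")
```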

Verified: no IPO (-Mextract absent), no MFC_DEBUG, single -gpu=cc75, -O1 applied; the resulting binary runs a 1D case on the GPU to exit 0 with finite output. ./mfc.sh precheck and format pass.
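A check like the one described could be scripted against a captured compile command. The sketch below is hypothetical: the helper name and the sample command lines are invented for illustration, and the real NVHPC command line will carry more flags.

```python
import shlex


def fast_build_problems(command: str, expected_cc: str) -> list[str]:
    """List the ways a compile command deviates from the expected Fast flags."""
    args = shlex.split(command)
    problems = []
    if any(a.startswith("-Mextract") for a in args):
        problems.append("IPO flag -Mextract present")
    if any(a == "-DMFC_DEBUG" or a.startswith("-DMFC_DEBUG=") for a in args):
        problems.append("MFC_DEBUG defined")
    # Collect every ccNN token from all -gpu=... options.
    gpu_tokens = [
        tok for a in args if a.startswith("-gpu=") for tok in a[len("-gpu="):].split(",")
    ]
    ccs = [tok for tok in gpu_tokens if tok.startswith("cc")]
    if ccs != [f"cc{expected_cc}"]:
        problems.append(f"expected single cc{expected_cc}, found {ccs}")
    if "-O1" not in args:
        problems.append("-O1 missing")
    return problems


# Hypothetical command lines, for illustration only.
good = "nvfortran -O1 -acc -gpu=cc75 -c m_riemann_solvers.f90"
bad = "nvfortran -O3 -Mextract=lib -DMFC_DEBUG -acc -gpu=cc70,cc75 -c m_riemann_solvers.f90"
print(fast_build_problems(good, "75"))
print(fast_build_problems(bad, "75"))
```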

AMD / LLVMFlang (documented, not yet implemented)

docs/documentation/fast_build.md diagnoses the AMD link-time problem (whole-program device LTO via -flto-partitions, which re-runs every build — 20+ min) and proposes the Fast path: -fopenmp-target-jit -O1, dropping -flto-partitions, plus a zero-change "build with high -j" lever (partitions = jobs). This is analysis only — LLVMFlang was not available on the dev machine — and includes steps to validate on an AMD GPU + AMD-compiler node.

Not in this PR yet

LLVMFlang Fast branch (the AMD JIT path) — pending hardware validation
--gpu-arch CLI flag (only MFC_FAST_ARCH env exists today)
Cray-on-AMD check, --help/key-option polish

Notes for reviewers

CPU and existing GPU build types are unaffected: Fast is additive and gated on its own build type; the autodetect only acts when MFC_CUDA_CC is already set (NVHPC) and --fast-build is passed.
The CMAKE_*_FLAGS_FAST cache vars are placeholders; the real -O1 is injected via add_compile_options, matching how Debug/RelDebug inject their flags.

New 'Fast' build type for fast edit-rebuild-run iteration (e.g. GPU print debugging). It matches none of the Release-only (IPO, -march=native) or Debug/RelDebug-only (MFC_DEBUG, -gpu=debug) conditional blocks, so it inherits none of them; adds a light -O1. On NVHPC GPU builds it autodetects the node's single compute capability (nvidia-smi) and overrides the multi-arch MFC_CUDA_CC, with MFC_FAST_ARCH as a login-node escape hatch.

Measured (NVHPC 24.5, RTX 6000 cc75, generic simulation, 8 cores):
clean build 641s (Release fat 5-arch) -> 170s (3.8x)
hot-module 385s (Release fat 5-arch) -> 79s (4.9x)

Verified: builds with no IPO/MFC_DEBUG, runs a 1D case on GPU to exit 0.

Adds fast_build to MFCConfig (auto --fast-build/--no-fast-build, own slug); bumps lock version to 9 for the new config field.

Documents the --fast-build dev-iteration mode: motivation, usage, the new Fast build type, measured NVHPC results (clean 3.8x, hot-module 4.9x), and the proposed AMD/LLVMFlang path (device-LTO diagnosis, -fopenmp-target-jit + -O1, high-j partitions lever) with steps to validate on an AMD GPU + AMD compiler node. The AMD path is analysis only and unverified (no LLVMFlang on the dev box).

github-actions · 2026-06-02T01:33:24Z

Claude Code Review

Head SHA: f1ad000

Files changed: 5
CMakeLists.txt
docs/documentation/fast_build.md
toolchain/mfc/build.py
toolchain/mfc/lock.py
toolchain/mfc/state.py

Findings

Redundant `-O1` for Fortran in `Fast` build type; code comment is incorrect

CMakeLists.txt

The new Fast build type sets -O1 for Fortran through two independent paths:

set(CMAKE_Fortran_FLAGS_FAST "-O1" CACHE STRING "") — CMake's per-build-type flag variable, which does inject the flag into Fortran compilations for CMAKE_BUILD_TYPE=Fast.
add_compile_options($<$<COMPILE_LANGUAGE:Fortran>:-O1>) inside the if (CMAKE_BUILD_TYPE STREQUAL "Fast") block.

Fortran files therefore receive -O1 -O1 while C and C++ files receive -O1 once (only from CMAKE_C_FLAGS_FAST / CMAKE_CXX_FLAGS_FAST). The duplicate flag is harmless in practice (the compiler accepts the repeated flag and the optimization level stays at -O1), but the accompanying code comment is incorrect:

# Fast: light optimization for dev iteration. Like Debug/RelDebug, the real opt
# flag is injected here (the CMAKE_*_FLAGS_FAST cache vars are placeholders).

The CMAKE_*_FLAGS_<BUILDTYPE> cache variables are not placeholders — CMake appends them to the compiler command line for the matching build type, just as it does for CMAKE_Fortran_FLAGS_RELDEBUG "-g" added a few lines above. The add_compile_options block is redundant for Fortran and adds no coverage for C/C++.

Impact: A future maintainer who wants to change the optimization level (e.g., -O0 for the AMD JIT path discussed in fast_build.md) would need to update both locations and might miss one, silently leaving the other in effect.

Fix: Remove the add_compile_options block (the CMAKE_*_FLAGS_FAST cache vars are sufficient for all three languages) and correct the comment to match the RelDebug precedent, which already relies solely on the cache variables.
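The duplication can be modeled in a few lines. This is a simplified model of how CMake composes a compile line (per-build-type flag variable first, then language-filtered compile options), not CMake itself. The dictionary contents mirror the two settings quoted in this finding, and the function name is illustrative.

```python
# Per-build-type cache variables, keyed by (language, build type).
per_config_flags = {
    ("Fortran", "Fast"): ["-O1"],  # CMAKE_Fortran_FLAGS_FAST
    ("C", "Fast"): ["-O1"],        # CMAKE_C_FLAGS_FAST
    ("CXX", "Fast"): ["-O1"],      # CMAKE_CXX_FLAGS_FAST
}

# add_compile_options($<$<COMPILE_LANGUAGE:Fortran>:-O1>) inside the Fast block,
# modeled as (language filter, flag) pairs.
compile_options = {"Fast": [("Fortran", "-O1")]}


def compile_flags(language: str, build_type: str) -> list[str]:
    """Flags a source file of this language would receive under this build type."""
    flags = list(per_config_flags.get((language, build_type), []))
    flags += [
        flag for lang, flag in compile_options.get(build_type, []) if lang == language
    ]
    return flags


for lang in ("Fortran", "C", "CXX"):
    print(lang, compile_flags(lang, "Fast"))
```

In this model Fortran sees -O1 twice while C and C++ see it once, so emptying `compile_options` leaves all three languages at a single -O1, which is the proposed fix.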

sbryngelson added 2 commits June 1, 2026 21:06
