Fix --debug/--reldebug GPU builds #1527
Merged
…t of device code) s_compute_fast_magnetosonic_speed is a GPU routine (acc routine seq) but its MFC_DEBUG diagnostic block called host s_mpi_abort, which is illegal in device code without IPO inlining. Debug/reldebug builds disable IPO, so --gpu acc/mp debug builds failed with NVFORTRAN-S-1061. Guard the abort with #ifndef MFC_GPU, keeping the device-legal diagnostic print. CPU and Release builds are unchanged.
Pull request overview
Fixes GPU Debug/RelDebug compilation failures by preventing a host-only MPI abort routine (s_mpi_abort) from being referenced inside a GPU device routine (s_compute_fast_magnetosonic_speed) when IPO/inlining is disabled.
Changes:
- Guarded the s_mpi_abort call in the MFC_DEBUG diagnostic path so it is not compiled for MFC_GPU builds.
- Added explanatory comments documenting why the guard is required for device compilation.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##           master    #1527   +/-   ##
=======================================
  Coverage   60.64%   60.64%
=======================================
  Files          73       73
  Lines       20213    20213
  Branches     2936     2936
=======================================
  Hits        12259    12259
  Misses       5966     5966
  Partials     1988     1988
Problem
--debug and --reldebug GPU builds (--gpu acc / --gpu mp) fail to compile with NVFORTRAN-S-1061.
Root cause
s_compute_fast_magnetosonic_speed is a GPU device routine ($:GPU_ROUTINE(... parallelism='[seq]')), but its #ifdef MFC_DEBUG diagnostic block calls the host routine s_mpi_abort. That call is only legal when IPO/-Minline inlines it away, and --debug/--reldebug disable IPO. With IPO off, the device routine is left calling a host MPI routine, which is illegal in device code.
This is why the failure is debug-only: in Release the entire #ifdef MFC_DEBUG block is compiled out.
Fix
Guard the host s_mpi_abort call with #ifndef MFC_GPU, keeping the device-legal diagnostic print.
A full-tree scan confirms this is the only device-region s_mpi_abort site for NVHPC/Cray (the 6 NaN-check aborts in m_mpi_common.fpp are inside #if defined(__INTEL_COMPILER) and so are not compiled for these backends). CPU and Release builds are byte-for-byte unchanged: off-GPU the guarded call is identical, and in Release the block is compiled out.
Verification
Built on NVHPC 24.5 (Quadro RTX 6000, MFC_CUDA_CC=75):
- simulation --gpu acc --debug (previously failed with NVFORTRAN-S-1061)
- simulation --gpu acc --reldebug (previously failed with NVFORTRAN-S-1061)
- ./mfc.sh precheck
- ./mfc.sh format
Verified on NVHPC only (the GPU compiler available on the test node); the fix is gated on MFC_GPU and is therefore compiler-agnostic (Cray/AMD untested here).
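The guard described under Fix can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual diff: s_check_speed and s_host_abort are hypothetical stand-ins for s_compute_fast_magnetosonic_speed and s_mpi_abort, the negative-discriminant condition is an assumed example of a debug check, and a plain OpenACC routine directive is used in place of the $:GPU_ROUTINE macro.

```fortran
! Sketch only: s_check_speed and s_host_abort are hypothetical stand-ins for
! s_compute_fast_magnetosonic_speed and s_mpi_abort. The real signatures and
! the real diagnostic condition are assumptions, not taken from the diff.
! Needs preprocessing enabled (for example a .F90 file built with -DMFC_DEBUG,
! plus -DMFC_GPU to mimic a GPU build).
module m_guard_sketch
    implicit none
contains

    ! Host-only routine: must not be referenced from device code.
    subroutine s_host_abort(msg)
        character(len=*), intent(in) :: msg
        print *, trim(msg)
        error stop 1
    end subroutine s_host_abort

    ! Device-callable sequential routine.
    subroutine s_check_speed(disc, c_fast)
        !$acc routine seq
        real(kind(0d0)), intent(in) :: disc
        real(kind(0d0)), intent(out) :: c_fast

#ifdef MFC_DEBUG
        if (disc < 0d0) then
            ! The diagnostic print stays: it is legal inside device code.
            print *, 'negative discriminant: ', disc
#ifndef MFC_GPU
            ! Host builds only: GPU builds never compile this call.
            call s_host_abort('Exiting.')
#endif
        end if
#endif
        c_fast = sqrt(max(disc, 0d0))
    end subroutine s_check_speed

end module m_guard_sketch

program guard_demo
    use m_guard_sketch
    implicit none
    real(kind(0d0)) :: c

    call s_check_speed(4d0, c)
    print *, c
end program guard_demo
```

With neither macro defined (the Release-like case) the whole diagnostic block disappears; with only MFC_DEBUG defined the host abort is kept; with both defined the print remains but the abort call is never seen by the device compiler, which is the behavior the PR relies on.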