Local Aware IBM by danieljvickers · Pull Request #1378 · MFlowCode/MFC

danieljvickers · 2026-04-23T20:06:32Z

Description

The current code does not scale much past 10k particles because of limitations in the MPIAllReduceSum used in the IB force computation. This PR attempts to alleviate this by limiting the IBs that any given rank is aware of to those of its neighbors. This turns the all-reduce into an MPI neighbor computation, removing the communication bottleneck. Supporting this required a major overhaul of IB ownership between ranks.
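The difference between the two communication patterns can be illustrated with a toy model. The sketch below is not MFC code: the 1D non-periodic rank layout, the `owner` list, and the per-rank `partial` forces are all hypothetical, and it assumes that only ranks adjacent to an IB's owner contribute force to that IB. Under that assumption, each rank gets the same totals for the IBs it tracks as it would from a global sum, without holding every IB.

```python
def neighbors(rank, num_ranks):
    """Ranks adjacent to `rank` in a 1D, non-periodic layout (rank itself included)."""
    return [r for r in (rank - 1, rank, rank + 1) if 0 <= r < num_ranks]


def global_reduce(partial):
    """All-reduce style: every rank ends up with the total force on every IB."""
    num_ibs = len(partial[0])
    total = [sum(p[ib] for p in partial) for ib in range(num_ibs)]
    return [list(total) for _ in partial]


def neighbor_reduce(partial, owner):
    """Neighbor style: the owner sums contributions from its neighborhood,
    and only ranks adjacent to the owner receive the result."""
    num_ranks = len(partial)
    owner_total = {
        ib: sum(partial[r][ib] for r in neighbors(own, num_ranks))
        for ib, own in enumerate(owner)
    }
    return [
        {ib: owner_total[ib]
         for ib, own in enumerate(owner)
         if own in neighbors(rank, num_ranks)}
        for rank in range(num_ranks)
    ]


# Hypothetical setup: 4 ranks, 3 IBs; owner[ib] is the rank that owns each IB.
owner = [0, 1, 3]
# partial[rank][ib] is the force contribution computed on that rank.
# Contributions are nonzero only on ranks adjacent to the owner (toy assumption).
partial = [
    [1.0, 0.5, 0.0],
    [2.0, 1.5, 0.0],
    [0.0, 2.5, 4.0],
    [0.0, 0.0, 3.0],
]

global_view = global_reduce(partial)
local_view = neighbor_reduce(partial, owner)
```

In this toy setup rank 3 tracks a single IB instead of all three, which is the kind of reduction in per-rank state the description is aiming for.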

Type of change

Refactor

Testing

TBD

Checklist

I added or updated tests for new behavior

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

@coderabbitai review — incremental review (new changes only)
@coderabbitai full review — full review from scratch
/review — Qodo review
/improve — Qodo code suggestions
@claude full review — Claude full review (also triggers on PR open/reopen/ready)
Add label claude-full-review — Claude full review via label

github-actions · 2026-04-23T20:10:39Z

Claude Code Review

Head SHA: 9879df3

Files changed: 21
src/simulation/m_ibm.fpp
src/simulation/m_start_up.fpp
src/simulation/m_collisions.fpp
src/simulation/m_global_parameters.fpp
src/common/m_constants.fpp
src/common/m_derived_types.fpp
src/common/m_mpi_common.fpp
src/post_process/m_data_output.fpp
src/simulation/m_ib_patches.fpp
src/simulation/m_time_steppers.fpp

Findings:

Critical: Stack overflow in `s_reduce_ib_patch_array` (`src/simulation/m_start_up.fpp`)

subroutine s_reduce_ib_patch_array()
    type(ib_patch_parameters), dimension(num_ib_patches_max) :: patch_ib_gbl

`patch_ib_gbl` is a local (stack-allocated) variable dimensioned by `num_ib_patches_max`, which was just raised from 50,000 to 2,050,000 in `src/common/m_constants.fpp`. `ib_patch_parameters` contains a `character(LEN=pathlen_max)` field (`pathlen_max=400`) plus ~500 bytes of numeric fields, roughly 900 bytes per element. At 2,050,000 elements this is ~1.8 GB on the call stack, which will segfault on every platform (typical stack limits are 8–64 MB). This variable must be heap-allocated (`allocate`/`deallocate`).
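The size estimate in this finding can be reproduced with a few lines of arithmetic. This is only a sketch: the element counts and `pathlen_max` come from the review text, the 500-byte figure for numeric fields is the review's own approximation, and one byte per character is assumed. Padding and the actual layout of `ib_patch_parameters` are not accounted for.

```python
# Values quoted in the review; NUMERIC_BYTES is the review's rough estimate.
NUM_IB_PATCHES_MAX_OLD = 50_000
NUM_IB_PATCHES_MAX_NEW = 2_050_000
PATHLEN_MAX = 400       # bytes, assuming 1 byte per character
NUMERIC_BYTES = 500     # approximate size of the remaining fields

element_bytes = PATHLEN_MAX + NUMERIC_BYTES

old_array_bytes = NUM_IB_PATCHES_MAX_OLD * element_bytes
new_array_bytes = NUM_IB_PATCHES_MAX_NEW * element_bytes

# A common default stack limit, used here only for comparison.
STACK_LIMIT_BYTES = 8 * 1024 * 1024

print(f"per element : {element_bytes} bytes")
print(f"old array   : {old_array_bytes / 1e6:.0f} MB")
print(f"new array   : {new_array_bytes / 1e9:.3f} GB")
print(f"new / stack : {new_array_bytes / STACK_LIMIT_BYTES:.0f}x an 8 MiB stack")
```

Under these assumptions the array comes to about 1.8 GB, matching the figure in the review.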

High: Debug print left in production code (`src/simulation/m_ibm.fpp`)

print *, proc_rank, " New Owner ", patch_ib(k)%gbl_patch_id  ! TODO :: REMOVE THIS DEBUG PRINT

This fires on every ownership transfer for every locally-tracked IB, generating unbounded output at scale. The `! TODO :: REMOVE THIS DEBUG PRINT` marker confirms it is unintentional.

Medium: Commented-out code in `m_collisions.fpp` leaves `pid2` lookup absent (`src/simulation/m_collisions.fpp`)

! call s_get_neighborhood_idx(pid1, pid1) ! global patch ID -> local index call s_get_neighborhood_idx(pid2, pid2)
if (pid1 <= 0 .or. pid2 <= 0) cycle

The comment text contains `call s_get_neighborhood_idx(pid2, pid2)`, which appears to be a second intended lookup that was accidentally folded into the comment instead of being left as executable code. Neither lookup actually executes: `pid1` and `pid2` are the raw decoded global IDs from `s_decode_patch_periodicity`, not local indices. The guard `if (pid1 <= 0 .or. pid2 <= 0) cycle` would never trigger for valid global IDs (which are ≥ 1), meaning the subsequent `patch_ib(pid1)` and `patch_ib(pid2)` accesses use global IDs as local array indices, which is out-of-bounds when `num_ibs < num_gbl_ibs`. If this is intentional (global IDs equal local indices in the no-MPI / single-rank path), that should be documented; otherwise both lookups need to be uncommented.
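The concern here is the difference between a global patch ID and a rank-local array index. The sketch below shows the pattern the review expects, under stated assumptions: `get_neighborhood_idx` is a hypothetical Python stand-in for `s_get_neighborhood_idx`, assumed to return a 1-based local index or 0 when the rank does not track that IB. The real routine's behavior is not shown in this PR page.

```python
def build_local_index(local_gbl_ids):
    """Map global patch ID -> 1-based local index for the IBs this rank tracks."""
    return {gid: i for i, gid in enumerate(local_gbl_ids, start=1)}


def get_neighborhood_idx(index_map, gbl_id):
    """Hypothetical lookup: local index, or 0 if this rank does not track the IB."""
    return index_map.get(gbl_id, 0)


def local_collision_pairs(pairs, local_gbl_ids):
    """Translate (global ID, global ID) pairs into local index pairs,
    skipping any pair with an IB that is not tracked on this rank."""
    index_map = build_local_index(local_gbl_ids)
    handled = []
    for gid1, gid2 in pairs:
        i1 = get_neighborhood_idx(index_map, gid1)
        i2 = get_neighborhood_idx(index_map, gid2)
        if i1 <= 0 or i2 <= 0:
            continue  # the guard only works after the lookup has run
        handled.append((i1, i2))
    return handled


# This rank tracks three IBs with global IDs 7, 12 and 40 (made-up values).
local_ids = [7, 12, 40]
pairs = [(7, 12), (12, 99), (40, 7)]
result = local_collision_pairs(pairs, local_ids)
```

Without the lookup, global ID 40 would be used directly as an index into a 3-element local array, which is the out-of-bounds access the finding describes.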

Low: `GPU_UPDATE(device=[patch_ib(ib_idx)%moment])` removed without replacement (`src/simulation/m_ibm.fpp`)

-            patch_ib(ib_marker)%moment = moment*patch_ib(ib_marker)%mass/(count*cell_volume)
-            $:GPU_UPDATE(device='[patch_ib(ib_marker)%moment]')
+            patch_ib(ib_idx)%moment = moment*patch_ib(ib_idx)%mass/(count*cell_volume)

The per-field device update was removed. If `patch_ib(i)%moment` is consumed on the GPU before the next full `GPU_UPDATE(device=[patch_ib(1:num_ibs)])` (e.g., in a subsequent call to `s_ibm_correct_state` within the same time step), the device will use a stale value. Verify that a covering full-struct update always precedes any GPU use of `%moment` after `s_compute_moment_of_inertia` is called from `s_propagate_immersed_boundaries`.

danieljvickers · 2026-05-27T13:59:16Z

Addressing AI comments:

All of the critical issues are incorrect for one reason or another. I have gone through and verified that the relevant chunks of code are in place to address the concerns it has.

As for the neighbor boundaries, they are never called in a GPU region, so this comment is moot. I have removed the GPU routine call to help with compilation in response.

The file-per-process F claim is just false, and I tested that the code runs fine without it. But it is true that we do not need other ranks to be doing any of this calculation regardless. I extended the rank 0 check in response.

nreqs is in fact reset between loop iterations.

Snapshot arrays are in fact reset between loop iterations.

It is confused about the need to GPU-allocate arrays for patch_ib. Because the array is used with a declare but never actually allocated on the GPU, we only have the GPU pointer. By the time we perform the first copy to the GPU, which allocates the array, we have already done the reshape. This means that the GPU array is properly sized. This matters because the patch_ib array is large in the new code: allocating it up front could limit problem size, since the GPU would have to hold the global patch_ib array even though it would then immediately be deallocated and reallocated to a smaller array. If the number of IB patches grows to the point that the array costs many GB of memory, this could cause memory limitations. The current implementation should be the correct way to go.

The moving IBM flag should not be recomputed. You want it to be the same on all ranks. There is an argument that it should be moved to the case level for case optimization later, but that is a future PR.

Moving the force arrays is going to be handled in a future PR that already has a GitHub issue, which I created yesterday.

Prohibit for the number of local IBs: resolved.

In summary, two of these I made minor adjustments for. One I reject for reasons that I believe are valid and better for long-term code support. All others are incorrect.

danieljvickers · 2026-05-27T15:33:53Z

General note: The Mac errors are in sections of the code that were not touched by this PR (bubbles), and they are a random dyld linker error. I get the same error on reldebug on a chemistry case. I get the same error locally, but on a different set of tests. This seems like a macOS instability issue, not related to this particular PR. Going to attempt to rerun. If it fails again, but on different tests, then it looks like the macOS test suite checks may be broken.

Edit: Rerunning the job changed which jobs failed. This appears to be a spurious failure specific to macOS. It also appears to be specific to this branch, as other branches are not experiencing it. Will investigate.

Add matching @:DEALLOCATE calls in:
- m_derived_variables
- m_global_parameters
- m_hyperelastic
- m_rhs
- m_riemann_solvers
- m_time_steppers

Remaining files not yet addressed:
- src/common/m_boundary_common.fpp, m_helper.fpp, m_model.fpp, m_mpi_common.fpp, m_variables_conversion.fpp
- src/post_process/m_start_up.fpp
- src/simulation/m_acoustic_src.fpp, m_bubbles_EE.fpp, m_ibm.fpp (pending PR MFlowCode#1378), m_qbmm.fpp

Part of MFlowCode#1459

danieljvickers · 2026-06-01T17:59:48Z

@sbryngelson review?

danieljvickers added 16 commits March 15, 2026 07:51

Remove interior point conservative variable protection for stationary boundaries

4273c84

Merge branch 'master' of github.com:danieljvickers/MFC

4b2f519

Merge branch 'MFlowCode:master' into master

ba0cacb

Merge branch 'MFlowCode:master' into master

e6988ba

Merge branch 'MFlowCode:master' into master

0ce6f11

Merge branch 'MFlowCode:master' into master

16bd456

Merge branch 'MFlowCode:master' into master

f0e117d

Merge branch 'MFlowCode:master' into master

b42028a

Merge branch 'MFlowCode:master' into master

8728091

Initial separation for patch_ibs

f652274

Added IB patch reduction at the start of the simulation so that ranks are only locally aware

3baf894

intermittent commit

2365f2c

we now write the global IB index, not the local one to ib_markers, as intended

71bae6a

Refactored ib reduction to use neighbor bounds

0779098

prototype of send-receive replacing all-to-all

d9ac1c2

Compilation errors resolved

ee0bc0c

danieljvickers and others added 13 commits April 24, 2026 08:31

Resolved out of bounds error

8303cea

added send test algorithm for alternative MPI communication

1c1801c

Fixed early segfault due to uninitialized IB patch array

f014ca9

Debugged rank ownership bug and invalid number of global IBs

d36fe0b

Fixed global patch ID not being present on other ranks

cc76bf3

Merge branch 'master' into local-aware-ibm

e6e0613

Updating restart data

13ad7d0

Merge branch 'local-aware-ibm' of github.com:danieljvickers/MFC into local-aware-ibm

2983eba

add integer declaration

35b7864

Fixed duplicate particle output

0ab63cb

Updated post processing

fce3071

Fixed stalling issues in proc_rank > 2 cases

6e9cd50

Significant MPI debug

92b776d

Addressed AI review comments

18fe6ff

Daniel Vickers and others added 8 commits May 29, 2026 14:28

Resolved memory allocation issues on AMD compilers

0320fb8

Merge branch 'MFlowCode:master' into master

ad02d81

Merge branch 'master' of github.com:danieljvickers/MFC-Dan

5d6eff8

Merging with master

a983de5

Local spell checker passed, but test suite fails

9e7f0dd

Restoring IB neighbor radius type

849a11a

Added a few more variables that were dropped in merge

cf73406

Removed array reallocation

6faf896

danieljvickers force-pushed the local-aware-ibm branch from 9fe7e7b to 6faf896 Compare May 30, 2026 17:01

danieljvickers added 5 commits May 30, 2026 13:01

Merge branch 'master' into local-aware-ibm

b41e831

Set patch_ib back to being statically allocated array, for optimization and cleanliness

cba5e32

Merge branch 'local-aware-ibm' of github.com:danieljvickers/MFC into local-aware-ibm

2ace5f1

Resolved GNU errors. Reminder that the redundant patch decoding is INCORRECT

0f99b69

Merge branch 'master' into local-aware-ibm

2be4b64

sbryngelson added the enhancement New feature or request label May 31, 2026

Daniel Vickers added 2 commits May 31, 2026 17:17

The namelist refactor modified the merge and made the particle beds invalid. Also additional allocation protection

7e4d356

Remove prints

0c56440

sbryngelson force-pushed the local-aware-ibm branch 2 times, most recently from 38b786d to 0c56440 Compare June 1, 2026 07:08

SVS87 mentioned this pull request Jun 1, 2026

fix: add missing @:DEALLOCATE in simulation modules #1476

Open

8 tasks

danieljvickers added 3 commits June 1, 2026 08:47

Merge branch 'master' into local-aware-ibm

36d60fe

Shrinking size of patch_ib array in order to prevent preprocess from blowing up

3e70706

Prevent out-of-bounds access with smaller patch array in pre_process

d38a3db

sbryngelson merged commit 08d12f8 into MFlowCode:master Jun 1, 2026
81 checks passed

danieljvickers mentioned this pull request Jun 1, 2026

IB MPI Refactor #1359

Closed

sbryngelson merged 112 commits into MFlowCode:master from danieljvickers:local-aware-ibm