patch_ib Memory Copy Reduction by danieljvickers · Pull Request #1531 · MFlowCode/MFC

danieljvickers · 2026-06-02T21:55:59Z

Description

The patch_ib array was being updated on the CPU. This had the downstream effect of more host-device memory copies that strictly required. This PR modified the code to use the GPU copy of patch_ib everywhere and only copy out during file IO. This should remove memory copies and enable parallelism that was not present before.

Fixes #1453

Type of change

Refactor

github-actions · 2026-06-02T22:00:54Z

Claude Code Review

Head SHA: 0072f07

Files changed:

4
src/simulation/m_data_output.fpp
src/simulation/m_ib_patches.fpp
src/simulation/m_ibm.fpp
src/simulation/m_time_steppers.fpp

Findings

1. Uninitialized `length` in rectangle bounding-box computation (`m_ib_patches.fpp`)

The diff removes the assignments that initialised length before the corner_distance calculation in the 2D rectangle subroutine:

-        length(1) = patch_ib(patch_id)%length_x
-        length(2) = patch_ib(patch_id)%length_y
-        inverse_rotation(:,:) = patch_ib(patch_id)%rotation_matrix_inverse(:,:)

But the declaration real(wp), dimension(1:2) :: length, center is retained (context line 452) and length is still consumed immediately after (context line 468):

corner_distance = sqrt(dot_product(length, length))/2._wp

length is now uninitialized at that point. corner_distance will be garbage, so get_bounding_indices will produce wrong il/ir/jl/jr bounds. The GPU parallel loop then iterates over those wrong bounds, potentially missing IB cells or walking off the array.

Fix: Replace with corner_distance = 0.5_wp*sqrt(patch_ib(patch_id)%length_x**2 + patch_ib(patch_id)%length_y**2) (consistent with the cuboid fix in the same PR), and drop the now-unused length and inverse_rotation declarations.

2. Uninitialized `length` in cylinder bounding-box computation (`m_ib_patches.fpp`)

Same pattern in the 3D cylinder subroutine. The diff removes:

-        length(1) = patch_ib(patch_id)%length_x
-        length(2) = patch_ib(patch_id)%length_y
-        length(3) = patch_ib(patch_id)%length_z
-        radius = patch_ib(patch_id)%radius

The declaration real(wp), dimension(1:3) :: xyz_local, center, length remains (context line 616), and length is still read (context line 635):

corner_distance = sqrt(patch_ib(patch_id)%radius**2 + maxval(length)**2)

maxval(length) is undefined, so the cylinder's bounding box is wrong. Only radius was migrated to the direct patch_ib access; length was not.

Fix: Replace maxval(length) with max(patch_ib(patch_id)%length_x, patch_ib(patch_id)%length_y, patch_ib(patch_id)%length_z) (or the axis-appropriate subset), and remove the length declaration.

3. Missing `GPU_UPDATE(host=...)` before `s_update_mib` calls (`m_time_steppers.fpp`, `m_ibm.fpp`)

The PR converts two host-side do i = 1, num_ibs loops into GPU_PARALLEL_LOOPs, so patch_ib centroids, velocities, angles, and rotation matrices are now updated exclusively on the device. Both sites then immediately call s_update_mib (which performs MPI communication and broadcasts IB positions) without first pulling the device-modified data back to the host.

m_time_steppers.fpp (context lines 763–765):

$:END_GPU_PARALLEL_LOOP()   ! patch_ib updated on device only
call s_update_mib(num_ibs)  ! reads stale host copy -> broadcasts wrong positions

The old flow was host→GPU_UPDATE(device)→s_update_mib; the new GPU loop writes to device but the host copy is never refreshed.

m_ibm.fpp (rotation-matrix recalculation, diff hunk ~line 871):

+$:GPU_PARALLEL_LOOP(private='[i]')
 do i = 1, num_ibs
     if (patch_ib(i)%moving_ibm /= 0) then
         call s_update_ib_rotation_matrix(i)
     end if
 end do
-$:GPU_UPDATE(device='[patch_ib]')
+$:END_GPU_PARALLEL_LOOP()
 ! recompute the new ib_patch locations and broadcast them.

The old GPU_UPDATE(device=...) pushed host→device; removing it without adding GPU_UPDATE(host=...) means s_update_mib (called just below via the APPLY-IB-PATCHES block) operates on the pre-loop host state.

Fix: Insert $:GPU_UPDATE(host='[patch_ib(1:num_ibs)]') immediately after each END_GPU_PARALLEL_LOOP and before s_update_mib.

codecov · 2026-06-03T00:00:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.63%. Comparing base (2d0d2fd) to head (0fd4443).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1531   +/-   ##
=======================================
  Coverage   60.63%   60.63%           
=======================================
  Files          73       73           
  Lines       20219    20219           
  Branches     2937     2937           
=======================================
  Hits        12259    12259           
  Misses       5972     5972           
  Partials     1988     1988

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…tines

danieljvickers · 2026-06-03T14:21:21Z

There was one $:GPU_UPDATE(host=...) that I simply could not remove unless I wanted to break another more-important optimization. I have a solution for this, but it requires the changes that will be made from the projection optimization first. Choosing to merge this in its current state, as I removed several copies that should improve performance. Will address the final memory copy in the m_ib_patch.fpp refactor coming from issue #1454

Initial pass through of removing copies

f729d69

danieljvickers requested a review from sbryngelson as a code owner June 2, 2026 21:55

Merge branch 'master' into 1453-patch-mem-cpy-reduction

0fd4443

danieljvickers marked this pull request as draft June 2, 2026 22:10

danieljvickers added 2 commits June 3, 2026 09:06

missed removing this directive

a51862d

Removed several memory copies from various IB patch projection subrou…

0072f07

…tines

danieljvickers marked this pull request as ready for review June 3, 2026 14:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

patch_ib Memory Copy Reduction#1531

patch_ib Memory Copy Reduction#1531
danieljvickers wants to merge 4 commits into
MFlowCode:masterfrom
danieljvickers:1453-patch-mem-cpy-reduction

danieljvickers commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 3, 2026

Uh oh!

danieljvickers commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

danieljvickers commented Jun 2, 2026

Description

Type of change

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Findings

1. Uninitialized length in rectangle bounding-box computation (m_ib_patches.fpp)

2. Uninitialized length in cylinder bounding-box computation (m_ib_patches.fpp)

3. Missing GPU_UPDATE(host=...) before s_update_mib calls (m_time_steppers.fpp, m_ibm.fpp)

Uh oh!

codecov Bot commented Jun 3, 2026

Codecov Report

Uh oh!

danieljvickers commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 2, 2026 •

edited

Loading

1. Uninitialized `length` in rectangle bounding-box computation (`m_ib_patches.fpp`)

2. Uninitialized `length` in cylinder bounding-box computation (`m_ib_patches.fpp`)

3. Missing `GPU_UPDATE(host=...)` before `s_update_mib` calls (`m_time_steppers.fpp`, `m_ibm.fpp`)