Skip to content

patch_ib Memory Copy Reduction#1531

Open
danieljvickers wants to merge 4 commits into
MFlowCode:masterfrom
danieljvickers:1453-patch-mem-cpy-reduction
Open

patch_ib Memory Copy Reduction#1531
danieljvickers wants to merge 4 commits into
MFlowCode:masterfrom
danieljvickers:1453-patch-mem-cpy-reduction

Conversation

@danieljvickers
Copy link
Copy Markdown
Member

Description

The patch_ib array was being updated on the CPU. This had the downstream effect of more host-device memory copies that strictly required. This PR modified the code to use the GPU copy of patch_ib everywhere and only copy out during file IO. This should remove memory copies and enable parallelism that was not present before.

Fixes #1453

Type of change

  • Refactor

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Claude Code Review

Head SHA: 0072f07

Files changed:

  • 4
  • src/simulation/m_data_output.fpp
  • src/simulation/m_ib_patches.fpp
  • src/simulation/m_ibm.fpp
  • src/simulation/m_time_steppers.fpp

Findings

1. Uninitialized length in rectangle bounding-box computation (m_ib_patches.fpp)

The diff removes the assignments that initialised length before the corner_distance calculation in the 2D rectangle subroutine:

-        length(1) = patch_ib(patch_id)%length_x
-        length(2) = patch_ib(patch_id)%length_y
-        inverse_rotation(:,:) = patch_ib(patch_id)%rotation_matrix_inverse(:,:)

But the declaration real(wp), dimension(1:2) :: length, center is retained (context line 452) and length is still consumed immediately after (context line 468):

corner_distance = sqrt(dot_product(length, length))/2._wp

length is now uninitialized at that point. corner_distance will be garbage, so get_bounding_indices will produce wrong il/ir/jl/jr bounds. The GPU parallel loop then iterates over those wrong bounds, potentially missing IB cells or walking off the array.

Fix: Replace with corner_distance = 0.5_wp*sqrt(patch_ib(patch_id)%length_x**2 + patch_ib(patch_id)%length_y**2) (consistent with the cuboid fix in the same PR), and drop the now-unused length and inverse_rotation declarations.


2. Uninitialized length in cylinder bounding-box computation (m_ib_patches.fpp)

Same pattern in the 3D cylinder subroutine. The diff removes:

-        length(1) = patch_ib(patch_id)%length_x
-        length(2) = patch_ib(patch_id)%length_y
-        length(3) = patch_ib(patch_id)%length_z
-        radius = patch_ib(patch_id)%radius

The declaration real(wp), dimension(1:3) :: xyz_local, center, length remains (context line 616), and length is still read (context line 635):

corner_distance = sqrt(patch_ib(patch_id)%radius**2 + maxval(length)**2)

maxval(length) is undefined, so the cylinder's bounding box is wrong. Only radius was migrated to the direct patch_ib access; length was not.

Fix: Replace maxval(length) with max(patch_ib(patch_id)%length_x, patch_ib(patch_id)%length_y, patch_ib(patch_id)%length_z) (or the axis-appropriate subset), and remove the length declaration.


3. Missing GPU_UPDATE(host=...) before s_update_mib calls (m_time_steppers.fpp, m_ibm.fpp)

The PR converts two host-side do i = 1, num_ibs loops into GPU_PARALLEL_LOOPs, so patch_ib centroids, velocities, angles, and rotation matrices are now updated exclusively on the device. Both sites then immediately call s_update_mib (which performs MPI communication and broadcasts IB positions) without first pulling the device-modified data back to the host.

m_time_steppers.fpp (context lines 763–765):

$:END_GPU_PARALLEL_LOOP()   ! patch_ib updated on device only
call s_update_mib(num_ibs)  ! reads stale host copy -> broadcasts wrong positions

The old flow was host→GPU_UPDATE(device)→s_update_mib; the new GPU loop writes to device but the host copy is never refreshed.

m_ibm.fpp (rotation-matrix recalculation, diff hunk ~line 871):

+$:GPU_PARALLEL_LOOP(private='[i]')
 do i = 1, num_ibs
     if (patch_ib(i)%moving_ibm /= 0) then
         call s_update_ib_rotation_matrix(i)
     end if
 end do
-$:GPU_UPDATE(device='[patch_ib]')
+$:END_GPU_PARALLEL_LOOP()
 ! recompute the new ib_patch locations and broadcast them.

The old GPU_UPDATE(device=...) pushed host→device; removing it without adding GPU_UPDATE(host=...) means s_update_mib (called just below via the APPLY-IB-PATCHES block) operates on the pre-loop host state.

Fix: Insert $:GPU_UPDATE(host='[patch_ib(1:num_ibs)]') immediately after each END_GPU_PARALLEL_LOOP and before s_update_mib.

@danieljvickers danieljvickers marked this pull request as draft June 2, 2026 22:10
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.63%. Comparing base (2d0d2fd) to head (0fd4443).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1531   +/-   ##
=======================================
  Coverage   60.63%   60.63%           
=======================================
  Files          73       73           
  Lines       20219    20219           
  Branches     2937     2937           
=======================================
  Hits        12259    12259           
  Misses       5972     5972           
  Partials     1988     1988           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@danieljvickers danieljvickers marked this pull request as ready for review June 3, 2026 14:18
@danieljvickers
Copy link
Copy Markdown
Member Author

There was one $:GPU_UPDATE(host=...) that I simply could not remove unless I wanted to break another more-important optimization. I have a solution for this, but it requires the changes that will be made from the projection optimization first. Choosing to merge this in its current state, as I removed several copies that should improve performance. Will address the final memory copy in the m_ib_patch.fpp refactor coming from issue #1454

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Patch Mem Cpy Reduction

1 participant