Skip to content

Parallel HDF5: Try fixing strict collective requirements of HDF5 >= 2.0#1862

Open
franzpoeschel wants to merge 30 commits into
openPMD:devfrom
franzpoeschel:fix-parallel-hdf5
Open

Parallel HDF5: Try fixing strict collective requirements of HDF5 >= 2.0#1862
franzpoeschel wants to merge 30 commits into
openPMD:devfrom
franzpoeschel:fix-parallel-hdf5

Conversation

@franzpoeschel

@franzpoeschel franzpoeschel commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

It seems that HDF5 has become quite a bit pickier about metadata definitions in parallel setups with versions 2.0 and 2.1, leading to hangups.
Earlier, it was enough to define them consistently across ranks, now we apparently have to keep the exact same order of operations.

This is bad for the Span API which runs internal flushes for structure setup.

  • No longer flush inside the Span API, instead flush already at resetDataset()
  • Try if we can only enqueue operations and run them later, should be fine for the ordering
  • Ensure that only a select type of operation runs in structural setup flushes
  • Remove flushParticlesPath and flushMeshesPath functions, these unnecessarily leaked attribute flushes into the structure setup
  • Fix tests...
  • Manually go through tests and check if any of them silently swallows some errors, the chance is high
  • Guard setDirty calls
  • There are some commits deactivating a handful of tests, deal with them
  • Decide how to go forward with the changes inside resetDataset(). Best idea: add the new logic to a new API call commitStructuralSetup() or so.

Diff at https://github.com/franzpoeschel/openPMD-api/compare/defaults-upon-close...fix-parallel-hdf5?expand=1

Comment thread src/IO/AbstractIOHandlerImpl.cpp Fixed
while (!(*m_handler).m_work.empty())
{
IOTask &i = (*m_handler).m_work.front();
// verifyFlushType(i.operation, l);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Comment thread test/CoreTest.cpp Fixed
Comment thread test/ParallelIOTest.cpp Fixed
Comment thread test/ParallelIOTest.cpp Fixed
Comment thread test/ParallelIOTest.cpp Fixed
Comment thread test/ParallelIOTest.cpp Fixed
Comment thread test/ParallelIOTest.cpp Fixed
Comment thread test/ParallelIOTest.cpp Fixed
Comment thread test/SerialIOTest.cpp Fixed
Comment thread test/SerialIOTest.cpp Fixed
@franzpoeschel franzpoeschel force-pushed the fix-parallel-hdf5 branch 3 times, most recently from 48b4c2a to 07fb558 Compare March 27, 2026 10:19
Comment thread src/Iteration.cpp Fixed
}
} // namespace

std::future<void> AbstractIOHandlerImpl::flush(FlushLevel l)

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 431 lines.

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Comment thread test/ParallelIOTest.cpp Outdated

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function openPMD_parallel_bug_1655_bp5_writer_hangup is unreachable (
CATCH2_INTERNAL_TEST_24
must be removed at the same time)
Static function openPMD_parallel_bug_1655_bp5_writer_hangup is unreachable (
autoRegistrar25
must be removed at the same time)
Comment thread test/ParallelIOTest.cpp Outdated

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function CATCH2_INTERNAL_TEST_24 is unreachable (
autoRegistrar25
must be removed at the same time)
These were unnecessary, but they snuck WRITE_ATT tasks into the skeleton flush.
not so great, but lets keep that for now
This reverts commit 246609ff5fbe5edb68119b7f3b29a40e7bf23d2d.
This reverts commit 2df195749b53b0a54832585899bd469d35f81d6d.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants