Use std::execution::par in dbEdgeProcessor and dbPolygonTools sorts #2364
Open
nikosavola wants to merge 2 commits into
This was referenced Jun 3, 2026
…ecution header guards
Closes #1119
This PR introduces C++17 parallel sorting (`std::execution::par`) to the core database sorting operations within `dbEdgeProcessor.cc` and `dbPolygonTools.cc`.
Flat operations (boolean and sizing) on extremely dense shape regions require heavy sorting of large edge arrays (e.g., sorting by `ymin` before scanline sweeps). Moving these from sequential `std::sort` to C++17 parallel sorting (backed by Intel TBB on GCC/Linux) should yield some performance boost for flat layout manipulation.
Benchmark Results
This PR introduces parallel sorting using C++17 execution policies in `dbEdgeProcessor.cc` and `dbPolygonTools.cc`.
To verify the performance improvements, I created a test script that generates a large grid of $1000 \times 1000$ intersecting polygons and measures the execution time of boolean operations (`AND`), which rely heavily on edge processing and polygon tools.
Benchmark Setup
`taskset -c <cores> env LD_LIBRARY_PATH=./bin-release ./bin-release/klayout -b -r test_bool.rb`
`hyperfine` (min 3 runs per thread count)
Note: `taskset` was used to precisely limit the CPU cores available to TBB.
Results
The table below shows the execution times across different thread counts:
While the speedup scales incrementally with the thread count, the gains are modest (~7% speedup from 1 to 8 threads). The parallel sorting via TBB accelerates the sorting step itself, but other sequential parts of the boolean operations still dominate the overall execution time.
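As a rough consistency check on that figure, Amdahl's law gives an estimate of how much of the runtime the parallel sort accounts for. This is a back-of-the-envelope sketch: it assumes "~7% speedup" means $T_1 / T_8 \approx 1.07$ and that the sort itself scales ideally across threads, neither of which was measured here. With $p$ the parallelizable fraction of the single-threaded runtime and $n$ the thread count:

$$
S(n) = \frac{1}{(1 - p) + p/n}, \qquad
S(8) \approx 1.07
\;\Rightarrow\; 1 - \tfrac{7}{8}\,p \approx \frac{1}{1.07} \approx 0.935
\;\Rightarrow\; p \approx 0.075
$$

Under these assumptions only about 7.5% of the single-threaded runtime is parallelized, so the overall speedup is bounded by $1/(1-p) \approx 1.08$ no matter how many threads are used.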
Reproduction
The results were gathered using the following Python script, which handles the `hyperfine` execution and plots the results.
Benchmark Script (`benchmark.py`)
Test Case (`test_bool.rb`)
You can run it with `uv run benchmark.py` to install dependencies and execute the benchmark in a folder with `./bin-release` containing a KLayout build.
Compilation Notes
Because of `<execution>`, the compilation needs to specify at least `-std=c++17`.
`LIBS += -ltbb` is needed to link the Intel Threading Building Blocks library correctly. Not sure what would be a nice general solution for allowing other libraries to implement the execution policy from the standard library.
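One possible direction for the open question about a general solution is to make the parallel path opt-in and fall back to sequential `std::sort` everywhere else. The sketch below is not the code from this PR. `USE_PARALLEL_SORT`, `HAVE_PARALLEL_SORT`, `Edge`, `ymin` and `sort_edges_by_ymin` are hypothetical names chosen for illustration, and `Edge` is only a stand-in for the real edge type. It relies on `__has_include` and on the standard feature-test macro `__cpp_lib_parallel_algorithm`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opt-in switch (an assumption, not part of the PR): the build
// defines USE_PARALLEL_SORT only where a parallel backend can be linked.
#if defined(USE_PARALLEL_SORT) && defined(__has_include)
#  if __has_include(<execution>)
#    include <execution>
#  endif
#endif

// __cpp_lib_parallel_algorithm is the C++17 feature-test macro for the
// parallel algorithms; standard libraries without them leave it undefined.
#if defined(USE_PARALLEL_SORT) && defined(__cpp_lib_parallel_algorithm)
#  define HAVE_PARALLEL_SORT 1
#else
#  define HAVE_PARALLEL_SORT 0
#endif

// Illustrative stand-in for an edge; not the actual KLayout edge class.
struct Edge {
  std::int64_t x1, y1, x2, y2;
};

// Lower of the two end point y coordinates.
inline std::int64_t ymin(const Edge &e)
{
  return std::min(e.y1, e.y2);
}

// Sorts edges by ascending ymin, in parallel when enabled and available,
// sequentially otherwise.
inline void sort_edges_by_ymin(std::vector<Edge> &edges)
{
  auto by_ymin = [](const Edge &a, const Edge &b) { return ymin(a) < ymin(b); };
#if HAVE_PARALLEL_SORT
  std::sort(std::execution::par, edges.begin(), edges.end(), by_ymin);
#else
  std::sort(edges.begin(), edges.end(), by_ymin);
#endif
  // Debug-only post-condition: both paths must produce the same ordering.
  assert(std::is_sorted(edges.begin(), edges.end(), by_ymin));
}
```

Without `-DUSE_PARALLEL_SORT` this compiles with `-std=c++17` and no extra libraries. With the flag on GCC/Linux, `-ltbb` would still be needed, as noted above. The trade-off is one more build option in exchange for not forcing a TBB dependency on every platform.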