feat(bootstrap): parallelize RESOLVE and PREPARE_SOURCE I/O with background threads#1199
feat(bootstrap): parallelize RESOLVE and PREPARE_SOURCE I/O with background threads#1199dhellmann wants to merge 1 commit into
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughA new background-threaded I/O prefetch pipeline is added to the bootstrapper. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/fromager/commands/bootstrap.py (1)
121-149:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a Sphinx version directive for the new CLI option.
Line 121 introduces a user-facing CLI flag, but the command docstring doesn’t record this with a Sphinx version marker.
Proposed change
def bootstrap( @@ ) -> None: """Compute and build the dependencies of a set of requirements recursively TOPLEVEL is a requirements specification, including a package name and optional version constraints. + .. versionchanged:: NEXT + Added ``--bg-threads`` to control background I/O prefetch worker count. + """As per coding guidelines,
**/*.{rst,py}requires Sphinxversionadded/versionchangeddirectives for user-facing changes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fromager/commands/bootstrap.py` around lines 121 - 149, The new CLI option `--bg-threads` introduced in the bootstrap function's click decorators is missing a Sphinx version directive in the function docstring. Add a `:versionadded::` directive to the docstring of the bootstrap function to document when this new user-facing CLI flag was introduced, following the coding guidelines requirement for marking changes in py and rst files.Source: Coding guidelines
🧹 Nitpick comments (2)
src/fromager/commands/bootstrap.py (1)
531-544: ⚡ Quick winExpose and forward
num_bg_threadsinbootstrap_parallelfor tunable performance.
bootstrap_parallelcurrently forces the default background-thread count when it invokesbootstrap, so callers of the parallel command cannot tune this knob.Proposed change
`@click.option`( + "--bg-threads", + "num_bg_threads", + type=click.IntRange(min=1), + default=max(1, (os.cpu_count() or 2) // 2), + show_default=True, + help="Number of background threads for parallel I/O pre-fetching (min 1).", +) +@click.option( "--max-release-age", @@ def bootstrap_parallel( @@ multiple_versions: bool, max_release_age: int | None, + num_bg_threads: int, toplevel: list[str], ) -> None: @@ ctx.invoke( bootstrap, @@ multiple_versions=multiple_versions, max_release_age=max_release_age, + num_bg_threads=num_bg_threads, toplevel=toplevel, )Also applies to: 562-572
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fromager/commands/bootstrap.py` around lines 531 - 544, The `bootstrap_parallel` function does not expose a `num_bg_threads` parameter in its function signature, preventing callers from tuning the background thread count when it internally invokes `bootstrap`. Add `num_bg_threads` as a parameter to the `bootstrap_parallel` function signature, and then forward this parameter when calling `bootstrap` within the function body to allow callers to control this performance knob.tests/test_bootstrapper_iterative.py (1)
260-386: ⚡ Quick winAdd one
bg_futureexception-path test to cover threaded failure propagation.The updated suite validates resolved/empty futures, but not the case where
item.bg_future.result()re-raises an exception from the background resolver.Proposed change
def _make_resolved_future( result: typing.Any, ) -> concurrent.futures.Future[typing.Any]: @@ future.set_result(result) return future +def _make_failed_future( + err: Exception, +) -> concurrent.futures.Future[typing.Any]: + """Return an already-failed Future carrying *err*.""" + future: concurrent.futures.Future[typing.Any] = concurrent.futures.Future() + future.set_exception(err) + return future + @@ class TestPhaseResolve: + def test_future_exception_propagates(self, tmp_context: WorkContext) -> None: + bt = bootstrapper.Bootstrapper(tmp_context) + item = _make_resolve_item() + item.bg_future = _make_failed_future(RuntimeError("resolver failed")) + + with pytest.raises(RuntimeError, match="resolver failed"): + bt._phase_resolve(item) + def test_single_version(self, tmp_context: WorkContext) -> None:As per coding guidelines,
tests/**asks us to verify intended behavior and flag missing edge cases.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_bootstrapper_iterative.py` around lines 260 - 386, Add a new test method to the TestPhaseResolve class that covers the exception path when item.bg_future.result() raises an exception from the background resolver. The test should create a resolve item with a bg_future that raises an exception when result() is called, invoke _phase_resolve on the bootstrapper, and verify that the exception is properly propagated. This test should validate that exceptions from the background resolver thread are correctly surfaced, covering the missing edge case in exception handling.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 350-405: The background task functions _bg_resolve,
_bg_prepare_source, and _bg_prepare_prebuilt execute outside the main-thread
context and therefore lose per-requirement log attribution. Wrap each of these
functions with req_ctxvar_context() using the appropriate requirement parameter
to preserve requirement-scoped logging context. For _bg_resolve wrap with
req_ctxvar_context(req), for _bg_prepare_source wrap with
req_ctxvar_context(req), and for _bg_prepare_prebuilt wrap with
req_ctxvar_context(req).
- Around line 200-217: The ZipFile object created when opening wheel_filename is
not wrapped in a context manager, which can lead to file descriptor leaks
especially under parallel load. Wrap the zipfile.ZipFile call with a `with`
statement to ensure the file is properly closed and the file descriptor is
released immediately after use, moving the entire extraction logic inside the
context manager block.
---
Outside diff comments:
In `@src/fromager/commands/bootstrap.py`:
- Around line 121-149: The new CLI option `--bg-threads` introduced in the
bootstrap function's click decorators is missing a Sphinx version directive in
the function docstring. Add a `:versionadded::` directive to the docstring of
the bootstrap function to document when this new user-facing CLI flag was
introduced, following the coding guidelines requirement for marking changes in
py and rst files.
---
Nitpick comments:
In `@src/fromager/commands/bootstrap.py`:
- Around line 531-544: The `bootstrap_parallel` function does not expose a
`num_bg_threads` parameter in its function signature, preventing callers from
tuning the background thread count when it internally invokes `bootstrap`. Add
`num_bg_threads` as a parameter to the `bootstrap_parallel` function signature,
and then forward this parameter when calling `bootstrap` within the function
body to allow callers to control this performance knob.
In `@tests/test_bootstrapper_iterative.py`:
- Around line 260-386: Add a new test method to the TestPhaseResolve class that
covers the exception path when item.bg_future.result() raises an exception from
the background resolver. The test should create a resolve item with a bg_future
that raises an exception when result() is called, invoke _phase_resolve on the
bootstrapper, and verify that the exception is properly propagated. This test
should validate that exceptions from the background resolver thread are
correctly surfaced, covering the missing edge case in exception handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e01796a7-cf1c-434b-9f19-3b1fa7dfc08a
📒 Files selected for processing (5)
docs/proposals/background-tasks.mdsrc/fromager/bootstrap_requirement_resolver.pysrc/fromager/bootstrapper.pysrc/fromager/commands/bootstrap.pytests/test_bootstrapper_iterative.py
|
This pull request has merge conflicts that must be resolved before it can be merged. |
ae78528 to
1df982f
Compare
…ground threads Submit version resolution and source download/unpack to a thread pool as items are pushed onto the bootstrap stack, overlapping I/O with the main thread's serial processing. Key design decisions: - Background callables are module-level functions (no self capture) so background threads cannot accidentally access mutable Bootstrapper state (self.why, self._seen_requirements, etc.) - _push_items() helper guarantees every RESOLVE and PREPARE_SOURCE item gets a bg_future, including the initial item in bootstrap() - _drain_background_pool() provides an exclusive-build barrier - BootstrapRequirementResolver cache protected by threading.Lock - --bg-threads CLI option (min 1, default: cpu_count // 2) - finalize() uses cancel_futures=True to avoid blocking on abort paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
1df982f to
188b4a9
Compare
Summary
Add a thread pool for processing some tasks from the bootstrap work item stack before they are popped. Keep the stack processing as it is, allowing the backgrounded phases to get the results from the future object created by the thread pool. Apply this approach to the phases for resolving requirements and preparing source.
Ensure that background callables are module-level functions (no
selfcapture) so threads cannot accidentally access mutableBootstrapperstate (self.why,self._seen_requirements, etc.)Add a
_push_items()helper to guarantee everyRESOLVEandPREPARE_SOURCEitem has abg_futureset, including the initial item inbootstrap()Set the minimum size of the thread pool to 1 to avoid special cases.
Test plan
hatch run test:test tests/test_bootstrapper.py tests/test_bootstrapper_iterative.py)TestPhaseResolvetests updated to use pre-completedFutureobjects instead of patchingresolve_versions_resolver.resolvedirectly (the path background threads use)./e2e/test_bootstrap_parallel_git_url.shpasses end-to-end🤖 Generated with Claude Code