Skip container startup for empty scenarios#6752
Draft
nccatoni wants to merge 36 commits into
Draft
Conversation
Read agent version from image label in AgentContainer.configure(). When both library and agent versions are known pre-collection, defer container startup to post-collection warmups so containers are never started when no tests are selected for the scenario.
Contributor
|
|
Contributor
|
For the non-deferred path (old images without labels), the watchdog was moved to post_collection_warmups alongside _wait_for_app_readiness. This means the watchdog starts only after collection, by which time the proxy has already written files that end up in the observer's initial snapshot and are never ingested. Restore the original behaviour: insert _start_interfaces_watchdog at position 1 in warmups (before _create_network) for the elif/else paths, matching what the original code did with warmups.insert(1, ...). Also move _log_starting_containers into _defer_container_startup so the "Starting containers..." message is printed when containers actually start rather than during the pre-collection warmup phase.
In the deferred path, _set_agent_component was called from post_collection_warmups, after pytest_collection_modifyitems had already run. That hook builds the Manifest from context.scenario.components, and match_condition returns False for any rule whose component is absent — so all agent-version-gated skip/xfail markers were silently dropped, causing tests that should be skipped to run and fail. Since agent_version is already known from the image label at configure time (that's the condition for taking the deferred path), call _set_agent_component() directly during configure alongside _set_library_component(), and remove the now-redundant call from _defer_container_startup. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
_log_starting_containers was added at position 0, shifting _create_network to position 1. The insert(1, watchdog) now placed the watchdog before network creation. Use insert(2, ...) to restore the correct order: log → network → watchdog → containers
Use GetAssemblyVersion (already used in parametric) in poc/uds Dockerfiles to read the version from Datadog.Trace.dll and write /system-tests-library-version when the install script could not determine the version (i.e. .so install path).
8 tests covering: - defer path: container startup absent from warmups, present in post_collection_warmups in the right order (network→watchdog→containers→readiness) - fallback/legacy paths: watchdog at index 2 (after _create_network) - execute_post_collection_warmups: invokes all callables, calls close_targets() and re-raises on error
…scenario - EndToEndScenario.configure: replace 3-branch if/elif/else with two flat blocks for library_known / agent_known and a single defer-or-watchdog tail. - Container post_start methods stop emitting Library/Agent/Backend/UDS/variant log lines; the scenario warmup is now the sole owner of those logs (no more divergent ordering between label and healthcheck paths). - Track container-startup warmups on the scenario so the defer path can move them to post_collection_warmups by identity instead of rebuilding lambdas.
- DebuggerScenario: pick warmup target list with a ternary instead of branching. - GoProxiesScenario._set_components: drop defensive None guard; agent_version is always set by configure() (label) or post_start() (healthcheck) before this warmup runs. - conftest: use truthy check on session.items.
- Drop duplicate ProxyContainer stub (identical to TestedContainer one). - Yield-with-cleanup fixture pops the test scenario from the global group registry to avoid polluting subsequent tests. - Drop unused config attrs and ad-hoc replay parameter from helpers. - Replace exact-index assertions on warmups[0..3] (which broke when the 'Starting containers' log entry became an anonymous lambda) with ordering invariants via .index(). - Whitelist SLF001/ANN001 for tests/test_the_test/* (warmup tests need to inspect privates and stub internal interfaces); drop the now-unused per-line ANN001 noqa directives in two existing files.
…tool stage - build.sh: drop the multi-path lookup loop. Every install_ddtrace.sh on this branch writes /system-tests-library-version, so reading the canonical path is sufficient. - Pre-build the .NET assembly-version helper image once (system_tests/dotnet-version-tool) and have both poc.Dockerfile and uds.Dockerfile COPY --from=that tag, removing the duplicated build-version-tool stage from each Dockerfile.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
In CI, ~38% of scenario invocations are "empty" — all collected tests are deselected for the given library/weblog combination. Docker infrastructure was still started and torn down for every empty scenario, wasting ~17h of compute per full CI run.
Root cause: containers started in
pytest_sessionstart, before pytest knew whether any tests would run.What this does
Writes library version at build time — each weblog image gets a
/system-tests-library-versionfile and asystem-tests-library-versionDocker label viainstall_ddtrace.sh. Agent version is read from the agent image label.Defers container startup to post-collection — new
post_collection_warmupshook (inpytest_collection_finish). When both versions are known from labels (common case), containers are never created if no tests are selected.Preserves log output —
Agent:,Library:, andWeblog variant:lines are still printed beforetest session startsusing label data.Graceful fallback — older images without the label fall through to the legacy path (containers start in
pytest_sessionstartas before).Impact
Empty scenarios now complete in ~2.5s instead of 20–40s.
Notes
docs/adr/002-skip-empty-scenario-containers.mdfor the full decision record.include_agent=False) fall through to the fallback/legacy path and are unaffected.