Skip container startup for empty scenarios by nccatoni · Pull Request #6752 · DataDog/system-tests

nccatoni · 2026-04-15T14:08:28Z

Context

In CI, ~38% of scenario invocations are "empty" — all collected tests are deselected for the given library/weblog combination. Docker infrastructure was still started and torn down for every empty scenario, wasting ~17h of compute per full CI run.

Root cause: containers started in pytest_sessionstart, before pytest knew whether any tests would run.

What this does

Writes library version at build time — each weblog image gets a /system-tests-library-version file and a system-tests-library-version Docker label via install_ddtrace.sh. Agent version is read from the agent image label.
Defers container startup to post-collection — new post_collection_warmups hook (in pytest_collection_finish). When both versions are known from labels (common case), containers are never created if no tests are selected.
Preserves log output — Agent:, Library:, and Weblog variant: lines are still printed before test session starts using label data.
Graceful fallback — older images without the label fall through to the legacy path (containers start in pytest_sessionstart as before).
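
The deferral described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `SketchScenario`, `configure`, `on_session_start`, `on_collection_finish`, and `run` are hypothetical names, and only the `warmups` / `post_collection_warmups` attribute names come from the description.

```python
from typing import Callable, List, Optional


class SketchScenario:
    """Hypothetical stand-in for a scenario that owns two warmup lists."""

    def __init__(self) -> None:
        self.warmups: List[Callable[[], None]] = []  # run at session start
        self.post_collection_warmups: List[Callable[[], None]] = []  # run after collection
        self.started: List[str] = []  # records what was actually started

    def _start_containers(self) -> None:
        self.started.append("containers")

    def configure(self, library_version: Optional[str], agent_version: Optional[str]) -> None:
        if library_version and agent_version:
            # Both versions known from image labels: defer startup.
            self.post_collection_warmups.append(self._start_containers)
        else:
            # Legacy path: containers start before collection.
            self.warmups.append(self._start_containers)

    def on_session_start(self) -> None:
        for warmup in self.warmups:
            warmup()

    def on_collection_finish(self, selected_items: list) -> None:
        # Assumed to be driven by pytest_collection_finish.
        if not selected_items:
            return  # empty scenario: nothing is started
        for warmup in self.post_collection_warmups:
            warmup()


def run(library_version, agent_version, selected_items):
    """Drive one simulated session and report what was started."""
    scenario = SketchScenario()
    scenario.configure(library_version, agent_version)
    scenario.on_session_start()
    scenario.on_collection_finish(selected_items)
    return scenario.started
```

With both versions known and no selected tests, `run` returns an empty list; with a missing version it falls back to starting containers at session start even when nothing is selected.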

Impact

Library	Empty runs	Time saved
PHP	~879 / 1561 (56%)	~8.8h
Ruby	~612 / 1346 (45%)	~4.5h
Golang	~205 / 396 (52%)	~1.4h
Python	~141 / 454 (31%)	~1.3h
Node.js	~138 / 330 (42%)	~1.0h
Total	~1995 / 5194 (38%)	~17h

Empty scenarios now complete in ~2.5s instead of 20–40s.

Notes

See docs/adr/002-skip-empty-scenario-containers.md for the full decision record.
Edge cases (replay mode, buddy containers, OTel, include_agent=False) fall through to the fallback/legacy path and are unaffected.

Read agent version from image label in AgentContainer.configure(). When both library and agent versions are known pre-collection, defer container startup to post-collection warmups so containers are never started when no tests are selected for the scenario.
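
As a rough sketch of the label lookup, assuming the version is read from the JSON printed by `docker image inspect` (the PR may well use a Docker client library instead): `read_label` and `can_defer_startup` are hypothetical helpers. The label name `system-tests-library-version` comes from the PR description; the agent label name is not given there, so it is not shown.

```python
import json
from typing import Optional


def read_label(inspect_json: str, label: str) -> Optional[str]:
    """Return a label value from `docker image inspect` JSON output, or None."""
    images = json.loads(inspect_json)
    if not images:
        return None
    # Config.Labels is null when the image carries no labels at all.
    labels = (images[0].get("Config") or {}).get("Labels") or {}
    return labels.get(label)


def can_defer_startup(library_version: Optional[str], agent_version: Optional[str]) -> bool:
    """Defer only when both versions are known before collection."""
    return bool(library_version) and bool(agent_version)


# Sample inspect payloads (illustrative values, not real image data).
labelled = json.dumps([{"Config": {"Labels": {"system-tests-library-version": "1.2.3"}}}])
unlabelled = json.dumps([{"Config": {"Labels": None}}])
```

An image built before this change would behave like `unlabelled`: the lookup yields `None` and the scenario stays on the legacy path.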

github-actions · 2026-04-15T14:09:00Z

CODEOWNERS have been resolved as:

tests/test_the_test/test_collection_warmups.py                          @DataDog/system-tests-core
utils/build/docker/dotnet/version-tool.Dockerfile                       @DataDog/apm-dotnet @DataDog/asm-dotnet @DataDog/system-tests-core
conftest.py                                                             @DataDog/system-tests-core
pyproject.toml                                                          @DataDog/system-tests-core
tests/test_the_test/test_decorators.py                                  @DataDog/system-tests-core
tests/test_the_test/test_docker_scenario.py                             @DataDog/system-tests-core
utils/_context/_scenarios/core.py                                       @DataDog/system-tests-core
utils/_context/_scenarios/debugger.py                                   @DataDog/system-tests-core
utils/_context/_scenarios/endtoend.py                                   @DataDog/system-tests-core
utils/_context/_scenarios/go_proxies.py                                 @DataDog/system-tests-core
utils/_context/containers.py                                            @DataDog/system-tests-core
utils/build/build.sh                                                    @DataDog/system-tests-core
utils/build/docker/cpp_httpd/install_ddtrace.sh                         @DataDog/system-tests-core
utils/build/docker/cpp_kong/install_ddtrace.sh                          @DataDog/system-tests-core
utils/build/docker/cpp_nginx/install_ddtrace.sh                         @DataDog/system-tests-core
utils/build/docker/dotnet/install_ddtrace.sh                            @DataDog/apm-dotnet @DataDog/asm-dotnet @DataDog/system-tests-core
utils/build/docker/dotnet/poc.Dockerfile                                @DataDog/apm-dotnet @DataDog/asm-dotnet @DataDog/system-tests-core
utils/build/docker/dotnet/uds.Dockerfile                                @DataDog/apm-dotnet @DataDog/asm-dotnet @DataDog/system-tests-core
utils/build/docker/golang/install_ddtrace.sh                            @DataDog/dd-trace-go-guild @DataDog/system-tests-core
utils/build/docker/java/install_ddtrace.sh                              @DataDog/apm-java @DataDog/asm-java @DataDog/system-tests-core
utils/build/docker/nodejs/install_ddtrace.sh                            @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/php/common/install_ddtrace.sh                        @DataDog/apm-php @DataDog/system-tests-core
utils/build/docker/python/install_ddtrace.sh                            @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/ruby/install_ddtrace.sh                              @DataDog/ruby-guild @DataDog/asm-ruby @DataDog/system-tests-core

datadog-prod-us1-3 · 2026-04-16T14:43:40Z

Tests

⚠️ Warnings

🚦 194 Pipeline jobs failed

Testing the test | System Tests (java, dev) / End-to-end #2 / akka-http 2

🧪 1 Test failed

tests.appsec.test_blocking_addresses.Test_Blocking_request_body_filenames.test_blocking[akka-http] from system_tests_suite

ValueError: No appsec event validate this condition

self = <tests.appsec.test_blocking_addresses.Test_Blocking_request_body_filenames object at 0x7f35ec42ff50>

    def test_blocking(self):
        """Can block on server.request.body.filenames"""
>       interfaces.library.assert_waf_attack(self.rbf_req, rule="tst-037-014")

tests/appsec/test_blocking_addresses.py:606: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...

Testing the test | System Tests (php, dev) / End-to-end #1 / apache-mod-7.0-zts 1

🧪 1 Test failed

tests.ffe.test_exposures.Test_FFE_Exposure_Events.test_ffe_multiple_remote_config_files[apache-mod-7.0-zts] from system_tests_suite

AssertionError: Timed out waiting for exposure event for flags ['test-flag-1', 'test-flag-2'] and subject 'test-user-multi'
assert False
 +  where False = <bound method ProxyBasedInterfaceValidator.wait_for of AgentInterfaceValidator('agent')>(<function wait_for_exposure_event.<locals>.<lambda> at 0x7fd5f464afc0>, timeout=30)
 +    where <bound method ProxyBasedInterfaceValidator.wait_for of AgentInterfaceValidator('agent')> = AgentInterfaceValidator('agent').wait_for
 +      where AgentInterfaceValidator('agent') = interfaces.agent

self = <tests.ffe.test_exposures.Test_FFE_Exposure_Events object at 0x7fd60a6ba7e0>

    def test_ffe_multiple_remote_config_files(self):
        """Test that FFE correctly handles multiple remote config files with different flags."""
...

Testing the test | System Tests (python, prod) / End-to-end #2 / fastapi 2

🧪 1 Test failed

tests.test_config_consistency.Test_Config_RuntimeMetrics_Enabled.test_main[fastapi] from system_tests_suite

assert (0 > 0 or 0 > 0)
 +  where 0 = len([])
 +  and   0 = len([])

self = <tests.test_config_consistency.Test_Config_RuntimeMetrics_Enabled object at 0x7fe0355e8da0>

    def test_main(self):
        assert self.req.status_code == 200
    
        runtime_metrics_gauges, runtime_metrics_sketches = get_runtime_metrics()
...

View all 194 failed jobs.

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 4e1abe9

For the non-deferred path (old images without labels), the watchdog was moved to post_collection_warmups alongside _wait_for_app_readiness. This means the watchdog starts only after collection, by which time the proxy has already written files that end up in the observer's initial snapshot and are never ingested. Restore the original behaviour: insert _start_interfaces_watchdog at position 1 in warmups (before _create_network) for the elif/else paths, matching what the original code did with warmups.insert(1, ...). Also move _log_starting_containers into _defer_container_startup so the "Starting containers..." message is printed when containers actually start rather than during the pre-collection warmup phase.

In the deferred path, _set_agent_component was called from post_collection_warmups, after pytest_collection_modifyitems had already run. That hook builds the Manifest from context.scenario.components, and match_condition returns False for any rule whose component is absent, so all agent-version-gated skip/xfail markers were silently dropped, causing tests that should be skipped to run and fail. Since agent_version is already known from the image label at configure time (that's the condition for taking the deferred path), call _set_agent_component() directly during configure alongside _set_library_component(), and remove the now-redundant call from _defer_container_startup.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
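
The failure mode described in this commit message can be illustrated with a small sketch. It assumes a simplified rule shape (component name plus a minimum version); `sketch_match_condition` and its signature are hypothetical and do not reproduce the real `match_condition`, and the version numbers are illustrative.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]


def sketch_match_condition(rule_component: str, min_version: Version,
                           components: Dict[str, Version]) -> bool:
    """Hypothetical rule: applies when the component is older than min_version."""
    if rule_component not in components:
        # Unknown component: the rule cannot match, so its marker is dropped.
        return False
    return components[rule_component] < min_version


rule = ("agent", (7, 60, 0))  # illustrative skip rule: agent older than 7.60.0

# Components collected before the agent is registered: the rule is ignored.
before = sketch_match_condition(*rule, components={"library": (2, 0, 0)})

# Agent registered during configure: the rule applies and the test is skipped.
after = sketch_match_condition(*rule, components={"library": (2, 0, 0), "agent": (7, 50, 0)})
```

Here `before` is `False` and `after` is `True`, which is why registering the agent component before collection matters.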

_log_starting_containers was added at position 0, shifting _create_network to position 1. The insert(1, watchdog) now placed the watchdog before network creation. Use insert(2, ...) to restore the correct order: log → network → watchdog → containers
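
The index shift can be reproduced with plain lists. In this sketch the strings stand in for the real warmup callables; the names mirror those in the commit message, but the list contents are an assumption.

```python
# Original warmup order before the log step existed.
warmups = ["create_network", "start_containers"]

# A log step is prepended at position 0, shifting everything else right.
warmups.insert(0, "log_starting_containers")

# insert(1, ...) now lands before the network step.
wrong = list(warmups)
wrong.insert(1, "start_interfaces_watchdog")

# insert(2, ...) restores log → network → watchdog → containers.
right = list(warmups)
right.insert(2, "start_interfaces_watchdog")
```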

Use GetAssemblyVersion (already used in parametric) in poc/uds Dockerfiles to read the version from Datadog.Trace.dll and write /system-tests-library-version when the install script could not determine the version (i.e. .so install path).

8 tests covering:
- defer path: container startup absent from warmups, present in post_collection_warmups in the right order (network → watchdog → containers → readiness)
- fallback/legacy paths: watchdog at index 2 (after _create_network)
- execute_post_collection_warmups: invokes all callables, calls close_targets() and re-raises on error
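
A minimal sketch of the runner behaviour listed in the last bullet, assuming that `close_targets()` is called only when a warmup fails. `WarmupRunnerSketch` is a hypothetical class; only the method names `execute_post_collection_warmups` and `close_targets` come from the commit message.

```python
from typing import Callable, List


class WarmupRunnerSketch:
    """Hypothetical scenario stand-in that runs post-collection warmups."""

    def __init__(self, post_collection_warmups: List[Callable[[], None]]) -> None:
        self.post_collection_warmups = post_collection_warmups
        self.closed = False

    def close_targets(self) -> None:
        self.closed = True

    def execute_post_collection_warmups(self) -> None:
        try:
            for warmup in self.post_collection_warmups:
                warmup()
        except Exception:
            # Clean up whatever was started, then surface the original error.
            self.close_targets()
            raise


calls: List[str] = []
ok = WarmupRunnerSketch([lambda: calls.append("network"), lambda: calls.append("containers")])
ok.execute_post_collection_warmups()


def failing() -> None:
    raise RuntimeError("container failed to start")


broken = WarmupRunnerSketch([failing])
try:
    broken.execute_post_collection_warmups()
    raised = False
except RuntimeError:
    raised = True
```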

…scenario

- EndToEndScenario.configure: replace 3-branch if/elif/else with two flat blocks for library_known / agent_known and a single defer-or-watchdog tail.
- Container post_start methods stop emitting Library/Agent/Backend/UDS/variant log lines; the scenario warmup is now the sole owner of those logs (no more divergent ordering between label and healthcheck paths).
- Track container-startup warmups on the scenario so the defer path can move them to post_collection_warmups by identity instead of rebuilding lambdas.

- DebuggerScenario: pick warmup target list with a ternary instead of branching.
- GoProxiesScenario._set_components: drop defensive None guard; agent_version is always set by configure() (label) or post_start() (healthcheck) before this warmup runs.
- conftest: use truthy check on session.items.

- Drop duplicate ProxyContainer stub (identical to TestedContainer one).
- Yield-with-cleanup fixture pops the test scenario from the global group registry to avoid polluting subsequent tests.
- Drop unused config attrs and ad-hoc replay parameter from helpers.
- Replace exact-index assertions on warmups[0..3] (which broke when the 'Starting containers' log entry became an anonymous lambda) with ordering invariants via .index().
- Whitelist SLF001/ANN001 for tests/test_the_test/* (warmup tests need to inspect privates and stub internal interfaces); drop the now-unused per-line ANN001 noqa directives in two existing files.
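
The switch from exact-index assertions to ordering invariants might look like the sketch below. The step functions and the `is_ordered` helper are hypothetical; the point is that relative order survives an extra anonymous entry at the front of the list, while absolute indices do not.

```python
from typing import Callable, List, Sequence


def create_network() -> None: ...
def start_watchdog() -> None: ...
def start_containers() -> None: ...


def is_ordered(steps: Sequence[Callable], *expected: Callable) -> bool:
    """True when the expected callables appear in `steps` in the given relative order."""
    positions = [steps.index(step) for step in expected]
    return positions == sorted(positions)


# An anonymous log step at the front shifts every absolute index by one.
warmups: List[Callable] = [lambda: None, create_network, start_watchdog, start_containers]
```

A test can then assert `is_ordered(warmups, create_network, start_watchdog, start_containers)` without caring whether the log entry is present.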

…tool stage

- build.sh: drop the multi-path lookup loop. Every install_ddtrace.sh on this branch writes /system-tests-library-version, so reading the canonical path is sufficient.
- Pre-build the .NET assembly-version helper image once (system_tests/dotnet-version-tool) and have both poc.Dockerfile and uds.Dockerfile COPY --from=that tag, removing the duplicated build-version-tool stage from each Dockerfile.

nccatoni added 14 commits April 15, 2026 09:51

write library version to /system-tests-library-version in install scripts

1115d99

add system-tests-library-version label to weblog image after build

a876bfa

read library version from image label in WeblogContainer.configure()

a601baf

move container startup to post-collection, skip if no tests selected

3ddf1ba

fix agent_version access before post_start, move junit props after warmups

01ce802

add timing instrumentation to warmup phases

1aa9d26

fix library version extraction for multi-stage Dockerfiles

04c38dc

remove timing instrumentation

03626df

log library version in pre-collection warmup when known from label

61747c4

restore stdout format: log context info from labels in pre-collection warmup

91f92d2

move weblog system info to test context section

5706a8e

add docs/adr/ with process setup and ADR-002 for post-collection container startup

c9be9f0

fix replay guard on _get_weblog_system_info, remove unused _set_components

cdbf529

nccatoni changed the title ~~Nccatoni/collection rework~~ Skip container startup for empty scenarios (~17h CI compute savings) Apr 15, 2026

nccatoni added 5 commits April 16, 2026 13:59

fix format and tests

5411af3

cleanup

fa1fd53

fix

d994d6c

write setup_properties.json when no tests selected, fix replay on empty scenarios

c91ffc8

skip healthcheck read in post_start when agent_version already known from label

4bfc9e8

nccatoni and others added 4 commits April 16, 2026 18:40

fix: start interfaces watchdog before containers in deferred path

9a3af9b

Merge branch 'main' into nccatoni/collection-rework

74b0775

nccatoni changed the title ~~Skip container startup for empty scenarios (~17h CI compute savings)~~ Skip container startup for empty scenarios Apr 21, 2026

nccatoni added 3 commits May 6, 2026 15:09

Merge branch 'main' into nccatoni/collection-rework

08c4777

nccatoni added 10 commits May 13, 2026 15:02

Merge branch 'main' into nccatoni/collection-rework

8d67a9f

refactor: dedupe Scenario warmup runner into _run_warmups helper

d1671ff

Merge branch 'main' into nccatoni/collection-rework

70338cc

format

456b679

fix: load library version from healthcheck log in replay mode for Lambda

4e1abe9

nccatoni wants to merge 36 commits into main from nccatoni/collection-rework
