e2e: wait for multicast-group convergence in doublezero status check #3908

Merged
ben-dz merged 2 commits into main from bdz/doublezero-3907 on Jun 16, 2026

ben-dz (Contributor) commented Jun 16, 2026

Summary of Changes

  • Wrap the post-connect doublezero status assertion in checkMulticastPostConnect in a polling loop so it waits for multicast-group membership to converge instead of taking a single, non-retried snapshot. This fixes a test-side race where the second group (mg02) subscription had not yet propagated into the client daemon's local status view, producing S:mg01 instead of the expected S:mg01,S:mg02 and failing the CLI-table diff.
  • The fix is shared by both the publisher and subscriber paths. The sibling doublezero_multicast_group_list check is intentionally left as a single snapshot because it reads onchain state, which is already consistent.
  • The polling timeout/interval (60s/1s) matches the existing convergence-sensitive checks in the same file (e.g. the multicast route check).
  • On a genuine timeout, the last observed table and structured -(want), +(got) diff (or the last exec error) are still printed, so a real failure is not opaque.
  • This is a pre-existing race that happened to surface on PR #3900 (build(deps): bump openssl from 0.10.73 to 0.10.81); it is fixed here rather than bundled into that PR.
  • Fixes #3907 (e2e: flaky TestE2E_Multicast subscriber status check races multicast-group convergence).

Testing Verification

  • go vet -tags e2e ./e2e passes; the package compiles and is gofmt-clean.
  • The race is timing-dependent and only reproduces under load across repeated full TestE2E_Multicast runs (each spins up multiple cEOS containers), so it cannot be deterministically reproduced in a single run. The change converts the single-snapshot assertion that lost the race into the same Eventually-poll idiom already proven for the route/PIM convergence checks in this file, which read the same asynchronously-populated daemon state.

ben-dz force-pushed the bdz/doublezero-3907 branch from 39d9731 to 6e0cfdc on June 16, 2026 17:10
ben-dz marked this pull request as ready for review on June 16, 2026 17:16
ben-dz requested a review from nikw9944 on June 16, 2026 17:16
ben-dz merged commit 0cf7773 into main on Jun 16, 2026
33 checks passed
ben-dz deleted the bdz/doublezero-3907 branch on June 16, 2026 17:38