Konflux integration by AdamSaleh · Pull Request #145 · rh-gitops-midstream/catalog

AdamSaleh · 2026-05-25T16:46:35Z

Summary

Comprehensive integration test infrastructure for the GitOps operator in Konflux CI. The pipeline provisions an ephemeral HyperShift (EaaS) cluster on every run, installs the operator under test via OLM from the FBC catalog image, executes Ginkgo test suites from a QA fork, and pushes structured results to Quay and a results dashboard repo.

What this branch introduces

Pipeline structure

Ephemeral HyperShift cluster (ARM64, configurable OCP version) provisioned per run via EaaS
Three-layer test image: heavy base (tools + Go) → pre-compiled Ginkgo binaries → scripts/config rebuilt on every push — no need to rebuild the base for script changes
Gate labels on PRs control which expensive scenarios run (rc-sanity-check, rc-operator-check, rc-ui-check, etc.)
All scenarios are optional; pipeline-wrapup always runs regardless of test outcome

Test suites

| Suite | Script | Notes |
| --- | --- | --- |
| Sanity / smoke | `run-sanity-tests.sh` | Fast subset, triggered on every labeled PR |
| Sequential shard 1 | `run-sequential-tests-shard1.sh` | ~23 test files |
| Sequential shard 2 | `run-sequential-tests-shard2.sh` | ~20 test files |
| Parallel | `run-parallel-tests.sh` | Full parallel suite |
| Argo Rollouts | `run-rollouts-tests.sh` | |
| UI e2e | `run-ui-e2e-tests.sh` | Playwright against OCP console GitOps plugin |
| ArgoCD upstream e2e | `run-argocd-e2e-tests.sh` | Standalone ArgoCD, not the operator |
| DAST | RapiDAST/ZAP | Security scan of ArgoCD REST API |

QA fork and downstream branches

Tests run from rh-gitops-release-qa/gitops-operator — a fork carrying downstream-specific patches (relaxed image assertions for registry.redhat.io images, OCP guided tour dismissal in Playwright). One branch per channel:

| Channel | Branch |
| --- | --- |
| latest | `konflux-integration-latest` |
| gitops-1.21 | `konflux-integration-1.21` |
| gitops-1.20 | `konflux-integration-1.20` |
| gitops-1.19 | `konflux-integration-1.19` |

Scenarios

gitops-operator-tests.yaml / gitops-sanity-tests.yaml / gitops-ui-tests.yaml — latest channel
gitops-channel-tests-1-{21,20,19}.yaml — one file per supported channel, each containing sanity, upgrade-sanity, sequential (2 shards), parallel, rollouts, UI e2e, ArgoCD e2e, and DAST scenarios
gitops-argocd-tests.yaml / gitops-dast.yaml — standalone upstream ArgoCD and DAST

Log storage (Quay / ORAS)

Each task uploads logs incrementally as OCI artifacts during the run:

quay.io/devtools_gitops/test_image:<pipelinerun>-<task>-logs

pipeline-wrapup pulls all per-task artifacts, merges with cluster-level logs, and pushes a combined bundle:

quay.io/devtools_gitops/test_image:<pipelinerun>-logs

Artifacts expire after 7 days.

Results dashboard

publish-results.sh + render-results.py write structured JUnit summaries to rh-gitops-midstream/catalog-results. Each run appends a JSONL record under gitops-operator/<version>/ocp-<ver>/results.jsonl and the README table is re-rendered.
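As a rough illustration of the append-only layout described above, the sketch below writes one JSON object per line under `gitops-operator/<version>/ocp-<ver>/results.jsonl`. It is not the code in publish-results.sh or render-results.py; the function name `append_result` and the record fields are hypothetical.

```python
import json
from pathlib import Path


def append_result(root: Path, version: str, ocp: str, record: dict) -> Path:
    """Append one run record to gitops-operator/<version>/ocp-<ocp>/results.jsonl."""
    path = root / "gitops-operator" / version / f"ocp-{ocp}" / "results.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # One JSON object per line; sort_keys keeps the file diff-friendly.
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Because each run only appends a line, earlier records stay untouched and a renderer can rebuild any summary table from the full history.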

Code quality fixes (applied in this branch)

The following issues from a thorough code review were resolved before this PR:

Shell safety: all test runner scripts now use set -exo pipefail; git fetch scoped to the target branch with --depth=1
Shell injection: send-slack-message.py rewritten to use shell=False with explicit argument lists; collect-build-metadata.sh heredoc switched to single-quoted 'EOF' with env var passing
Gate label auth: check-gate-labels.yaml now accepts a github-token param to avoid silent bypass when GitHub API rate-limits unauthenticated shared-IP calls
Process leak: install-operator.sh background SA patch loop now has trap ... EXIT cleanup
Publish retry: publish-results.sh retry loop rewritten to exit 1 if all 3 attempts fail
Docker Hub images: resolve-openshift-version.yaml and extract-image-content-sources.yaml moved from docker.io/python:3-alpine to registry.access.redhat.com/ubi9/python-311
Render safety: render-results.py wraps the delete+render cycle in try/except with git checkout -- . rollback on failure
Typo / missing test: run-sequential-tests-shard1.sh focus list corrected (valiate → validate, missing _test.go suffixes)
Pull secret verification: `| head -3` removed from install-operator.sh so all injected registries are verified, not just the first three
Shallow clone: extract-image-content-sources.yaml uses --depth=1; handles both SHAs and branch names
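The shell-injection item in the list above comes down to passing an explicit argument list instead of a formatted shell string. A minimal sketch of that pattern follows; it is not the actual send-slack-message.py code, and `run_tool` and the sample input are hypothetical.

```python
import subprocess
import sys


def run_tool(args: list[str]) -> str:
    """Run a command from an explicit argument list; no shell parses the input."""
    result = subprocess.run(
        args, shell=False, check=True, capture_output=True, text=True
    )
    return result.stdout


# A value containing shell metacharacters is passed through as literal data.
untrusted = "hello; echo injected"
out = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted])
```

With `shell=False`, the `;` in the untrusted value never reaches a shell, so it cannot start a second command.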

Documentation

See .tekton/integration-tests/README.md for a full description of the pipeline structure, image layers, when to rebuild the base image, all scripts, Quay log storage, and the catalog-results repo.

AdamSaleh · 2026-06-01T07:58:22Z

/retest

AdamSaleh · 2026-06-01T08:12:18Z

/retest

AdamSaleh · 2026-06-01T12:22:44Z

/retest

AdamSaleh · 2026-06-05T13:09:28Z

/retest

AdamSaleh · 2026-06-08T12:06:30Z

/retest

AdamSaleh · 2026-06-08T12:59:26Z

/retest

AdamSaleh · 2026-06-12T08:54:43Z

/retest

AdamSaleh · 2026-06-12T14:01:04Z

/retest

AdamSaleh · 2026-06-14T13:38:33Z

/retest

AdamSaleh · 2026-06-14T14:57:52Z

/retest

AdamSaleh · 2026-06-16T09:07:04Z

/retest

AdamSaleh · 2026-06-17T06:20:25Z

/retest

AdamSaleh · 2026-06-17T19:10:48Z

/retest

AdamSaleh · 2026-06-18T12:22:32Z

/retest

AdamSaleh · 2026-06-19T22:02:26Z

/retest

AdamSaleh · 2026-06-20T08:51:41Z

/retest

AdamSaleh · 2026-06-21T07:54:26Z

/retest

There are currently four test-suites being run:
- gitops-operator's e2e ginkgo test-suite, sharded into 3 scripts
- the rollouts e2e tests
- gitops operator's ui test verifying login (more tests to come)
- the argocd tests in a separate pipeline

There is a simple parametrized pipeline, where you can choose:
- the openshift version
- size of cluster nodes
- the channel to be used in the catalog
- the test-script to run

A second, separate pipeline installs standalone argocd and runs the e2e tests.

All the tests are run from a precompiled docker image; the pipeline checks at the start and builds the images if they were changed. The test and utility scripts always get copied. The logs get uploaded to quay. At the end of the pipeline, it sends a message to the gitops-test-notification channel on slack.

The code is mostly authored by prompting claude and tested against the v1.20 branch of the catalog repo.

Assisted-by: Claude <usersafety@anthropic.com>
Signed-off-by: Adam Saleh <adam@asaleh.net>

ZAP failures land in failedTests as "dast.high/[HIGH] SQL Injection (alertRef=40018)". The old fallback split on `: /` and returned "(alertRef=40018)" — useless. Add a DAST-specific branch that strips the classname prefix and alertRef suffix, leaving "[HIGH] SQL Injection". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
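A minimal sketch of the DAST-specific branch described in this commit message, assuming the `dast.<risk>/<name> (alertRef=N)` shape shown in the example. The function name `dast_display_name` is hypothetical, and the real parser may differ.

```python
import re

# Matches a trailing "(alertRef=...)" together with surrounding whitespace.
_ALERT_REF = re.compile(r"\s*\(alertRef=[^)]*\)\s*$")


def dast_display_name(failed_test: str) -> str:
    """Strip the 'dast.<risk>/' classname prefix and the '(alertRef=N)' suffix."""
    prefix, sep, rest = failed_test.partition("/")
    # Only drop the prefix when it really is a DAST classname.
    name = rest if sep and prefix.startswith("dast.") else failed_test
    return _ALERT_REF.sub("", name)
```

For the example from the commit message, `dast_display_name("dast.high/[HIGH] SQL Injection (alertRef=40018)")` yields `[HIGH] SQL Injection`.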

The sidecar referenced /usr/local/bin/collect-logs-sidecar.sh which was not present in the overlay image, causing an immediate failure. DAST doesn't need live cluster-pod-log snapshots — rapidast writes its own output and collect-results handles the final artifact upload. Also remove the now-unused namespace param. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AdamSaleh · 2026-06-24T07:57:57Z

/retest

ArgoCD's /api/v1/stream/* endpoints are Server-Sent Events that keep connections open indefinitely. ZAP's openapi job times out fetching them, marks the plan as failed, and skips the active scan and report generation entirely.

Fix: download swagger.json from ArgoCD in run-dast, strip /stream/ paths with a one-liner Python filter, write to swagger-filtered.json, and pass it via apiFile instead of apiUrl.

Also fix parse-dast-results.py find_zap_json: RapiDAST writes reports to {results}/{shortName}/DAST-{date}-RapiDAST-{shortName}/zap/zap-report.json (two levels deep), but the old patterns only searched one level deep.

Verified with a full local scan: 103 URLs, active scan completes, all reports extracted, parser finds the JSON at the correct path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
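The swagger filtering step can be sketched as below. This is an expanded, hypothetical version of the one-liner the commit mentions, not the code in the task itself; `drop_stream_paths` and the two sample paths are illustrative.

```python
import json


def drop_stream_paths(spec: dict) -> dict:
    """Return a shallow copy of a Swagger document without '/stream/' paths."""
    filtered = dict(spec)
    filtered["paths"] = {
        path: item
        for path, item in spec.get("paths", {}).items()
        if "/stream/" not in path
    }
    return filtered


# Illustrative input; a real run would load the swagger.json fetched from ArgoCD.
spec = {
    "swagger": "2.0",
    "paths": {"/api/v1/applications": {}, "/api/v1/stream/applications": {}},
}
filtered = drop_stream_paths(spec)
print(json.dumps(sorted(filtered["paths"])))
```

Filtering the document before the scan means the openapi job never requests the long-lived endpoints, so it has nothing to time out on.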

red-hat-konflux · 2026-06-24T09:57:45Z

Caution

There are some errors in your PipelineRun template.

PipelineRun	Error
tasks/test-dast.yaml	`yaml validation error: line 180: could not find expected ':'`

…error Multi-line Python at column 0 inside script: | terminates the YAML literal block scalar, causing "could not find expected ':'". Collapse to a single python3 -c line to stay within the block's indentation boundary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AdamSaleh · 2026-06-27T18:03:40Z

/retest

… surfacing

render-results.py:
- Add testScript as grouping dimension so each test type (sanity, sequential-s1/s2, parallel, rollouts, ui, dast) gets its own leaf README and matrix column
- OCP-level README shows variant × test-type matrix; columns derived from all historical runs for the product+OCP so gaps show as "—"
- Each OCP matrix cell links to its Konflux UI pipelinerun via logUrl
- Version-level README shows per-test-type breakdown with p/f/s counts; each line links directly to its leaf README
- Product-level README cells link to the OCP-level README
- Summary levels (product, version, top) collapse across test types showing worst status per variant

run-ui-e2e-tests.sh:
- Synthesize a minimal JUnit failure when Playwright exits non-zero but writes no usable test output (e.g. global setup crash before any tests)
- Copy JUnit to ${SHARED_DIR}/results/ so wrapup task can find it even if the ORAS artifact pull fails

collect-and-upload-logs.sh:
- Add ${SHARED_DIR}/results as fallback JUnit search path so UI auth failures are parsed and surfaced in the results summary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
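The "synthesize a minimal JUnit failure" idea can be sketched as follows. The real change lives in a shell script, so this Python version is only a stand-in; the function name, the `setup` test case name, and the attribute set are assumptions.

```python
import xml.etree.ElementTree as ET


def synthetic_junit_failure(suite: str, message: str) -> str:
    """Build a one-test JUnit document that records a single failure."""
    testsuites = ET.Element("testsuites")
    testsuite = ET.SubElement(
        testsuites,
        "testsuite",
        name=suite,
        tests="1",
        failures="1",
        errors="0",
        skipped="0",
    )
    # A single placeholder test case carries the failure message.
    testcase = ET.SubElement(testsuite, "testcase", classname=suite, name="setup")
    failure = ET.SubElement(testcase, "failure", message=message)
    failure.text = message
    return ET.tostring(testsuites, encoding="unicode")
```

Writing such a file when the runner crashed before producing output lets a downstream parser report one failed test instead of silently finding nothing.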

With set -euo pipefail, if grep finds no match in a command substitution (OC_TOKEN=$(curl...|grep...)), the pipe failure exits the script before the emptiness check can print a useful error.

- Add || true to all grep token-extraction pipelines so the if [[ -z ... ]] checks actually fire with a clear message
- Add --max-time 30 to all curl calls so a hung OAuth/API endpoint fails in 30s rather than hanging the step indefinitely
- Split base64 -d onto its own line to keep each extraction step independently checkable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When the kubeadmin OAuth token request fails, log the first 20 lines of the response (status line + headers) so the root cause is visible in the pipeline logs — whether it is a 401 wrong credentials, a 302 redirect without the expected token fragment, a connection error, or a timeout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The get-cluster-info step was reading `.data.kubeadmin` from the kube-system/kubeadmin secret, which contains the htpasswd hash, not the plain-text password. This caused OAuth authentication to fail on ephemeral clusters. Now the get-kubeconfig step fetches the admin password from the CTI's `.status.adminPassword.name` secret (matching what the eaas-get-ephemeral-cluster-credentials StepAction does), and get-cluster-info reads it from /credentials/*password — with a fallback to `.data.password` (the correct key) from kube-system. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Ephemeral clusters may not have their OAuth router fully available immediately after provisioning. Add a retry loop (5 attempts, 30s wait) that retries on HTTP 502/503 responses before failing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
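The retry policy described here (5 attempts, 30s wait, retry only on 502/503) can be sketched as below. The pipeline step itself is a shell script; `fetch_with_retry` and its injectable `request` and `sleep` callables are hypothetical and exist only to make the logic easy to exercise.

```python
import time
from typing import Callable

RETRYABLE = {502, 503}


def fetch_with_retry(
    request: Callable[[], int],
    attempts: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call request() until it returns a non-retryable status or attempts run out."""
    status = 0
    for attempt in range(1, attempts + 1):
        status = request()
        if status not in RETRYABLE:
            return status
        if attempt < attempts:
            # Wait between attempts, but not after the final one.
            sleep(wait_seconds)
    raise RuntimeError(f"still getting HTTP {status} after {attempts} attempts")
```

Any status other than 502 or 503 is returned immediately, so a 401 or similar surfaces on the first attempt instead of being retried.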

The OAuth router on EaaS clusters returns 503 when the ingress stack isn't ready, which may persist beyond any reasonable retry window. The OAuth token was only used to call the Kubernetes API to read the ArgoCD admin password. Move that `oc get secret` call into the get-cluster-info step (which already has oc + cert-based kubeconfig) and pass ARGO_PWD through cluster-info.env. The run-dast step now goes directly to the ArgoCD session API with no OAuth dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ZAP's JVM heap is 2048m but total RSS during active scan (native memory, thread stacks, site tree) can exceed 4GB, hitting the namespace LimitRange default and getting SIGKILL (-9). Set explicit 6Gi limit / 4Gi request so the step gets enough memory to complete the active scan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Tekton v1 API uses computeResources for step-level CPU/memory limits; the resources field is not recognized and causes task validation to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ArgoCD v2.14.1 CLI emits JSON log format ({"level":"fatal",...}) but the test assertions check for logrus text format (level=fatal). All three tests fail consistently across runs; the underlying CLI behavior is correct. Skip until upstream test is updated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the duplicated inline check-gate taskSpec in all three pipelines with a shared StepAction (check-gate-labels.yaml). The GATE_LABEL param now accepts a comma-separated list of labels (e.g. "rc-sanity-check,channel-1.19") — ALL listed labels must be present on the PR for the pipeline to proceed. A single label value is still accepted unchanged for backward compatibility. Push events (no PR found) always proceed regardless of labels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
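The gate rule described in this commit can be sketched as a small predicate. The real check is a Tekton StepAction calling the GitHub API; `gate_passes` is a hypothetical name, and treating an empty GATE_LABEL as "no requirement" is an assumption not stated in the source.

```python
from typing import Iterable, Optional


def gate_passes(gate_label: str, pr_labels: Optional[Iterable[str]]) -> bool:
    """Return True when the pipeline should proceed.

    pr_labels is None for push events (no PR found), which always proceed.
    """
    if pr_labels is None:
        return True
    # Split the comma-separated GATE_LABEL; a single label is a one-item list.
    required = {label.strip() for label in gate_label.split(",") if label.strip()}
    # ALL required labels must be present on the PR.
    return required.issubset(set(pr_labels))
```

For example, with `GATE_LABEL="rc-sanity-check,channel-1.19"` a PR labeled only `rc-sanity-check` does not pass, while one carrying both labels does.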

…19, 1.20, 1.21

Creates 15 scenarios per channel (45 total), gated on channel-X.XX label in addition to the existing test-type gate label. Each set covers: sanity, sanity-fips, sanity-upgrade, sanity-upgrade-fips, sequential-s1, sequential-s1-upgrade, sequential-s2, parallel, parallel-fips, parallel-upgrade, rollouts, ui-e2e, argocd-e2e, argocd-e2e-fips, dast.

Channel versions:
- 1.19: OPERATOR_CHANNEL=gitops-1.19, TEST_REPO_BRANCH=v1.19, ArgoCD=v3.1.16, upgrade from gitops-1.18
- 1.20: OPERATOR_CHANNEL=gitops-1.20, TEST_REPO_BRANCH=v1.20, ArgoCD=v3.3.12, upgrade from gitops-1.19
- 1.21: OPERATOR_CHANNEL=gitops-1.21, TEST_REPO_BRANCH=v1.21, ArgoCD=v3.4.4, upgrade from gitops-1.20

UI tests use master branch for all channels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

StepAction results written via $(step.results.X.path) are stored at /tekton/steps/<name>/results/X but are NOT promoted to TaskRun .status.taskResults automatically. The pipeline's when expressions that read $(tasks.check-gate.results.proceed) therefore always saw an empty string, causing all gated tasks to be skipped. Add a propagate-result step to each check-gate taskSpec that copies /tekton/steps/check/results/proceed to $(results.proceed.path) so the result is visible to the pipeline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AdamSaleh · 2026-07-02T06:35:48Z

/retest

AdamSaleh · 2026-07-02T06:57:08Z

/retest

Tekton prefixes internal step names with "step-" when storing StepAction results, so the result written by the "check" step is at /tekton/steps/step-check/results/proceed, not /tekton/steps/check/results/proceed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ctions

Shell script robustness:
- run-e2e-tests.sh: set -exo pipefail; shallow-fetch only target branch
- run-ui-e2e-tests.sh: set -euo pipefail; guard playwright install with || true
- publish-results.sh: track push success, exit 1 if all 3 attempts fail
- wait-for-resources.sh: remove unconditional 30s sleep before CSV poll loop
- print-cluster-login-info.sh: redact kubeadmin password in log output
- run-sanity-tests.sh: use json.dumps() for safe JSON generation
- upgrade-operator.sh: set -euo pipefail; add NAMESPACE default
- run-sequential-tests-shard{1,2}.sh: fix typo/extensions in focus list, remove no-op suite_test.go from shard 2

install-operator.sh hardening:
- trap to always kill background pull-secret loop on exit
- head -1 on DaemonSet grep to avoid multi-line DS_NAME
- remove head -3 from registry verification (check all registries)

Security fixes:
- check-gate-labels.yaml: add github-token param + authenticated curl; fix IFS cleanup after break using while-read idiom
- send-slack-message.py: replace shell=True + f-string with shell=False explicit argument lists throughout

Scenario YAML:
- gitops-channel-tests-{1-19,1-20,1-21}.yaml: point UI e2e TEST_REPO_BRANCH at konflux-integration-* QA fork branches instead of master
- remove TEST_IMAGE_URL from all scenarios (param not declared in pipeline, silently dropped by Tekton)

Python scripts:
- collect-build-metadata.sh: single-quoted EOF heredoc + os.environ to prevent shell injection into Python triple-quoted strings
- render-results.py: wrap clean+render in try/except with git checkout -- . recovery if rendering fails after directories are deleted

External images and stepactions:
- resolve-openshift-version.yaml: replace docker.io/python:3-alpine with registry.access.redhat.com/ubi9/python-311:latest
- extract-image-content-sources.yaml: same image replacement; --depth=1 shallow clone; remove redundant apk add git

ArgoCD e2e:
- Add TODO/FIXME comments for bitnami/git pinning, missing JUnit XML output (needs go-junit-report in image), and v3.x pre-compilation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All gitops-operator e2e test scenarios (sanity, sequential, parallel, rollouts, ui-e2e) now use: TEST_REPO_URL: https://github.com/rh-gitops-release-qa/gitops-operator TEST_REPO_BRANCH: konflux-integration-{1.19,1.20,1.21,latest} These QA fork branches carry downstream-specific fixes (OCP guided tour Playwright fix, argocd-agent image check relaxation) that are needed for tests to pass against the downstream operator. ArgoCD e2e and DAST scenarios are unchanged — they don't pull test code from the gitops-operator repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

With a single 'read' variable, IFS splitting does not distribute across the delimiter — the entire string lands in $required. Convert commas to newlines so each label becomes its own input line. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The konflux-integration-1.19 and konflux-integration-1.20 QA fork branches do not contain test/ui-e2e — the Playwright tests were only added starting from the 1.21 cycle. Running the ui-e2e scenario against those branches will always fail with "directory not found". Remove the scenarios until the tests are backported. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AdamSaleh force-pushed the konflux-integration branch from 2e84399 to acd01fe Compare June 5, 2026 12:56

AdamSaleh added the rc-sanity-check label Jun 8, 2026

AdamSaleh added rc-operator-check rc-argocd-check rc-ui-check labels Jun 14, 2026

AdamSaleh added the build-test-image label Jun 16, 2026

AdamSaleh added the run-dast label Jun 20, 2026

AdamSaleh added 6 commits June 22, 2026 17:35

Added scenarios for release testing

084a223

Removed test-image build from the pipeline

571879d

Moving to upstream repo from my fork

051d964

Updated image

4dad7fc

Small fixes

493afd3

AdamSaleh and others added 2 commits June 23, 2026 16:17

AdamSaleh and others added 10 commits June 29, 2026 13:24

fix(dast): use computeResources instead of resources for Tekton v1 API

b73de17


AdamSaleh added channel-1.19 channel-1.20 channel-1.21 labels Jul 1, 2026

AdamSaleh and others added 2 commits July 1, 2026 23:29

AdamSaleh and others added 6 commits July 2, 2026 09:09

docs: add integration tests README covering pipeline structure and sc…

d195f36

