Konflux integration#145
Open
AdamSaleh wants to merge 57 commits into
Open
Conversation
Collaborator
Author
|
/retest |
2 similar comments
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
2e84399 to
acd01fe
Compare
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
4 similar comments
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
4 similar comments
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
Collaborator
Author
|
/retest |
1 similar comment
Collaborator
Author
|
/retest |
There are currently four test-suites being run: - gitops-operator's e2e ginkgo test-suite, sharded into 3 scripts - the rollouts e2e tests - gitops operator's ui test verifying login (more tests to come) - the argocd tests in a separate pipeline There is simple parametrized pipeline, where you can choose: - the openshift version - size of cluster nodes - the channel to be used in the catalog - the test-script to run Secont separate pipeline installs standalone argocd and runs the e2e tests All the tests are run from precompiled docker image, the pipeline will check at the start and build them if hte images were changed. The test and utility scripts always get copied. The logs get uploaded to quay. At the end of the pipeline, it will send a message to gitops-test-notification channel on slack The code is mostly authored by prompting claude and tested against the v1.20 branch of the catalog repo. Assisted-by: Claude <usersafety@anthropic.com> Signed-off-by: Adam Saleh <adam@asaleh.net>
ZAP failures land in failedTests as "dast.high/[HIGH] SQL Injection (alertRef=40018)". The old fallback split on `: /` and returned "(alertRef=40018)" — useless. Add a DAST-specific branch that strips the classname prefix and alertRef suffix, leaving "[HIGH] SQL Injection". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The sidecar referenced /usr/local/bin/collect-logs-sidecar.sh which was not present in the overlay image, causing an immediate failure. DAST doesn't need live cluster-pod-log snapshots — rapidast writes its own output and collect-results handles the final artifact upload. Also remove the now-unused namespace param. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator
Author
|
/retest |
ArgoCD's /api/v1/stream/* endpoints are Server-Sent Events that keep
connections open indefinitely. ZAP's openapi job times out fetching
them, marks the plan as failed, and skips the active scan and report
generation entirely.
Fix: download swagger.json from ArgoCD in run-dast, strip /stream/
paths with a one-liner Python filter, write to swagger-filtered.json,
and pass it via apiFile instead of apiUrl.
Also fix parse-dast-results.py find_zap_json: RapidAST writes reports
to {results}/{shortName}/DAST-{date}-RapiDAST-{shortName}/zap/zap-report.json
(two levels deep), but the old patterns only searched one level deep.
Verified with a full local scan: 103 URLs, active scan completes,
all reports extracted, parser finds the JSON at the correct path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
|
Caution There are some errors in your PipelineRun template.
|
…error Multi-line Python at column 0 inside script: | terminates the YAML literal block scalar, causing "could not find expected ':'". Collapse to a single python3 -c line to stay within the block's indentation boundary. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator
Author
|
/retest |
… surfacing
render-results.py:
- Add testScript as grouping dimension so each test type (sanity,
sequential-s1/s2, parallel, rollouts, ui, dast) gets its own leaf
README and matrix column
- OCP-level README shows variant × test-type matrix; columns derived
from all historical runs for the product+OCP so gaps show as "—"
- Each OCP matrix cell links to its Konflux UI pipelinerun via logUrl
- Version-level README shows per-test-type breakdown with p/f/s counts;
each line links directly to its leaf README
- Product-level README cells link to the OCP-level README
- Summary levels (product, version, top) collapse across test types
showing worst status per variant
run-ui-e2e-tests.sh:
- Synthesize a minimal JUnit failure when Playwright exits non-zero but
writes no usable test output (e.g. global setup crash before any tests)
- Copy JUnit to ${SHARED_DIR}/results/ so wrapup task can find it even
if the ORAS artifact pull fails
collect-and-upload-logs.sh:
- Add ${SHARED_DIR}/results as fallback JUnit search path so UI auth
failures are parsed and surfaced in the results summary
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With set -euo pipefail, if grep finds no match in a command substitution (OC_TOKEN=$(curl...|grep...)), the pipe failure exits the script before the emptiness check can print a useful error. - Add || true to all grep token-extraction pipelines so the if [[ -z ... ]] checks actually fire with a clear message - Add --max-time 30 to all curl calls so a hung OAuth/API endpoint fails in 30s rather than hanging the step indefinitely - Split base64 -d onto its own line to keep each extraction step independently checkable Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the kubeadmin OAuth token request fails, log the first 20 lines of the response (status line + headers) so the root cause is visible in the pipeline logs — whether it is a 401 wrong credentials, a 302 redirect without the expected token fragment, a connection error, or a timeout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The get-cluster-info step was reading `.data.kubeadmin` from the kube-system/kubeadmin secret, which contains the htpasswd hash, not the plain-text password. This caused OAuth authentication to fail on ephemeral clusters. Now the get-kubeconfig step fetches the admin password from the CTI's `.status.adminPassword.name` secret (matching what the eaas-get-ephemeral-cluster-credentials StepAction does), and get-cluster-info reads it from /credentials/*password — with a fallback to `.data.password` (the correct key) from kube-system. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ephemeral clusters may not have their OAuth router fully available immediately after provisioning. Add a retry loop (5 attempts, 30s wait) that retries on HTTP 502/503 responses before failing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The OAuth router on EaaS clusters returns 503 when the ingress stack isn't ready, which may persist beyond any reasonable retry window. The OAuth token was only used to call the Kubernetes API to read the ArgoCD admin password. Move that `oc get secret` call into the get-cluster-info step (which already has oc + cert-based kubeconfig) and pass ARGO_PWD through cluster-info.env. The run-dast step now goes directly to the ArgoCD session API with no OAuth dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ZAP's JVM heap is 2048m but total RSS during active scan (native memory, thread stacks, site tree) can exceed 4GB, hitting the namespace LimitRange default and getting SIGKILL (-9). Set explicit 6Gi limit / 4Gi request so the step gets enough memory to complete the active scan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Tekton v1 API uses computeResources for step-level CPU/memory limits; the resources field is not recognized and causes task validation to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ArgoCD v2.14.1 CLI emits JSON log format ({"level":"fatal",...}) but
the test assertions check for logrus text format (level=fatal). All
three tests fail consistently across runs; the underlying CLI behavior
is correct. Skip until upstream test is updated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the duplicated inline check-gate taskSpec in all three pipelines with a shared StepAction (check-gate-labels.yaml). The GATE_LABEL param now accepts a comma-separated list of labels (e.g. "rc-sanity-check,channel-1.19") — ALL listed labels must be present on the PR for the pipeline to proceed. A single label value is still accepted unchanged for backward compatibility. Push events (no PR found) always proceed regardless of labels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…19, 1.20, 1.21 Creates 15 scenarios per channel (45 total), gated on channel-X.XX label in addition to the existing test-type gate label. Each set covers: sanity, sanity-fips, sanity-upgrade, sanity-upgrade-fips, sequential-s1, sequential-s1-upgrade, sequential-s2, parallel, parallel-fips, parallel-upgrade, rollouts, ui-e2e, argocd-e2e, argocd-e2e-fips, dast. Channel versions: - 1.19: OPERATOR_CHANNEL=gitops-1.19, TEST_REPO_BRANCH=v1.19, ArgoCD=v3.1.16, upgrade from gitops-1.18 - 1.20: OPERATOR_CHANNEL=gitops-1.20, TEST_REPO_BRANCH=v1.20, ArgoCD=v3.3.12, upgrade from gitops-1.19 - 1.21: OPERATOR_CHANNEL=gitops-1.21, TEST_REPO_BRANCH=v1.21, ArgoCD=v3.4.4, upgrade from gitops-1.20 UI tests use master branch for all channels. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
StepAction results written via $(step.results.X.path) are stored at /tekton/steps/<name>/results/X but are NOT promoted to TaskRun .status.taskResults automatically. The pipeline's when expressions that read $(tasks.check-gate.results.proceed) therefore always saw an empty string, causing all gated tasks to be skipped. Add a propagate-result step to each check-gate taskSpec that copies /tekton/steps/check/results/proceed to $(results.proceed.path) so the result is visible to the pipeline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator
Author
|
/retest |
1 similar comment
Collaborator
Author
|
/retest |
Tekton prefixes internal step names with "step-" when storing StepAction results, so the result written by the "check" step is at /tekton/steps/step-check/results/proceed, not /tekton/steps/check/results/proceed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ctions
Shell script robustness:
- run-e2e-tests.sh: set -exo pipefail; shallow-fetch only target branch
- run-ui-e2e-tests.sh: set -euo pipefail; guard playwright install with || true
- publish-results.sh: track push success, exit 1 if all 3 attempts fail
- wait-for-resources.sh: remove unconditional 30s sleep before CSV poll loop
- print-cluster-login-info.sh: redact kubeadmin password in log output
- run-sanity-tests.sh: use json.dumps() for safe JSON generation
- upgrade-operator.sh: set -euo pipefail; add NAMESPACE default
- run-sequential-tests-shard{1,2}.sh: fix typo/extensions in focus list,
remove no-op suite_test.go from shard 2
install-operator.sh hardening:
- trap to always kill background pull-secret loop on exit
- head -1 on DaemonSet grep to avoid multi-line DS_NAME
- remove head -3 from registry verification (check all registries)
Security fixes:
- check-gate-labels.yaml: add github-token param + authenticated curl;
fix IFS cleanup after break using while-read idiom
- send-slack-message.py: replace shell=True + f-string with shell=False
explicit argument lists throughout
Scenario YAML:
- gitops-channel-tests-{1-19,1-20,1-21}.yaml: point UI e2e TEST_REPO_BRANCH
at konflux-integration-* QA fork branches instead of master
- remove TEST_IMAGE_URL from all scenarios (param not declared in pipeline,
silently dropped by Tekton)
Python scripts:
- collect-build-metadata.sh: single-quoted EOF heredoc + os.environ to
prevent shell injection into Python triple-quoted strings
- render-results.py: wrap clean+render in try/except with git checkout -- .
recovery if rendering fails after directories are deleted
External images and stepactions:
- resolve-openshift-version.yaml: replace docker.io/python:3-alpine with
registry.access.redhat.com/ubi9/python-311:latest
- extract-image-content-sources.yaml: same image replacement; --depth=1
shallow clone; remove redundant apk add git
ArgoCD e2e:
- Add TODO/FIXME comments for bitnami/git pinning, missing JUnit XML
output (needs go-junit-report in image), and v3.x pre-compilation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All gitops-operator e2e test scenarios (sanity, sequential, parallel, rollouts, ui-e2e) now use: TEST_REPO_URL: https://github.com/rh-gitops-release-qa/gitops-operator TEST_REPO_BRANCH: konflux-integration-{1.19,1.20,1.21,latest} These QA fork branches carry downstream-specific fixes (OCP guided tour Playwright fix, argocd-agent image check relaxation) that are needed for tests to pass against the downstream operator. ArgoCD e2e and DAST scenarios are unchanged — they don't pull test code from the gitops-operator repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With a single 'read' variable, IFS splitting does not distribute across the delimiter — the entire string lands in $required. Convert commas to newlines so each label becomes its own input line. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The konflux-integration-1.19 and konflux-integration-1.20 QA fork branches do not contain test/ui-e2e — the Playwright tests were only added starting from the 1.21 cycle. Running the ui-e2e scenario against those branches will always fail with "directory not found". Remove the scenarios until the tests are backported. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Comprehensive integration test infrastructure for the GitOps operator in Konflux CI. The pipeline provisions an ephemeral HyperShift (EaaS) cluster on every run, installs the operator under test via OLM from the FBC catalog image, executes Ginkgo test suites from a QA fork, and pushes structured results to Quay and a results dashboard repo.
What this branch introduces
Pipeline structure
rc-sanity-check,rc-operator-check,rc-ui-check, etc.)pipeline-wrapupalways runs regardless of test outcomeTest suites
run-sanity-tests.shrun-sequential-tests-shard1.shrun-sequential-tests-shard2.shrun-parallel-tests.shrun-rollouts-tests.shrun-ui-e2e-tests.shrun-argocd-e2e-tests.shQA fork and downstream branches
Tests run from rh-gitops-release-qa/gitops-operator — a fork carrying downstream-specific patches (relaxed image assertions for
registry.redhat.ioimages, OCP guided tour dismissal in Playwright). One branch per channel:konflux-integration-latestkonflux-integration-1.21konflux-integration-1.20konflux-integration-1.19Scenarios
gitops-operator-tests.yaml/gitops-sanity-tests.yaml/gitops-ui-tests.yaml— latest channelgitops-channel-tests-1-{21,20,19}.yaml— one file per supported channel, each containing sanity, upgrade-sanity, sequential (2 shards), parallel, rollouts, UI e2e, ArgoCD e2e, and DAST scenariosgitops-argocd-tests.yaml/gitops-dast.yaml— standalone upstream ArgoCD and DASTLog storage (Quay / ORAS)
Each task uploads logs incrementally as OCI artifacts during the run:
pipeline-wrapuppulls all per-task artifacts, merges with cluster-level logs, and pushes a combined bundle:Artifacts expire after 7 days.
Results dashboard
publish-results.sh+render-results.pywrite structured JUnit summaries to rh-gitops-midstream/catalog-results. Each run appends a JSONL record undergitops-operator/<version>/ocp-<ver>/results.jsonland the README table is re-rendered.Code quality fixes (applied in this branch)
The following issues from a thorough code review were resolved before this PR:
set -exo pipefail;git fetchscoped to the target branch with--depth=1send-slack-message.pyrewritten to useshell=Falsewith explicit argument lists;collect-build-metadata.shheredoc switched to single-quoted'EOF'with env var passingcheck-gate-labels.yamlnow accepts agithub-tokenparam to avoid silent bypass when GitHub API rate-limits unauthenticated shared-IP callsinstall-operator.shbackground SA patch loop now hastrap ... EXITcleanuppublish-results.shretry loop rewritten to exit 1 if all 3 attempts failresolve-openshift-version.yamlandextract-image-content-sources.yamlmoved fromdocker.io/python:3-alpinetoregistry.access.redhat.com/ubi9/python-311render-results.pywraps the delete+render cycle intry/exceptwithgit checkout -- .rollback on failurerun-sequential-tests-shard1.shfocus list corrected (valiate→validate, missing_test.gosuffixes)install-operator.sh| head -3removed so all injected registries are verified, not just the first threeextract-image-content-sources.yamluses--depth=1; handles both SHAs and branch namesDocumentation
See
.tekton/integration-tests/README.mdfor a full description of the pipeline structure, image layers, when to rebuild the base image, all scripts, Quay log storage, and the catalog-results repo.