Skip to content

Feat: RunOnInfra to use openshift.io/node-selector annotation on namespace#1194

Open
anandrkskd wants to merge 3 commits into
redhat-developer:masterfrom
anandrkskd:infra-annotation-support
Open

Feat: RunOnInfra to use openshift.io/node-selector annotation on namespace#1194
anandrkskd wants to merge 3 commits into
redhat-developer:masterfrom
anandrkskd:infra-annotation-support

Conversation

@anandrkskd

Copy link
Copy Markdown
Contributor

What type of PR is this?
/kind enhancement

What does this PR do / why we need it:
This PR changes RunOnInfra field implementation it adds openshift.io/node-selector="node-role.kubernetes.io/infra=“ annotation on namespace and let openshift make sure the pods are scheduled on infra nodes. Previous implementation used to add node-role label to resources directly.

Have you updated the necessary documentation?

  • Documentation update is required by this PR.
  • Documentation has been updated.

Which issue(s) this PR fixes:

Fixes GITOPS-9926

Test acceptance criteria:

  • Unit Test
  • E2E Test

How to test changes / Special notes to the reviewer:

  • run make install run
  • Update the default GitopsService CRD with spec.runOnInfra: true
  • make sure the nodes are properly labeled
kubectl label node/<node-name> node-role.kubernetes.io/infra=""
  • check if the pods are scheduled on the right Nodes.
  • check if the namespace contains annotations openshift.io/node-selector="node-role.kubernetes.io/infra=“

…pod infra NodeSelector with openshift.io/node-selector annotation on default namespace. OpenShift admission controller applies selector to all pods in namespace automatically. Tolerations and custom NodeSelector remain at pod level.

assited-by: claude-code
Signed-off-by: Anand Kumar Singh <anandrkskd@gmail.com>
Signed-off-by: Anand Kumar Singh <anandrkskd@gmail.com>
@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign adamsaleh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 997481ce-a108-4307-8f59-7778b4f36c60

📥 Commits

Reviewing files that changed from the base of the PR and between 9ef0c49 and d85bfe9.

📒 Files selected for processing (1)
  • controllers/gitopsservice_controller.go
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • argoproj-labs/argocd-operator (manual)
🚧 Files skipped from review as they are similar to previous changes (1)
  • controllers/gitopsservice_controller.go

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Infrastructure scheduling is now driven by namespace annotations, with new infra annotation key/value exposed for verification.
    • End-to-end tests now assert infra behavior using namespace annotation helpers.
  • Bug Fixes

    • Infra placement no longer relies on pod-level node selectors; workloads retain the default node selector.
    • Reconciliation now consistently adds/removes the infra namespace annotation for both default and managed setups.
    • Resource creation flow now tolerates “already exists” cases and continues gracefully.
  • Tests

    • Updated unit and E2E expectations for namespace annotation, tolerations, and removal behavior.

Walkthrough

The PR changes how RunOnInfra is handled: namespace-level OpenShift annotations now drive infra scheduling, while pod-level infra node selector mutations are removed. Controller, unit test, and E2E assertions were updated to use namespace annotations and tolerations.

Changes

Infra Node Selector via Namespace Annotation

Layer / File(s) Summary
New infra annotation constants
common/common.go
Adds InfraNodeSelectorAnnotation and InfraNodeSelectorAnnotationValue exported constants.
Controller: annotation helper and namespace reconciliation
controllers/gitopsservice_controller.go
Adds ensureInfraNodeSelectorAnnotation, applies it to backend and default Argo CD namespaces on create and update paths, removes prior pod and NodePlacement infra mutations, and treats AlreadyExists as non-fatal in SetupWithManager.
ConsolePlugin: remove infra pod selector
controllers/consoleplugin.go, controllers/consoleplugin_test.go
Removes conditional clearing of infra NodeSelector from reconcileDeployment; the pod NodeSelector stays on DefaultNodeSelector(), and the test assertion was updated.
Unit tests: annotation-based assertions
controllers/gitopsservice_controller_test.go
Updates TestReconcile_InfrastructureNode to assert namespace annotation presence and default pod selector, and adds TestReconcile_InfrastructureNode_AnnotationRemoved for toggle-off behavior.
Namespace fixture matchers
test/openshift/e2e/ginkgo/fixture/namespace/fixture.go
Adds HaveAnnotation and NotHaveAnnotation Gomega matchers for namespace annotation assertions.
E2E tests: annotation-based infra verification
test/e2e/gitopsservice_test.go, test/openshift/e2e/ginkgo/sequential/1-028-*.go, test/openshift/e2e/ginkgo/sequential/1-071_*.go
Replaces pod-level infra selector checks with namespace annotation assertions and updated toleration checks across the E2E suite.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the key change: RunOnInfra now uses the namespace node-selector annotation.
Description check ✅ Passed The description matches the PR by explaining the namespace annotation change, its purpose, and the related test notes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@anandrkskd anandrkskd marked this pull request as ready for review June 30, 2026 10:06
@openshift-ci openshift-ci Bot requested review from keithchong and svghadi June 30, 2026 10:06

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/openshift/e2e/ginkgo/sequential/1-071_validate_node_selectors_test.go (1)

137-153: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Assert the fallback nodeSelector explicitly after cleanup.

ShouldNot(HaveTemplateSpecNodeSelector({"kubernetes.io/os":"linux","key1":"value1"})) only proves key1 is gone. It also passes if the controller drops the default kubernetes.io/os=linux selector entirely. Please assert the final selector equals the default map instead of only asserting inequality.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/openshift/e2e/ginkgo/sequential/1-071_validate_node_selectors_test.go`
around lines 137 - 153, The cleanup assertions in the node selector test are too
weak because ShouldNot(HaveTemplateSpecNodeSelector(...)) still passes if the
default selector is removed entirely. Update the assertions in the loop over
deploymentNameList and the appControllerSS check to verify the final template
spec node selector matches the expected fallback/default selector exactly, using
the existing HaveTemplateSpecNodeSelector helper on the Deployment and
StatefulSet fixtures.

Source: Path instructions

🧹 Nitpick comments (1)
controllers/gitopsservice_controller_test.go (1)

643-650: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a preservation case for custom spec.nodeSelector.

The PR contract says RunOnInfra should stop injecting the infra selector without dropping user-provided node selectors, but this test only covers the empty-selector path. A regression that clears GitopsServiceSpec.NodeSelector would still pass here. Please add a case with Spec.NodeSelector populated and assert that the backend Deployment and ArgoCD.Spec.NodePlacement.NodeSelector keep that map while the namespace annotation is still set.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/gitopsservice_controller_test.go` around lines 643 - 650, Add a
preservation test for custom GitopsServiceSpec.NodeSelector in the RunOnInfra
path, since the current gitopsservice_controller_test only verifies the
empty-selector case. Update the relevant test setup around the GitopsService
reconciliation and the ArgoCD assertions so a populated NodeSelector is
provided, then verify the backend Deployment and
ArgoCD.Spec.NodePlacement.NodeSelector both retain that map while the namespace
annotation still reflects infra placement. Use the existing GitopsService,
ArgoCD, and deployment checks to ensure user-provided selectors are not cleared.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controllers/gitopsservice_controller.go`:
- Around line 100-106: The GitOpsService bootstrap in the controller’s setup
path only logs failures from r.Client.Create and then continues, which lets
SetupWithManager succeed even when creation actually failed. Update the
bootstrap flow around the Create call in the GitopsService controller to return
the non-AlreadyExists error to the caller, while still treating
errors.IsAlreadyExists as a non-fatal case. Make sure the setup method that
invokes this creation propagates the error so operator startup fails fast on
RBAC/API issues instead of leaving the singleton uncreated.
- Around line 1029-1045: The ensureInfraNodeSelectorAnnotation helper is
overwriting or deleting the entire openshift.io/node-selector annotation instead
of preserving any existing selector terms. Update this function so it only adds
the infra selector when runOnInfra is true and removes just that infra clause
when false, leaving any other node-selector constraints intact; use
ensureInfraNodeSelectorAnnotation and
common.InfraNodeSelectorAnnotation/common.InfraNodeSelectorAnnotationValue to
locate the logic.

In `@test/openshift/e2e/ginkgo/sequential/1-028-validate_run_on_infra_test.go`:
- Around line 144-153: The disable-path assertion in the OpenShift GitOps e2e
test is too narrow because it only checks the namespace annotation and the
cluster deployment. Update the verification in the same test block that uses
openshiftGitopsNS and clusterDepl so it asserts the full openshift-gitops
workload set, matching the enable-path coverage by checking all relevant
deployments/workloads no longer have the infra toleration after RunOnInfra is
disabled.

---

Outside diff comments:
In `@test/openshift/e2e/ginkgo/sequential/1-071_validate_node_selectors_test.go`:
- Around line 137-153: The cleanup assertions in the node selector test are too
weak because ShouldNot(HaveTemplateSpecNodeSelector(...)) still passes if the
default selector is removed entirely. Update the assertions in the loop over
deploymentNameList and the appControllerSS check to verify the final template
spec node selector matches the expected fallback/default selector exactly, using
the existing HaveTemplateSpecNodeSelector helper on the Deployment and
StatefulSet fixtures.

---

Nitpick comments:
In `@controllers/gitopsservice_controller_test.go`:
- Around line 643-650: Add a preservation test for custom
GitopsServiceSpec.NodeSelector in the RunOnInfra path, since the current
gitopsservice_controller_test only verifies the empty-selector case. Update the
relevant test setup around the GitopsService reconciliation and the ArgoCD
assertions so a populated NodeSelector is provided, then verify the backend
Deployment and ArgoCD.Spec.NodePlacement.NodeSelector both retain that map while
the namespace annotation still reflects infra placement. Use the existing
GitopsService, ArgoCD, and deployment checks to ensure user-provided selectors
are not cleared.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7fa859f0-e8d7-4ffd-a261-4e3fd2dd3cfc

📥 Commits

Reviewing files that changed from the base of the PR and between 93a8216 and 9ef0c49.

📒 Files selected for processing (9)
  • common/common.go
  • controllers/consoleplugin.go
  • controllers/consoleplugin_test.go
  • controllers/gitopsservice_controller.go
  • controllers/gitopsservice_controller_test.go
  • test/e2e/gitopsservice_test.go
  • test/openshift/e2e/ginkgo/fixture/namespace/fixture.go
  • test/openshift/e2e/ginkgo/sequential/1-028-validate_run_on_infra_test.go
  • test/openshift/e2e/ginkgo/sequential/1-071_validate_node_selectors_test.go
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

  • argoproj-labs/argocd-operator (manual)
💤 Files with no reviewable changes (1)
  • controllers/consoleplugin.go

Comment on lines 100 to +106
err := r.Client.Create(context.TODO(), gitopsServiceRef)
if err != nil {
reqLogger.Error(err, "Failed to create GitOps service instance")
if errors.IsAlreadyExists(err) {
reqLogger.Info("GitOps service instance already exists, skipping creation")
} else {
reqLogger.Error(err, "Failed to create GitOps service instance")
}

@coderabbitai coderabbitai Bot Jun 30, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Return non-AlreadyExists bootstrap errors.

Lines 100-106 only log Create failures and still let SetupWithManager succeed. If this call fails for RBAC/API reasons, the singleton GitopsService is never created, and later reconciles just exit on NotFound, leaving the operator partially initialized.

Suggested fix
  err := r.Client.Create(context.TODO(), gitopsServiceRef)
  if err != nil {
      if errors.IsAlreadyExists(err) {
          reqLogger.Info("GitOps service instance already exists, skipping creation")
      } else {
          reqLogger.Error(err, "Failed to create GitOps service instance")
+         return err
      }
  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
err := r.Client.Create(context.TODO(), gitopsServiceRef)
if err != nil {
reqLogger.Error(err, "Failed to create GitOps service instance")
if errors.IsAlreadyExists(err) {
reqLogger.Info("GitOps service instance already exists, skipping creation")
} else {
reqLogger.Error(err, "Failed to create GitOps service instance")
}
err := r.Client.Create(context.TODO(), gitopsServiceRef)
if err != nil {
if errors.IsAlreadyExists(err) {
reqLogger.Info("GitOps service instance already exists, skipping creation")
} else {
reqLogger.Error(err, "Failed to create GitOps service instance")
return err
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/gitopsservice_controller.go` around lines 100 - 106, The
GitOpsService bootstrap in the controller’s setup path only logs failures from
r.Client.Create and then continues, which lets SetupWithManager succeed even
when creation actually failed. Update the bootstrap flow around the Create call
in the GitopsService controller to return the non-AlreadyExists error to the
caller, while still treating errors.IsAlreadyExists as a non-fatal case. Make
sure the setup method that invokes this creation propagates the error so
operator startup fails fast on RBAC/API issues instead of leaving the singleton
uncreated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change only adds more defined logs for users, previous behaviour should handle it.

Comment thread controllers/gitopsservice_controller.go
Comment on lines +144 to 153
By("verifying that the openshift-gitops namespace no longer has the infra annotation")
Eventually(openshiftGitopsNS).Should(namespaceFixture.NotHaveAnnotation(common.InfraNodeSelectorAnnotation))

clusterDepl = &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: "cluster", Namespace: "openshift-gitops"}}
Eventually(clusterDepl).ShouldNot(deploymentFixture.HaveTemplateSpecNodeSelector(map[string]string{
"node-role.kubernetes.io/infra": "",
"kubernetes.io/os": "linux",
}))
By("verifying that the resources in openshift-gitops no longer have run on infra tolerations set")

clusterDepl = &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: "cluster", Namespace: "openshift-gitops"}}
Eventually(clusterDepl).ShouldNot(deploymentFixture.HaveTolerations([]corev1.Toleration{
{Key: "infra", Effect: "NoSchedule", Value: "reserved"}}),
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

The disable-path check is now too narrow.

After RunOnInfra is turned off, this only verifies annotation removal and the cluster deployment. If the other openshift-gitops workloads keep the infra toleration, the test still passes. Please assert the full workload set here, the same way this test does on the enable path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/openshift/e2e/ginkgo/sequential/1-028-validate_run_on_infra_test.go`
around lines 144 - 153, The disable-path assertion in the OpenShift GitOps e2e
test is too narrow because it only checks the namespace annotation and the
cluster deployment. Update the verification in the same test block that uses
openshiftGitopsNS and clusterDepl so it asserts the full openshift-gitops
workload set, matching the enable-path coverage by checking all relevant
deployments/workloads no longer have the infra toleration after RunOnInfra is
disabled.

Source: Path instructions

…o/infra='

Signed-off-by: Anand Kumar Singh <anandrkskd@gmail.com>
labelUpdate, argocdNS := ensurePodSecurityLabels(argocdNS)
needsUpdate = needsUpdate || labelUpdate
annotationUpdate, argocdNS := ensureInfraNodeSelectorAnnotation(argocdNS, instance.Spec.RunOnInfra)
needsUpdate = needsUpdate || annotationUpdate

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same logic as 266-270, can we make it as function ?

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown

@anandrkskd: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/v4.19-kuttl-sequential d85bfe9 link true /test v4.19-kuttl-sequential
ci/prow/v4.14-kuttl-sequential d85bfe9 link false /test v4.14-kuttl-sequential
ci/prow/v4.14-e2e d85bfe9 link false /test v4.14-e2e
ci/prow/v4.14-kuttl-parallel d85bfe9 link false /test v4.14-kuttl-parallel

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants