Skip to content

[FR-126] Rollback#274

Open
eniko-dif wants to merge 7 commits into
temporalio:mainfrom
eniko-dif:eniko/feature-rollback
Open

[FR-126] Rollback#274
eniko-dif wants to merge 7 commits into
temporalio:mainfrom
eniko-dif:eniko/feature-rollback

Conversation

@eniko-dif

@eniko-dif eniko-dif commented Apr 14, 2026

Copy link
Copy Markdown

What was changed

Add a separate rollback config to the deployment specification that can be used to have different rollback behavior from the rollout one. It supports AllAtOnce or Progressive and does not have a gate. Uses the LastCurrentTime of a deployment to decide if the target version supposed to be rolled out or rolled back.

Why?

Open feature issue: #126

Checklist

  1. Closes [Feature Request] Rollback mode #126

  2. How was this tested:

Unit and integration tests (ongoing actual test with local setup).

  1. Any docs updates needed?

Yes, but needs to be decided where.

@eniko-dif eniko-dif requested review from a team and jlegrone as code owners April 14, 2026 15:35
@CLAassistant

CLAassistant commented Apr 14, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

Comment thread api/v1alpha1/workerdeployment_types.go
Comment thread api/v1alpha1/worker_types.go Outdated
Comment thread api/v1alpha1/worker_types.go Outdated
Comment thread internal/planner/planner.go Outdated
Comment thread api/v1alpha1/workerdeployment_types.go
Comment thread api/v1alpha1/workerdeployment_webhook.go Outdated
Comment thread api/v1alpha1/workerdeployment_types.go
Comment thread internal/planner/planner.go Outdated
}

if config.RollbackStrategy.MaxVersionAge != nil && time.Since(*targetVersionInfo.LastCurrentTime) > config.RollbackStrategy.MaxVersionAge.Duration {
l.Info("Skipping rollback: the version's last current time exceeds MaxVersionAge",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
l.Info("Skipping rollback: the version's last current time exceeds MaxVersionAge",
l.Debug("Skipping rollback: the version's last current time exceeds MaxVersionAge",

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger has no Debug level, only Info and Error. (If I remember correctly, it was related to K8s events?)

Comment thread internal/planner/planner.go Outdated
Comment thread api/v1alpha1/temporalworker_webhook.go Outdated
}

allErrs = append(allErrs, validateRolloutStrategy(new.Spec.RolloutStrategy)...)
allErrs = append(allErrs, validateRollbackStrategy(*new.Spec.RollbackStrategy)...)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could new.Spec.RollbackStrategy ever be nil? Should probably add a nil check here or update the type signature for validateRollbackStrategy instead of dereferencing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rollback strategy is a pointer, because it's not mandatory in the spec. At the point of the validation it should not be nil, because of the default we set, but just in case, I added a nil check.

Comment thread api/v1alpha1/workerdeployment_types.go Outdated
Comment on lines +60 to +64
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce}
} else if s.RollbackStrategy.Strategy == "" {
s.RollbackStrategy.Strategy = RollbackAllAtOnce
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is entirely a matter of taste but I think two if statements is easier to read than the if/else flow (more clear that the strategy is always all at once if left unset).

Suggested change
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{Strategy: RollbackAllAtOnce}
} else if s.RollbackStrategy.Strategy == "" {
s.RollbackStrategy.Strategy = RollbackAllAtOnce
}
if s.RollbackStrategy == nil {
s.RollbackStrategy = &RollbackStrategy{}
}
if s.RollbackStrategy.Strategy == "" {
s.RollbackStrategy.Strategy = RollbackAllAtOnce
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done the change, but I remember the times, where I had to optimize even the number of if calls... :P

Comment thread internal/planner/planner.go Outdated
Comment on lines +902 to +903
default:
strategy = temporaliov1alpha1.UpdateAllAtOnce

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think this ought to be an error condition; if the controller sees an unknown strategy it should fall back to manual mode and stop attempting updates to the default version.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this function is used, there is a check to see if the rollout strategy is set to manual, and if it is, rollback is disabled. In any case, I updated the default to be manual.

},
}

testCases := []struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a test cases for

1.RollbackStrategy == nil (even though the webhook should set a default, it's still possible to run the controller without the webhook enabled). When nil I would assert that the behavior matches the default rollback strategy configured by the webhook.
2. RollbackStrategy.MaxVersionAge == nil (assert default max version age applied, again to match the default webhook behavior)
3. RollbackStrategy.MaxVersionAge == 0 (assert false, even when the rest of the current status would normally have resulted in a rollback)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the Config to have rollback strategy at all times, so the first two are no longer needed and the third existed already.

@@ -23,8 +24,9 @@ func TestTemporalWorkerDeployment_ValidateCreate(t *testing.T) {
tests := map[string]struct {
obj runtime.Object
errorMsg string
warnMsg string

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a warning emitted for a combination of Manual rollout + Progressive rollback? That seems in the same vein as rollout steps being faster than rollback steps. Please add a test case for that if so.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When rollback is Manual, rollback is disabled in the planner, so I don't think we need a warning.

@eniko-dif eniko-dif force-pushed the eniko/feature-rollback branch from 371578c to a5d1e33 Compare May 27, 2026 15:52
@eniko-dif eniko-dif force-pushed the eniko/feature-rollback branch from a5d1e33 to 4b0bd82 Compare June 10, 2026 15:41
@eniko-dif eniko-dif requested a review from jlegrone June 10, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Rollback mode

3 participants