
Add instance restart policy #233

Open
sjmiller609 wants to merge 21 commits into hypeship/add-healthcheck-policy from hypeship/restart-policy

@sjmiller609 (Collaborator) commented May 16, 2026

Summary

  • add restart_policy and restart_status API fields with generated bindings
  • add a restart-policy controller for whole-instance restarts with backoff, max attempts, stable reset, and manual-stop suppression
  • stack on the healthcheck branch, and restart unhealthy running instances through the restart policy when the policy is on_failure or always
  • document the healthcheck/restart-policy interaction
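The summary says a restart can be triggered both by an unhealthy running instance and by the guest stopping, gated by the policy mode. A minimal sketch of that gate, assuming hypothetical names: only the `on_failure` and `always` mode strings come from this PR; the `no` default, the `Trigger` type, and the rule that a clean exit (code 0) is not a failure under `on_failure` are illustrative assumptions, not the PR's implementation.

```go
package main

import "fmt"

// RestartMode mirrors the restart_policy modes named in the PR.
// ModeNo is a hypothetical default; only on_failure and always appear in the PR.
type RestartMode string

const (
	ModeNo        RestartMode = "no"
	ModeOnFailure RestartMode = "on_failure"
	ModeAlways    RestartMode = "always"
)

// Trigger is a hypothetical description of why a restart is being considered.
type Trigger struct {
	Unhealthy bool // health check reported a running instance unhealthy
	Exited    bool // the guest stopped on its own
	ExitCode  int  // meaningful only when Exited is true
}

// shouldRestart reports whether a trigger warrants a whole-instance restart.
// Assumption: a clean exit (code 0) only restarts under "always".
func shouldRestart(mode RestartMode, t Trigger) bool {
	switch mode {
	case ModeAlways:
		return t.Unhealthy || t.Exited
	case ModeOnFailure:
		return t.Unhealthy || (t.Exited && t.ExitCode != 0)
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldRestart(ModeOnFailure, Trigger{Unhealthy: true}))
	fmt.Println(shouldRestart(ModeOnFailure, Trigger{Exited: true, ExitCode: 0}))
	fmt.Println(shouldRestart(ModeAlways, Trigger{Exited: true, ExitCode: 0}))
}
```

Keeping the decision a pure function of mode and trigger makes it easy to table-test separately from the controller that performs the stop/start.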

Testing

  • go test ./lib/restart-policy ./lib/healthcheck ./lib/providers
  • go test -tags containers_image_openpgp ./lib/instances -run 'Test(ValidateCreateRequestHealthCheck|ValidateUpdateInstanceRequestAllowsRestartPolicyOnly|NormalizeRestartPolicyWrapsInvalidRequest|RestartStatusAfterPolicyUpdatePreservesManualStop|RestartStatusAfterPolicyUpdateClearsRetryState)$'
  • go test -tags containers_image_openpgp ./cmd/api/api -run 'Test(CreateInstance_MapsHealthCheckPolicy|CreateInstance_MapsRestartPolicy|UpdateInstance_MapsHealthCheckPatch|UpdateInstance_MapsRestartPolicyPatch|UpdateInstance_RejectsInvalidRestartPolicy)$'
  • go test -tags containers_image_openpgp ./cmd/api -run 'Test(StartImageRetentionControllerSkipsNilController|StartImageRetentionControllerStartsRunner|ConfigureOCICacheGCSkipsDisabled|ConfigureOCICacheGCRejectsInvalidInterval)$'

Not run: the full ./cmd/api/api package, which exercises broader lifecycle behavior outside this focused change.


Note

Medium Risk
Adds new instance lifecycle behavior (automatic stop/start restarts) driven by both guest exits and health-check failures, which can affect availability and state transitions if misconfigured. Risk is mitigated by backoff/max-attempt limits, manual-stop suppression, and extensive unit/integration test coverage.

Overview
Adds whole-instance restart policy support end-to-end: new restart_policy input on create/update and new restart_policy/restart_status fields on instance responses, with OpenAPI + generated bindings updated accordingly.

Introduces a lib/restart-policy module (normalization, backoff, max attempts, stable-window reset, and status tracking) and a new restart-policy controller in lib/instances that reconciles instances, automatically restarts Stopped instances, and stop/start-cycles Running instances that become unhealthy.

Updates lifecycle semantics to suppress auto-restarts after manual Stop (exposed as restart_status.blocked_reason=manual_stop) and to clear restart status on Start; restart policy/state is cloned/reset appropriately on fork/snapshot. Adds tests across API mapping, controller behavior, metrics labels, and an integration path that triggers restarts via failing health checks, plus documentation clarifying health-check vs restart-policy responsibilities.
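The Stop/Start/fork rules above amount to a few small state transitions. A sketch, assuming a hypothetical trimmed-down `Instance` type: `manual_stop` and "clear on Start" come from this PR, and recording `manual_stop` even when the instance is already stopped matches the monitoring plan below; the choice to copy the policy but reset the status on fork is an assumption, since the PR only says policy/state is "cloned/reset appropriately".

```go
package main

import "fmt"

// BlockedManualStop is the blocked_reason value named in the PR.
const BlockedManualStop = "manual_stop"

// RestartStatus is a hypothetical in-memory form of restart_status.
type RestartStatus struct {
	Attempts      int
	BlockedReason string
}

// Instance is a hypothetical, trimmed-down instance record.
type Instance struct {
	Name          string
	Running       bool
	Policy        string
	RestartStatus RestartStatus
}

// Stop marks the instance stopped and blocks auto-restart. It records
// manual_stop even when the instance was already stopped.
func (i *Instance) Stop() {
	i.Running = false
	i.RestartStatus = RestartStatus{BlockedReason: BlockedManualStop}
}

// Start clears restart status, lifting any manual-stop block.
func (i *Instance) Start() {
	i.Running = true
	i.RestartStatus = RestartStatus{}
}

// Fork copies the policy but starts with a fresh restart status (assumption).
func (i *Instance) Fork(name string) Instance {
	return Instance{Name: name, Policy: i.Policy}
}

func main() {
	inst := Instance{Name: "web", Running: true, Policy: "always"}
	inst.Stop()
	fmt.Printf("after Stop: %+v\n", inst.RestartStatus)
	inst.Start()
	fmt.Printf("after Start: %+v\n", inst.RestartStatus)
	child := inst.Fork("web-fork")
	fmt.Printf("fork: policy=%s status=%+v\n", child.Policy, child.RestartStatus)
}
```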

Reviewed by Cursor Bugbot for commit 4f6aa80. Bugbot is set up for automated code reviews on this repo.

github-actions Bot commented May 16, 2026

✱ Stainless preview builds for hypeman

This PR will update the hypeman SDKs with the following commit message.

feat: Add instance restart policy

Edit this comment to update it. It will appear in the SDK's changelogs.

hypeman-openapi

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅

hypeman-typescript

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅ build ✅ lint ❗ test ✅

npm install https://pkg.stainless.com/s/hypeman-typescript/1d8b40dc5cb13d1cb49e627c9e698fa1f87c404e/dist.tar.gz

hypeman-go

Your SDK build had at least one new note diagnostic, which is a regression from the base state.
generate ✅ build ✅ lint ✅ test ✅

go get github.com/stainless-sdks/hypeman-go@847f011ae8336e7884baf433601b6c6e18fc9be8
New diagnostics (1 note)
💡 Schema/EnumHasOneMember: Confirm intentional use of `enum` with single member.

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-05-17 19:36:42 UTC

@sjmiller609 force-pushed the hypeship/restart-policy branch from 383ad8b to 365fa7d May 16, 2026 03:00
@sjmiller609 changed the base branch from main to hypeship/add-healthcheck-policy May 16, 2026 03:01
@sjmiller609 marked this pull request as ready for review May 17, 2026 17:35

@firetiger-agent

Monitoring Plan: Restart Policy

This PR introduces a new restart_policy feature on Hypeman instances along with a background RestartPolicyController goroutine that reconciles instance state every 5 seconds and handles health-check-driven stop/start cycles. The API gains new restart_policy / restart_status fields on Create and Update, and StopInstance has a behavior change: it now writes manual_stop to restart status even for already-stopped instances (previously a no-op).

The primary risks to watch are: stop/start loops if the reconciler misidentifies instances, a regression in StopInstance due to the new metadata write on the already-stopped path, and controller-level panics, since this is a new background goroutine with no production history. Blast radius is limited: only instances with an explicit restart_policy set will be automatically restarted, so browser sessions and other standard instances are unaffected. The plan checks against the 24h baseline of 0.069–0.096% 5xx error rate and monitors for restart-policy-specific WARN/ERROR logs. Status updates will be posted automatically on this PR as monitoring progresses.

Comment thread lib/instances/restart_policy.go
Comment thread lib/instances/manager.go
Comment thread lib/instances/restart_policy.go
Comment thread cmd/api/api/restart_policy.go Outdated
Comment thread lib/instances/restart_policy.go
Comment thread lib/instances/restart_policy.go
Comment thread lib/instances/manager.go Outdated

@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 65ff54e.

Comment thread lib/instances/restart_policy.go