Skip to content

fix: initialize node uuid during enrollment#224

Open
jingxiang-z wants to merge 1 commit into
mainfrom
fix/enroll-node-uuid
Open

fix: initialize node uuid during enrollment#224
jingxiang-z wants to merge 1 commit into
mainfrom
fix/enroll-node-uuid

Conversation

@jingxiang-z

@jingxiang-z jingxiang-z commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features
    • Centralized node UUID “load or create” is now used during enrollment and server startup, persisted in agent state.
    • Node UUID prefers hardware product UUID when available, with fallback to a generated UUID for reuse.
  • Bug Fixes
    • Enrollment now fails fast if node UUID initialization cannot complete, preventing subsequent post-enroll steps.
    • Re-runs are idempotent and can repair empty stored values.
  • Deployment
    • Helm DaemonSet enroll init step now mounts host /sys read-only.
  • Tests
    • Added/updated coverage for persistence, reuse, and error propagation.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9af13a6e-007e-45e7-a99e-c5fab9f84e71

📥 Commits

Reviewing files that changed from the base of the PR and between 1edc70f and 026828f.

📒 Files selected for processing (13)
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • go.mod
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/backend_test.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/inventory/sink/backend_test.go
  • internal/nodeidentity/nodeidentity.go
  • internal/nodeidentity/nodeidentity_test.go
  • internal/server/server.go
  • internal/server/server_test.go
✅ Files skipped from review due to trivial changes (1)
  • internal/nodeidentity/nodeidentity_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • internal/enrollment/enrollment.go
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • internal/server/server_test.go
  • internal/agentstate/sqlite_test.go
  • internal/attestation/backend_test.go
  • internal/inventory/sink/backend_test.go
  • internal/agentstate/state.go
  • internal/enrollment/enrollment_test.go
  • internal/nodeidentity/nodeidentity.go
  • go.mod

📝 Walkthrough

Walkthrough

This PR adds node UUID persistence helpers and wires them into server and enrollment flows. The state contract now supports get-or-create behavior, and tests cover the helper, server initialization, and enrollment ordering.

Changes

Node UUID initialization

Layer / File(s) Summary
Helper, state contract, and persistence
internal/nodeidentity/nodeidentity.go, internal/nodeidentity/nodeidentity_test.go, internal/agentstate/state.go, internal/agentstate/sqlite.go, internal/agentstate/sqlite_test.go, go.mod
EnsureNodeUUID now uses a get-or-create state contract, reads and validates DMI product UUIDs from sysfs, and is covered by filesystem and persistence tests; procfs is promoted to a direct dependency.
Server machine ID wiring
internal/server/server.go, internal/server/server_test.go, internal/attestation/backend_test.go, internal/inventory/sink/backend_test.go
The server delegates machine ID initialization to nodeidentity.EnsureNodeUUID through a database-backed state adapter, and the affected test doubles and machine ID test are updated for the new state method.
Enrollment node UUID step
internal/enrollment/enrollment.go, internal/enrollment/enrollment_test.go, deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
Enrollment now runs node UUID initialization before post-enroll inventory sync, the workflow tests cover the call order and error path, and the init container mounts /sys read-only.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A bunny hops through bytes so bright,
Tucking UUIDs in just right. 🐰
From DMI roots to stored delight,
The same old ID survives the night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: initializing the node UUID during enrollment.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/enroll-node-uuid

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a9f16b8dec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/enrollment/enrollment.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/enrollment/enrollment_test.go (1)

79-82: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Assert the state passed to ensureNodeUUID.

The fake currently succeeds even if production wiring passes a nil nodeidentity.State, while the real helper would fail. Add a non-nil assertion here to cover that contract.

Proposed test tightening
 	ensureCalled := false
 	ensureNodeUUID = func(ctx context.Context, state nodeidentity.State) (string, error) {
+		require.NotNil(t, state)
 		ensureCalled = true
 		return "4c4c4544-0053-5210-8038-c8c04f583034", nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/enrollment/enrollment_test.go` around lines 79 - 82, The test stub
for ensureNodeUUID is too permissive because it does not verify the
nodeidentity.State argument, so it can pass even if production wiring supplies
nil. Tighten the fake in internal/enrollment/enrollment_test.go by asserting the
state passed into ensureNodeUUID is non-nil before returning the UUID, using the
ensureNodeUUID helper and ensureCalled setup already in the test to locate the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/enrollment/enrollment.go`:
- Around line 82-87: The enrollment flow in ensureNodeUUID and
storeConfigInMetadata can leave partial state because metadata is committed
before UUID initialization may fail. Reorder the logic in the enrollment path so
node UUID setup happens before calling storeConfigInMetadata, or otherwise make
both steps atomic/rollback-safe so failures do not report enrollment as failed
after credentials have already been persisted.

In `@internal/nodeidentity/nodeidentity.go`:
- Around line 50-73: Ensure `EnsureNodeUUID` is atomic at the `State` boundary
instead of doing a separate `GetNodeUUID` then `SetNodeUUID` in
`nodeidentity.go`. Update the `State` API (and its implementation) to provide a
single get-or-create/compare-and-set style operation that returns the committed
node UUID, then have `EnsureNodeUUID` use that value directly so concurrent
callers cannot generate different IDs or return an unpersisted UUID.
- Around line 94-100: Reject the all-zero DMI product UUID in the node identity
path so it falls back to a random ID instead of being persisted as a real node
ID. In the logic that trims and parses dmi.ProductUUID inside the node identity
function, add an explicit check for the nil/zero UUID value after parsing (or
before returning it) and treat it the same as an empty/invalid hardware UUID,
returning an error that triggers the existing fallback behavior.

---

Nitpick comments:
In `@internal/enrollment/enrollment_test.go`:
- Around line 79-82: The test stub for ensureNodeUUID is too permissive because
it does not verify the nodeidentity.State argument, so it can pass even if
production wiring supplies nil. Tighten the fake in
internal/enrollment/enrollment_test.go by asserting the state passed into
ensureNodeUUID is non-nil before returning the UUID, using the ensureNodeUUID
helper and ensureCalled setup already in the test to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c316fbbe-4918-40d8-afef-388f978dfbb4

📥 Commits

Reviewing files that changed from the base of the PR and between 67cf5cf and a9f16b8.

📒 Files selected for processing (7)
  • go.mod
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/nodeidentity/nodeidentity.go
  • internal/nodeidentity/nodeidentity_test.go
  • internal/server/server.go
  • internal/server/server_test.go

Comment thread internal/enrollment/enrollment.go Outdated
Comment thread internal/nodeidentity/nodeidentity.go Outdated
Comment thread internal/nodeidentity/nodeidentity.go
@jingxiang-z jingxiang-z force-pushed the fix/enroll-node-uuid branch from a9f16b8 to dfd4e89 Compare June 25, 2026 17:23

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
internal/attestation/backend_test.go (1)

54-56: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Avoid a successful empty UUID in the stub.

The new state method should either return an existing non-empty UUID or create one. Returning empty success can let tests pass if a future path accidentally consumes this stub through EnsureNodeUUID.

Proposed fix
-func (s *stubState) GetOrCreateNodeUUID(context.Context, func() (string, error)) (string, bool, error) {
-	return "", false, nil
+func (s *stubState) GetOrCreateNodeUUID(_ context.Context, create func() (string, error)) (string, bool, error) {
+	if s.nodeErr != nil {
+		return "", false, s.nodeErr
+	}
+	if s.nodeOK && s.nodeUUID != "" {
+		return s.nodeUUID, false, nil
+	}
+	value, err := create()
+	if err != nil {
+		return "", false, err
+	}
+	s.nodeUUID = value
+	s.nodeOK = true
+	return value, true, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/attestation/backend_test.go` around lines 54 - 56, The stubbed
GetOrCreateNodeUUID method in stubState is returning a successful empty UUID,
which can mask bugs in EnsureNodeUUID-based paths. Update this stub so it never
reports success with an empty value: either return a non-empty UUID when
simulating an existing node, or return an error/failed state when creation is
not meant to happen in the test.
internal/inventory/sink/backend_test.go (1)

66-68: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Make the fake honor the get-or-create contract.

Returning ("", false, nil) models a successful empty node UUID, which real implementations reject. Have the fake return an existing nodeUUID or invoke create and store the result.

Proposed fix
-func (f *fakeState) GetOrCreateNodeUUID(context.Context, func() (string, error)) (string, bool, error) {
-	return "", false, nil
+func (f *fakeState) GetOrCreateNodeUUID(_ context.Context, create func() (string, error)) (string, bool, error) {
+	if f.err != nil {
+		return "", false, f.err
+	}
+	if f.nodeUUID != "" {
+		return f.nodeUUID, false, nil
+	}
+	value, err := create()
+	if err != nil {
+		return "", false, err
+	}
+	f.nodeUUID = value
+	return value, true, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/inventory/sink/backend_test.go` around lines 66 - 68, The fake
implementation of GetOrCreateNodeUUID in fakeState is not honoring the real
contract because it always returns an empty UUID with no error, which can hide
failures in tests. Update fakeState.GetOrCreateNodeUUID to either return a
previously stored nodeUUID when present or call the provided create function,
store the generated UUID on the fakeState, and return it with the correct
boolean result so behavior matches the real backend.
internal/agentstate/sqlite_test.go (1)

123-143: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add a regression case for empty persisted node UUID values.

This helper should recover if the metadata key exists with an empty value; the current test only covers a missing row and an already-populated row.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/agentstate/sqlite_test.go` around lines 123 - 143, Add a regression
test in TestSQLiteStateGetOrCreateNodeUUID for the case where the persisted node
UUID metadata key exists but its stored value is empty. Update the test setup
around state.GetOrCreateNodeUUID to seed an empty row first, then verify the
helper treats it like missing data by invoking the create callback, returning
created=true, and persisting the new UUID; keep the existing missing-row and
populated-row coverage intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/agentstate/sqlite.go`:
- Around line 151-166: The get-or-create flow in readNodeUUIDDB/get-or-create
logic is treating an existing empty gpud_metadata row as committed failure
because sqlite uses INSERT OR IGNORE and then checks for a missing write. Update
the persistence path in the node UUID code to upsert only when the stored value
is empty, so an empty existing row gets filled while any non-empty committed ID
is preserved. Keep the behavior tied to the readNodeUUIDDB and the insert/update
block in sqlite.go, and make the success check confirm the final stored value
matches the candidate instead of assuming the insert must happen.

In `@internal/server/server.go`:
- Around line 168-183: The server adapter’s metadata write path in the node UUID
handling has the same INSERT OR IGNORE empty-row failure mode, so an existing
empty MetadataKeyMachineID row can still block startup instead of being
repaired. Update the logic around the s.dbRW ExecContext call and the subsequent
readNodeUUIDDB check so that an empty committed value is treated as absent and
triggers a healing overwrite, matching the same empty-row repair semantics used
elsewhere in the node UUID flow. Keep the fix localized to the node UUID
metadata persistence path and preserve the existing committed/ok return
contract.

---

Nitpick comments:
In `@internal/agentstate/sqlite_test.go`:
- Around line 123-143: Add a regression test in
TestSQLiteStateGetOrCreateNodeUUID for the case where the persisted node UUID
metadata key exists but its stored value is empty. Update the test setup around
state.GetOrCreateNodeUUID to seed an empty row first, then verify the helper
treats it like missing data by invoking the create callback, returning
created=true, and persisting the new UUID; keep the existing missing-row and
populated-row coverage intact.

In `@internal/attestation/backend_test.go`:
- Around line 54-56: The stubbed GetOrCreateNodeUUID method in stubState is
returning a successful empty UUID, which can mask bugs in EnsureNodeUUID-based
paths. Update this stub so it never reports success with an empty value: either
return a non-empty UUID when simulating an existing node, or return an
error/failed state when creation is not meant to happen in the test.

In `@internal/inventory/sink/backend_test.go`:
- Around line 66-68: The fake implementation of GetOrCreateNodeUUID in fakeState
is not honoring the real contract because it always returns an empty UUID with
no error, which can hide failures in tests. Update fakeState.GetOrCreateNodeUUID
to either return a previously stored nodeUUID when present or call the provided
create function, store the generated UUID on the fakeState, and return it with
the correct boolean result so behavior matches the real backend.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9891ced6-b514-4c18-8461-2d01259615b7

📥 Commits

Reviewing files that changed from the base of the PR and between a9f16b8 and dfd4e89.

📒 Files selected for processing (13)
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • go.mod
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/backend_test.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/inventory/sink/backend_test.go
  • internal/nodeidentity/nodeidentity.go
  • internal/nodeidentity/nodeidentity_test.go
  • internal/server/server.go
  • internal/server/server_test.go
✅ Files skipped from review due to trivial changes (2)
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/enrollment/enrollment.go
  • internal/nodeidentity/nodeidentity_test.go
  • internal/enrollment/enrollment_test.go
  • internal/server/server_test.go

Comment thread internal/agentstate/sqlite.go
Comment thread internal/server/server.go Outdated
@jingxiang-z jingxiang-z force-pushed the fix/enroll-node-uuid branch from dfd4e89 to 1edc70f Compare June 25, 2026 17:33

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/server/server.go (1)

145-207: 📐 Maintainability & Code Quality | 🔵 Trivial | 🏗️ Heavy lift

Centralize the node UUID persistence sequence.

This read/create/upsert/re-read logic duplicates internal/agentstate/sqlite.go Lines 115-190; extracting a shared helper would avoid future contract drift across the two production paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/server/server.go` around lines 145 - 207, The node UUID persistence
flow in GetOrCreateNodeUUID duplicates the same read/create/upsert/re-read
sequence already implemented in the other production path, so the two
implementations can drift. Extract the shared logic into a common helper used by
dbNodeUUIDState.GetOrCreateNodeUUID and the counterpart in
internal/agentstate/sqlite.go, keeping the contract and error handling identical
while preserving the existing readNodeUUIDDB and metadata insert behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/server/server.go`:
- Around line 145-207: The node UUID persistence flow in GetOrCreateNodeUUID
duplicates the same read/create/upsert/re-read sequence already implemented in
the other production path, so the two implementations can drift. Extract the
shared logic into a common helper used by dbNodeUUIDState.GetOrCreateNodeUUID
and the counterpart in internal/agentstate/sqlite.go, keeping the contract and
error handling identical while preserving the existing readNodeUUIDDB and
metadata insert behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a4118f2c-7eb7-4d84-906c-f1cb3b0b4b79

📥 Commits

Reviewing files that changed from the base of the PR and between dfd4e89 and 1edc70f.

📒 Files selected for processing (13)
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • go.mod
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/backend_test.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/inventory/sink/backend_test.go
  • internal/nodeidentity/nodeidentity.go
  • internal/nodeidentity/nodeidentity_test.go
  • internal/server/server.go
  • internal/server/server_test.go
✅ Files skipped from review due to trivial changes (2)
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • go.mod
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/enrollment/enrollment.go
  • internal/agentstate/state.go
  • internal/nodeidentity/nodeidentity_test.go
  • internal/enrollment/enrollment_test.go
  • internal/server/server_test.go
  • internal/nodeidentity/nodeidentity.go

@jingxiang-z jingxiang-z self-assigned this Jun 25, 2026
Comment thread internal/agentstate/sqlite.go Outdated
Comment thread internal/agentstate/state.go
Comment thread internal/server/server.go Outdated
Comment thread internal/nodeidentity/nodeidentity.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z force-pushed the fix/enroll-node-uuid branch from 1edc70f to 026828f Compare June 25, 2026 20:56
@jingxiang-z jingxiang-z requested a review from ambermingxin June 25, 2026 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants