[DPD] DPD CRD changes with version bump to v3.2 #427

Open
rgadagot wants to merge 2 commits into aws:main from rgadagot:main


@rgadagot (Contributor) commented Jun 3, 2026

What's changing and why?

This release adds Disaggregated Prefill and Decoding (DPD) support to HyperPod Inference, which runs the prefill and decode phases on dedicated GPU pods. The change extends the existing Inference Operator with role-specific Kubernetes Deployments (prefiller and decoder), GPU-to-GPU KV cache transfer via NIXL over EFA, conditional routing based on input length, and independent autoscaling for each role.

This PR also pins the operator Deployment to amd64 Linux nodes via nodeAffinity. Without the pin, Kubernetes can schedule the operator pod onto ARM64/Graviton nodes in mixed-architecture clusters, where it ends up in CrashLoopBackOff because the operator images are built for amd64 only.
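A minimal sketch of such a pin, assuming the standard well-known node labels kubernetes.io/arch and kubernetes.io/os. The actual key layout in the chart's values.yaml is not shown on this page and may differ; in a plain Deployment manifest this block sits under spec.template.spec.affinity.

```yaml
# Sketch only: pod-level affinity for the operator Deployment
# (spec.template.spec.affinity). Uses Kubernetes well-known node labels;
# the chart's real values.yaml keys are an assumption, not taken from this PR.
affinity:
  nodeAffinity:
    # Hard requirement: the scheduler will not place the pod on a node
    # that fails these expressions (e.g. an ARM64/Graviton node).
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            # Expressions inside one term are ANDed: node must be amd64 AND linux.
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
            - key: kubernetes.io/os
              operator: In
              values:
                - linux
```

The required (rather than preferred) form is what prevents the CrashLoopBackOff: a preferred rule would still let the scheduler fall back to an ARM64 node when no amd64 capacity is free. The IgnoredDuringExecution suffix means pods that are already running are not evicted if node labels change later.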

rgadagot requested a review from a team as a code owner on June 3, 2026 21:44
rgadagot temporarily deployed to manual-approval on June 5, 2026 17:31 with GitHub Actions (Inactive)
type: object
x-kubernetes-map-type: atomic
type: object
pdSpec:
A collaborator commented on the pdSpec field:

Discussed offline; this field name was reviewed with the team, PMs, and SAs.

Comment thread helm_chart/HyperPodHelmChart/charts/inference-operator/Chart.yaml
Comment thread helm_chart/HyperPodHelmChart/charts/inference-operator/values.yaml


4 participants