Add WaitForConditionAsync polling primitive (DOTNET-8665) #2376
Draft
GarrettBeatty wants to merge 12 commits into
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution SDK: a workflow can run StepAsync and WaitAsync against a real Lambda, with replay-aware checkpointing wired through to the AWS service.
Public API surface introduced:
- DurableFunction.WrapAsync: entry point that handles the durable execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException
Internals:
- DurableExecutionHandler: Task.WhenAny race between user code and the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState: replay-aware operation lookup and pending checkpoint buffer
- OperationIdGenerator: deterministic, replay-stable IDs
- TerminationManager: TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient: wraps AWSSDK.Lambda's checkpoint and state APIs
Tests:
- 86 unit tests covering enums, exceptions, models, configs, ID generation, termination, execution state, the handler race, the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly, LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails
Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync, WaitForConditionAsync (interface intentionally does not declare them)
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint
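
For readers unfamiliar with the handler race mentioned under Internals, here is a minimal sketch of the idea. It is not the SDK's DurableExecutionHandler: `HandlerRaceSketch`, `RunAsync`, and the `Outcome` enum are hypothetical names, and the suspension signal is modelled as a plain `Task` (for example the `Task` of a `TaskCompletionSource`).

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the three outcomes named in the commit message.
public enum Outcome { Succeeded, Failed, Pending }

public static class HandlerRaceSketch
{
    // Races user code against a suspension signal. Whichever task
    // completes first decides the outcome reported to the caller.
    public static async Task<Outcome> RunAsync(Func<Task> userCode, Task suspensionSignal)
    {
        Task userTask = userCode();
        Task winner = await Task.WhenAny(userTask, suspensionSignal);

        if (winner == suspensionSignal)
        {
            // The workflow asked to suspend (e.g. a wait was checkpointed);
            // the user task is abandoned for this invocation.
            return Outcome.Pending;
        }

        try
        {
            // Task.WhenAny never throws for a faulted task, so await the
            // user task explicitly to observe its exception.
            await userTask;
            return Outcome.Succeeded;
        }
        catch (Exception)
        {
            return Outcome.Failed;
        }
    }
}
```

In this sketch, completing the signal's `TaskCompletionSource` from inside the user code yields `Outcome.Pending`, while user code that finishes or faults first yields `Succeeded` or `Failed`. A synchronous throw from `userCode` itself is not handled here; a real handler would need to decide how to treat it.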
stack-info: PR: #2360, branch: GarrettBeatty/stack/2 remove update update update update
Match the Python / Java / JavaScript reference SDKs' replay-mode model: the workflow is "replaying" iff it has not yet revisited every checkpointed completed user-replayable operation. A single global flag flipped on the first fresh op (the prior model) misclassified workflow-body code that runs before the first step and would not generalize to Map/Parallel/Callback later.
ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying` + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're replaying. The service always sends an EXECUTION-type op carrying the input payload; that's bookkeeping, not user history, so it does not count toward replay (matches Python execution.py:258, Java ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-status non-EXECUTION op has been visited. Terminal set matches Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.
Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the top, so every operation participates in visit accounting without each subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.
Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒ flips out of replay, PENDING ops do not block transition, idempotency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
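
The visit-accounting rule above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's ExecutionState: `ReplayTrackerSketch` and `CheckpointedOp` are hypothetical types, and operation type and status are modelled as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified shape of a checkpointed operation.
public sealed record CheckpointedOp(string Id, string Type, string Status);

public sealed class ReplayTrackerSketch
{
    // Terminal set as listed in the commit message.
    private static readonly HashSet<string> TerminalStatuses =
        new(StringComparer.Ordinal) { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    private readonly HashSet<string> _unvisitedTerminalOps;

    public bool IsReplaying { get; private set; }

    public ReplayTrackerSketch(IEnumerable<CheckpointedOp> checkpointed)
    {
        // The EXECUTION op is bookkeeping and does not count as user history.
        var userOps = checkpointed.Where(op => op.Type != "EXECUTION").ToList();
        IsReplaying = userOps.Count > 0;

        // Only terminal-status ops must be revisited before replay ends.
        _unvisitedTerminalOps = userOps
            .Where(op => TerminalStatuses.Contains(op.Status))
            .Select(op => op.Id)
            .ToHashSet(StringComparer.Ordinal);
    }

    // Called at the top of every operation; safe to call repeatedly.
    public void TrackReplay(string operationId)
    {
        if (!IsReplaying) return;

        _unvisitedTerminalOps.Remove(operationId);
        if (_unvisitedTerminalOps.Count == 0)
        {
            IsReplaying = false;
        }
    }
}
```

With one SUCCEEDED step and one PENDING step checkpointed, visiting the PENDING step first leaves the tracker in replay, and visiting the SUCCEEDED step flips it out. A state holding only the EXECUTION op starts out not replaying.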
…Serializer
DurableExecution now reads the registered ILambdaSerializer from the per-invocation ILambdaContext (added in the prior PR) for both step-result checkpointing and workflow input/output. AOT-safety is now determined entirely by which serializer the user registers with LambdaBootstrapBuilder.Create; there is no longer a forked path between reflection-based and AOT-safe APIs.
Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their related [UnconditionalSuppressMessage] shims
Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim attributes in the public API. The AOT smoke test continues to publish with zero IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
durable-execution context (which call, which ARN). User logs no longer show
bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.
Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
Adds child-context support to the .NET Durable Execution SDK. A child context is a logical sub-workflow with its own deterministic operation-ID space, persisted as a CONTEXT operation so subsequent invocations replay the cached value without re-executing the function.
Public surface:
- IDurableContext.RunInChildContextAsync<T> (reflection + AOT-safe ICheckpointSerializer<T> overloads, plus a void overload).
- ChildContextConfig with SubType (observability label) and ErrorMapping (transform exceptions before they surface to the caller).
- ChildContextException for failure surfacing.
Used as a building block for upcoming WaitForCallbackAsync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
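
To make "its own deterministic operation-ID space" concrete, here is one way such IDs could be derived. The scheme is an assumption for illustration only: the PR does not state what OperationIdGenerator hashes, so `OperationIdSketch`, the `parent-counter` input format, and the lowercase hex output are all hypothetical.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public sealed class OperationIdSketch
{
    private readonly string _parentId;
    private int _counter;

    // An empty parent ID represents the root workflow context (assumption).
    public OperationIdSketch(string parentId = "") => _parentId = parentId;

    // Returns the next ID. The same sequence of calls yields the same IDs
    // on every invocation, which is what makes replay lookups possible.
    public string Next()
    {
        _counter++;
        string index = _counter.ToString(CultureInfo.InvariantCulture);
        string raw = _parentId.Length == 0 ? index : _parentId + "-" + index;

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(raw));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    // A child context gets a fresh counter namespaced by the ID of the
    // CONTEXT operation that created it, so its IDs cannot collide with
    // the parent's.
    public OperationIdSketch CreateChild(string contextOperationId) =>
        new OperationIdSketch(contextOperationId);
}
```

Because the child's counter restarts at 1 under a different prefix, operations added or removed inside a child context do not shift the IDs of operations in the parent.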
Lays down shared types/constants for the upcoming durable-execution context operations (Callbacks, Invoke, Parallel, Map, WaitForCondition) and updates the design doc to match decisions reached after comparing against the Python, JS, and Java reference SDKs.
SDK changes:
- OperationSubTypes constants class (Step, Wait, Callback, WaitForCallback, Invoke, WaitForCondition, Parallel, ParallelBranch, Map, MapIteration). Replaces hard-coded SubType literals in StepOperation and WaitOperation.
- OperationStatuses.TimedOut for callback/invoke timeout handling.
Design-doc alignment:
- Drop Serializer field from CallbackConfig, InvokeConfig, ChildContextConfig. Custom serializers flow through AOT-safe ICheckpointSerializer<T> overloads (matches the existing StepConfig pattern documented at line 1247).
- InvokeConfig gains TenantId (matches Python/JS/Java); drops PayloadSerializer / ResultSerializer.
- BatchItemStatus.Cancelled -> Started. The SDK does not synchronously cancel branches; the wire state of items still in flight when the batch resolves (e.g., FirstSuccessful short-circuit) is STARTED. Matches Python and JS.
- IBatchResult<T> expanded to the full JS/Python surface: adds Started, GetErrors(), HasFailure, SuccessCount, FailureCount, StartedCount, TotalCount.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds service-mediated polling to the .NET Durable Execution SDK. WaitForConditionAsync repeatedly evaluates a check function with configurable wait strategy between attempts; each iteration is its own Lambda invocation (suspended via STEP+RETRY checkpoints carrying NextAttemptDelaySeconds), so polling does not consume compute time.
Public surface:
- IDurableContext.WaitForConditionAsync<TState> (single overload; the per-iteration state checkpoint is serialized via the ILambdaSerializer registered on ILambdaContext.Serializer, configured via LambdaBootstrapBuilder.Create(handler, serializer))
- IConditionCheckContext (Logger + AttemptNumber)
- WaitForConditionConfig<TState> (required InitialState + WaitStrategy)
- IWaitStrategy<TState> with Decide(state, attempt) returning WaitDecision
- WaitDecision (readonly record struct, ShouldContinue + Delay, Stop() / ContinueAfter(TimeSpan) factories)
- WaitStrategy factories: Exponential / Linear / Fixed / FromDelegate, each accepting an optional Func<TState, bool> isDone predicate
- WaitForConditionException with AttemptsExhausted and LastState (preserved across both live execution and replay)
Internal:
- WaitForConditionOperation<TState> wire format = STEP + SubType "WaitForCondition". Each polling iteration emits Action=RETRY with the new state in payload and NextAttemptDelaySeconds for the service to schedule the next invocation.
- Serialization is delegated to the registered ILambdaSerializer via Stream-based Serialize<T>/Deserialize<T> calls; no AOT trim attributes on the public API. Mirrors StepOperation/ChildContextOperation.
- Strategies signal max-attempts exhausted by throwing WaitForConditionException directly from Decide(); the operation enriches with LastState before checkpointing FAIL.
- LastState survives FAIL replay: serialized into FAIL payload at write time, deserialized in BuildFailureException with warning-logged fallback for legacy/corrupt data.
- ExponentialBackoff helper extracted for sharing with ExponentialRetryStrategy. Math is byte-for-byte identical.
- Reuses OperationSubTypes.WaitForCondition from Wave 0.
Defaults: 60 attempts / 5s initial / 300s max / 1.5x rate / Full jitter - distinct from RetryStrategy.Default and matching Python/JS/Java reference SDKs. (Note: Python returns success on max-attempts; .NET/Java/JS throw - documented in design doc.)
Adds 41 unit tests + 5 integration tests covering each wait strategy, isDone predicate paths, max-attempts exhaustion, user-check exceptions, replay determinism, exponential backoff bounds, and corrupt-payload fallback logging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
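
The strategy contract and the stated defaults can be illustrated with a self-contained sketch. The types below are local stand-ins that mirror the names in this commit message (WaitDecision, IWaitStrategy<TState>, Decide); they are not the SDK's definitions. `ExponentialWaitSketch`, the 1-based attempt numbering, the exact backoff formula, and the use of InvalidOperationException in place of WaitForConditionException are assumptions.

```csharp
using System;

// Stand-in mirroring the described shape: ShouldContinue + Delay,
// with Stop() / ContinueAfter(TimeSpan) factories.
public readonly record struct WaitDecision(bool ShouldContinue, TimeSpan Delay)
{
    public static WaitDecision Stop() => new(false, TimeSpan.Zero);
    public static WaitDecision ContinueAfter(TimeSpan delay) => new(true, delay);
}

public interface IWaitStrategy<TState>
{
    WaitDecision Decide(TState state, int attempt);
}

// Hypothetical exponential strategy using the defaults quoted above.
public sealed class ExponentialWaitSketch<TState> : IWaitStrategy<TState>
{
    private readonly Func<TState, bool> _isDone;
    private readonly Random _random;

    public int MaxAttempts { get; init; } = 60;
    public TimeSpan InitialDelay { get; init; } = TimeSpan.FromSeconds(5);
    public TimeSpan MaxDelay { get; init; } = TimeSpan.FromSeconds(300);
    public double BackoffRate { get; init; } = 1.5;

    public ExponentialWaitSketch(Func<TState, bool> isDone, Random random)
    {
        _isDone = isDone;
        _random = random;
    }

    public WaitDecision Decide(TState state, int attempt)
    {
        if (_isDone(state)) return WaitDecision.Stop();

        // The PR says strategies throw WaitForConditionException here;
        // a BCL exception is used so the sketch stays self-contained.
        if (attempt >= MaxAttempts)
            throw new InvalidOperationException($"Condition not met after {attempt} attempts.");

        // Exponential growth from the initial delay, capped at MaxDelay.
        double cap = Math.Min(
            MaxDelay.TotalSeconds,
            InitialDelay.TotalSeconds * Math.Pow(BackoffRate, attempt - 1));

        // Full jitter: pick uniformly in [0, cap).
        return WaitDecision.ContinueAfter(TimeSpan.FromSeconds(_random.NextDouble() * cap));
    }
}
```

Under these assumptions the first attempt waits less than 5 seconds, later attempts never exceed 300 seconds, and the 60th unsuccessful attempt throws instead of scheduling another wait. The real operation would convert the chosen delay into NextAttemptDelaySeconds on the RETRY checkpoint; how it rounds sub-second values is not stated in the PR.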
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
#2216
Summary
Adds service-mediated polling to the .NET Durable Execution SDK.
WaitForConditionAsync repeatedly evaluates a check function with a configurable wait strategy between attempts; each iteration is its own Lambda invocation (suspended via STEP+RETRY checkpoints carrying NextAttemptDelaySeconds), so polling does not consume compute time.
Stacked on top of #2372 (Wave 0 cross-cutting types).
Fixes DOTNET-8665.
Public surface
- IDurableContext.WaitForConditionAsync<TState> (single overload; per-iteration state is serialized via the ILambdaSerializer registered on ILambdaContext.Serializer, configured via LambdaBootstrapBuilder.Create(handler, serializer); same pattern as StepAsync / RunInChildContextAsync)
- IConditionCheckContext (Logger + AttemptNumber)
- WaitForConditionConfig<TState> (required InitialState + WaitStrategy)
- IWaitStrategy<TState> with Decide(state, attempt) returning WaitDecision
- WaitDecision (readonly record struct; ShouldContinue + Delay; Stop() / ContinueAfter(TimeSpan) factories)
- WaitStrategy factories: Exponential / Linear / Fixed / FromDelegate, each accepting an optional Func<TState, bool> isDone predicate
- WaitForConditionException with AttemptsExhausted and LastState (preserved across both live execution and replay)
Internal
- WaitForConditionOperation<TState> wire format = STEP + SubType "WaitForCondition". Each polling iteration emits Action=RETRY with the new state in payload and NextAttemptDelaySeconds for the service to schedule the next invocation.
- Serialization is delegated to the registered ILambdaSerializer via stream-based Serialize<T>/Deserialize<T> calls; no AOT trim attributes on the public API. Mirrors StepOperation / ChildContextOperation.
- Strategies signal max-attempts exhausted by throwing WaitForConditionException directly from Decide(); the operation enriches with LastState before checkpointing FAIL.
- LastState survives FAIL replay: serialized into FAIL payload at write time, deserialized in BuildFailureException with warning-logged fallback for legacy/corrupt data.
- ExponentialBackoff helper extracted for sharing with ExponentialRetryStrategy. Math is byte-for-byte identical.
- Reuses OperationSubTypes.WaitForCondition from Wave 0.
Defaults
60 attempts / 5s initial / 300s max / 1.5x rate / Full jitter, distinct from RetryStrategy.Default and matching Python/JS/Java reference SDKs.
Test plan
🤖 Generated with Claude Code