Skip to content

feat: add otel logger injection#427

Open
wangyb-A wants to merge 1 commit into
mainfrom
otel_logger
Open

feat: add otel logger injection#427
wangyb-A wants to merge 1 commit into
mainfrom
otel_logger

Conversation

@wangyb-A

@wangyb-A wangyb-A commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Issue #, if available: #385

Description

  • Add a wrap_logger hook to DurableInstrumentationPlugin and chain it in PluginExecutor.wrap_logger.
  • Wire logger wrapping into DurableContext.set_logger, and trigger it once on the root context in durable_execution. Child contexts inherit the already-wrapped logger; wrapping is idempotent.
  • New OtelEnrichedLogger that injects the active span context into each log extra.
  • Plugin resolves the correct span via get_current_span_context: the active operation span inside a step/child context, the enclosing operation span between steps, and the invocation span (from the registry) at the top level.
  • on_user_function_end now restores the enclosing operation span (parent_id) rather than always the invocation span, so between-step logs inside a child context correlate to that child context.

Testing

  • Unit test
  • Example logging:
// 1. handler() line 62:  context.logger.info("Workflow started")
//    Top-level handler code → invocation span
{
  "message": "Workflow started", "logger": "root",
  "otel.trace_id": "6a288a880695a2d879ef07517e60531b",
  "otel.span_id":  "bbe5fdb86682dac4",   // invocation
  "otel.trace_sampled": true
}

// 2. greet() line 45:  step_context.logger.info("Greeting inside step", ...)
//    Invoked via context.step(greet("world"), name="top-greet") at line 64
//    → top-greet step span
{
  "message": "Greeting inside step", "logger": "root",
  "operationName": "top-greet", "attempt": 1, "greeting_name": "world",
  "otel.trace_id": "6a288a880695a2d879ef07517e60531b",
  "otel.span_id":  "796a9b22593b6959",   // top-greet
  "otel.trace_sampled": true
}

// 3. greet_in_child() line 52:  child_context.logger.info("Entering child context")
//    Invoked via context.run_in_child_context(..., name="child-context") at line 65
//    → child-context span
{
  "message": "Entering child context", "logger": "root",
  "parentId": "c5faca15...3947e6",
  "otel.trace_id": "6a288a880695a2d879ef07517e60531b",
  "otel.span_id":  "be02ea8c17bd3715",   // child-context
  "otel.trace_sampled": true
}

// 4. greet() line 45 again, this time invoked via
//    child_context.step(greet("nested"), name="child-greet") at line 53
//    → child-greet step span (nested under child-context)
{
  "message": "Greeting inside step", "logger": "root",
  "operationName": "child-greet", "attempt": 1, "greeting_name": "nested",
  "otel.trace_id": "6a288a880695a2d879ef07517e60531b",
  "otel.span_id":  "67fdcf54688f3644",   // child-greet
  "otel.trace_sampled": true
}

// 5. greet_in_child() line 54:  child_context.logger.info("Leaving child context", ...)
//    Runs AFTER the inner child-greet step finished → restored to child-context span
//    (this is the parent-restore fix: NOT the invocation span)
{
  "message": "Leaving child context", "logger": "root",
  "parentId": "c5faca15...3947e6", "result": "hello nested",
  "otel.trace_id": "6a288a880695a2d879ef07517e60531b",
  "otel.span_id":  "be02ea8c17bd3715",   // child-context (matches #3)
  "otel.trace_sampled": true
}

// 6. handler() line 70:  context.logger.info("Workflow completed", ...)
//    Back at top level after the child context unwound → invocation span
{
  "message": "Workflow completed", "logger": "root",
  "top": "hello world", "nested": "hello nested",
  "otel.trace_id": "6a288a880695a2d879ef07517e60531b",
  "otel.span_id":  "bbe5fdb86682dac4",   // invocation (matches #1)
  "otel.trace_sampled": true
}


Comment on lines +157 to +160
2. The invocation span from the plugin registry. This is the path used
for top-level handler code: the invocation span is never attached to
the worker thread's context, so the registry is the only way to
resolve it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, on_invocation_start is running on main thread (instead of worker thread). So, even if we attach the span to the context, it's invisible for worker thread.

@wangyb-A wangyb-A marked this pull request as ready for review June 10, 2026 00:16
@wangyb-A wangyb-A requested a review from zhongkechen June 10, 2026 17:27

# enrich_logger defaults to True, so the execution logger is wrapped with OTel
# trace context injection (otel.trace_id, otel.span_id, otel.trace_sampled).
otel = DurableExecutionOtelPlugin(tracer_provider)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: For follow up in future pr, I think we also want to make the DurableExecutionOtelPlugin as "no-touch" as possible. The typical workflow for a customer adding OpenTelemetry is just adding a lambda layer and changing a few lambda environment variables and it should "just work".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our case, we'd like it to be as simple as the following (if possible):

  1. Adding ADOT layer (or whatever layer we recommend)
  2. adding the aws-durable-execution-sdk-python-otel to their dependencies
  3. changing lambda env var
  4. enabling tracing on the lambda function
  5. profit

if span_context and span_context.is_valid:
return span_context

invocation_span = self._get_span(None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the self._get_span(None) always return the invocation span? I don't see it being set anywhere. Do we even need this line and the block after it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah never mind, I see that on_invocation_start calls _start_span(None,...) which will call

self._operation_spans[operation_id] = span

setting the invocation_span with operation_id = None

https://github.com/aws/aws-durable-execution-sdk-python/blame/f337788752001b5579c3a5b6ad7ce54d1ce51b79/packages/aws-durable-execution-sdk-python-otel/src/aws_durable_execution_sdk_python_otel/plugin.py#L266

But do we need this fallback code? I would think that trace.get_current_span().get_span_context() should return the current span, including the invocation span.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we need this fallback code?

Yes, we need it for current implementation, reason:

  1. Currently we only register span in on_user_function_start. We still need to get the invocation span if user call logger outside of user func (step, child)

  2. Currently on_invocation_start is called on main thread, on_user_function_start is called on worker thread. Handler code is called on worker thread. So, even if we register the span in on_invocation_start, the logger in worker thread cannot find the span.

# Logged inside a child context: enriched with the child-context span_id.
child_context.logger.info("Entering child context")
result: str = child_context.step(greet(name), name="child-greet")
child_context.logger.info("Leaving child context", extra={"result": result})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this log in your example logs, can you check if it actually is present and that the spanId is the same as the log for "Entering child context"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update in the pr description

@wangyb-A wangyb-A self-assigned this Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants