Fix service bus client lifecycle #4930
…status_updater.py, airlock_request_status_update.py, and runner.py to prevent connection socket and AMQP channel leaks
…and deployment_status_updater.py for improved heartbeat logging and error handling; update test_runner.py to mock ServiceBusClient correctly; enhance runner.py with consistent exception handling and retry logic.
Unit Test Results: 673 tests, 673 ✅, 8s ⏱️. Results for commit 93e86b8. ♻️ This comment has been updated with latest results.
Pull request overview
This PR fixes inconsistent Azure Service Bus client lifecycle handling across the API service-bus listeners and the resource processor runner to reduce the risk of leaking sockets/AMQP channels during long-running polling and reconnect loops.
Changes:
- Wrap `ServiceBusClient` usage in async context managers so clients are deterministically closed.
- Add backoff (`asyncio.sleep(10)`) on connection/unknown exceptions in the runner and service-bus listeners.
- Update runner unit tests to account for the async context manager usage; add a changelog entry.
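
As a rough illustration of the first two changes, the sketch below combines an async context manager with a backoff sleep in a reconnect loop. `StubClient` and `poll` are hypothetical names standing in for `ServiceBusClient` and the listener loops; this is not the code from the PR, only a minimal model of the pattern under those assumptions.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for ServiceBusClient; tracks how many clients are open."""

    open_count = 0  # number of clients currently open

    def __init__(self, fail: bool):
        self.fail = fail

    async def __aenter__(self):
        StubClient.open_count += 1
        return self

    async def __aexit__(self, exc_type, exc, tb):
        StubClient.open_count -= 1  # runs even when the body raises
        return False  # do not swallow the exception

    async def receive(self):
        if self.fail:
            raise ConnectionError("simulated transient failure")
        return "message"


async def poll(failures: int, backoff_seconds: float = 10.0):
    """Retry until a receive succeeds, closing the client on every attempt."""
    attempts = 0
    while True:
        attempts += 1
        try:
            # The first `failures` attempts raise; each one still closes its client.
            async with StubClient(fail=attempts <= failures) as client:
                return await client.receive(), attempts
        except ConnectionError:
            # Back off before reconnecting instead of spinning in a tight loop.
            await asyncio.sleep(backoff_seconds)
```

Calling `asyncio.run(poll(failures=2, backoff_seconds=0))` exercises two failed attempts followed by a success, and `StubClient.open_count` ends at zero because every attempt exits through `__aexit__`.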
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api_app/service_bus/deployment_status_updater.py | Uses `async with ServiceBusClient(...)` to ensure the client is closed across reconnect cycles. |
| api_app/service_bus/airlock_request_status_update.py | Uses `async with ServiceBusClient(...)` to ensure the client is closed; adjusts retry sleeps. |
| resource_processor/vmss_porter/runner.py | Uses `async with ServiceBusClient(...)` for proper client teardown; adds retry sleeps on exceptions. |
| resource_processor/tests_rp/test_runner.py | Updates mocks to handle the `ServiceBusClient` async context manager behavior. |
| CHANGELOG.md | Adds an unreleased BUG FIXES entry for the lifecycle fix. |
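
For readers unfamiliar with mocking an async context manager, here is one way the test-side change could look. It is a sketch only: `receive_once` and `build_client_factory` are hypothetical helpers, not functions from `test_runner.py`, and the queue name is made up. It relies on `unittest.mock.MagicMock` supporting `__aenter__`/`__aexit__` (Python 3.8+).

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def receive_once(client_factory):
    """Hypothetical helper: open a client, read one batch, let the context manager close it."""
    async with client_factory() as client:
        receiver = client.get_queue_receiver(queue_name="example-queue")
        return await receiver.receive_messages(max_message_count=1)


def build_client_factory(messages):
    """Return a mock factory whose instances behave as async context managers."""
    client = MagicMock()
    client.get_queue_receiver.return_value.receive_messages = AsyncMock(
        return_value=messages
    )
    factory = MagicMock()
    # MagicMock configures __aenter__/__aexit__ as awaitable mocks by default;
    # pointing __aenter__ at `client` makes `async with factory() as c` yield it.
    factory.return_value.__aenter__.return_value = client
    return factory, client
```

A test can then assert that the context manager was both entered and exited, for example with `factory.return_value.__aexit__.assert_awaited_once()`, which is the property the lifecycle fix depends on.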
…0.25.17 and 0.13.4; refine debug logging in airlock_request_status_update.py
/test-extended
🤖 pr-bot 🤖 🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/28157965154 (with refid (in response to this comment from @rudolphjacksonm)
/test-extended
🤖 pr-bot 🤖 🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/28224500075 (with refid (in response to this comment from @maxmartin-cgi)
/test-extended
🤖 pr-bot 🤖 🏃 Running extended tests: https://github.com/microsoft/AzureTRE/actions/runs/28243761740 (with refid (in response to this comment from @maxmartin-cgi)
What is being addressed
Inconsistent ServiceBusClient lifecycle management in deployment_status_updater.py, airlock_request_status_update.py, and runner.py.
Key message listeners repeatedly instantiate ServiceBusClient inside infinite loops but discard the instances without calling close() or wrapping them in context managers. This can leak connection sockets and AMQP channels during transient reconnect loops.
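
The difference between discarding a client and closing it can be modelled with a small counter. `CountingClient` below is a hypothetical stand-in that pretends to open a connection on construction; the real SDK may open connections lazily, so treat this as a simplified picture of the leak rather than a reproduction of it.

```python
import asyncio


class CountingClient:
    """Hypothetical client that holds a simulated connection until closed."""

    open_connections = 0

    def __init__(self):
        CountingClient.open_connections += 1  # simulate opening a socket

    async def close(self):
        CountingClient.open_connections -= 1

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()
        return False


async def leaky_loop(iterations: int) -> int:
    """Create a client per iteration and drop it without closing."""
    CountingClient.open_connections = 0
    for _ in range(iterations):
        CountingClient()  # instance discarded; close() never runs
    return CountingClient.open_connections


async def managed_loop(iterations: int) -> int:
    """Create a client per iteration inside `async with`, so each one is closed."""
    CountingClient.open_connections = 0
    for _ in range(iterations):
        async with CountingClient():
            pass  # receiving and processing messages would happen here
    return CountingClient.open_connections
```

In this model, `asyncio.run(leaky_loop(5))` leaves five simulated connections open, while `asyncio.run(managed_loop(5))` leaves none.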
How is this addressed
Prevents AMQP channel leaks: Ensures channels are cleaned up even during transient reconnects.
Consistent pattern: All three files now follow the same best practice of using