fix: improve webhook reliability with timeout, break-to-continue, and explicit error status#3446
Conversation
… explicit error status - Replace break with continue in notify_webhooks to prevent silently dropping remaining log entries when one entry lacks an organizer or webhook type (fixes fossasia#3400) - Add 30-second timeout to requests.post() in send_webhook to prevent Celery worker pool exhaustion from hanging endpoints (fixes fossasia#3398) - Add explicit success=False in RequestException error path for consistent WebHookCall records - Add debug logging for skipped entries to aid troubleshooting
Reviewer's GuideImproves reliability and observability of webhook processing by skipping malformed log entries instead of aborting the loop, adding a request timeout to outbound webhook calls, and explicitly recording failure status on exceptions. Sequence diagram for webhook notification dispatch and error handlingsequenceDiagram
participant Caller as CeleryScheduler_or_App
participant Notify as notify_webhooks
participant DB as Database
participant SendTask as send_webhook_task
participant Worker as CeleryWorker
participant Endpoint as WebhookEndpoint
Caller->>Notify: notify_webhooks(logentry_ids)
Notify->>DB: Query LogEntry by ids
DB-->>Notify: LogEntry queryset
loop For_each_logentry
alt Missing_organizer
Notify->>Notify: logger.debug(skip_no_organizer)
Notify->>Notify: continue_to_next_logentry
else Missing_notification_type
Notify->>Notify: logger.debug(skip_no_webhook_type)
Notify->>Notify: continue_to_next_logentry
else Valid_logentry
Notify->>DB: Get_or_cache_webhooks_for(organizer,action_type)
DB-->>Notify: Webhook_list
loop For_each_webhook
Notify->>SendTask: send_webhook.apply_async(logentry_id, action_type, webhook_id)
end
end
end
par Webhook_delivery_for_each_task
Caller->>Worker: Execute send_webhook(logentry_id, action_type, webhook_id)
Worker->>DB: Load LogEntry, WebHook
DB-->>Worker: Entities
Worker->>Endpoint: HTTP POST payload (timeout=30s)
alt HTTP_success
Worker->>DB: WebHookCall.create(success=True, return_code=status_code)
else HTTP_error_or_timeout
Worker->>DB: WebHookCall.create(success=False, return_code=0)
Worker->>Worker: self.retry(countdown=2**(retries*2))
end
end
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider sourcing
WEBHOOK_TIMEOUTfrom Django settings (or a config object) rather than a hard-coded module constant so deployments can tune the timeout without code changes. - The new debug logs in
notify_webhooksmight be useful at higher levels (e.g., info/warning) or augmented with organizer/event identifiers if these conditions indicate data inconsistencies that warrant operator attention.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider sourcing `WEBHOOK_TIMEOUT` from Django settings (or a config object) rather than a hard-coded module constant so deployments can tune the timeout without code changes.
- The new debug logs in `notify_webhooks` might be useful at higher levels (e.g., info/warning) or augmented with organizer/event identifiers if these conditions indicate data inconsistencies that warrant operator attention.
## Individual Comments
### Comment 1
<location path="app/eventyay/api/webhooks.py" line_range="335" />
<code_context>
+ allow_redirects=False,
+ timeout=WEBHOOK_TIMEOUT,
+ )
WebHookCall.objects.create(
webhook=webhook,
action_type=logentry.action_type,
</code_context>
<issue_to_address>
**suggestion:** Consider explicitly setting `success=True` on successful webhook calls for consistency.
Since the exception path now sets `success=False` explicitly, it would be clearer and more robust to also pass `success=True` on the success path, rather than relying on defaults or null, so `WebHookCall` consumers can always treat `success` as a concrete boolean.
Suggested implementation:
```python
WebHookCall.objects.create(
webhook=webhook,
action_type=logentry.action_type,
return_code=resp.status_code,
payload=json.dumps(payload),
response_body=resp.text[: settings.MAX_SIZE_CONFIG[SizeKey.RESPONSE_SIZE_WEBHOOK]],
success=True,
)
```
I can only see the exception-path `WebHookCall.objects.create(...)` in your snippet. The SEARCH block above assumes the success-path call looks like:
- Uses `resp.status_code` for `return_code`
- Uses `resp.text[: settings.MAX_SIZE_CONFIG[SizeKey.RESPONSE_SIZE_WEBHOOK]]` for `response_body`
If the actual success-path call differs (e.g., different variable names or slicing), update the SEARCH section to match your exact code and keep the `success=True` line in the REPLACE section. The key requirement is that the success-path `WebHookCall.objects.create(...)` includes `success=True` explicitly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Thanks for the suggestion @sourcery-ai! |
|
Updated as suggested — moved webhook timeout to settings for configurability. |
|
Addressed feedback and made timeout configurable via settings. |
There was a problem hiding this comment.
Pull request overview
Improves outbound webhook delivery robustness in Celery tasks by preventing batch aborts on malformed log entries, adding a configurable HTTP timeout, and making failure recording more consistent.
Changes:
- Skip individual log entries without organizer/event type using
continue(and add debug logging) instead of aborting the whole batch withbreak. - Add a configurable webhook HTTP timeout via settings and apply it to outbound requests.
- Limit webhook response body capture to a bounded size to reduce OOM risk, and explicitly record
success=Falseon request exceptions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| app/eventyay/config/settings.py | Adds webhook_timeout config and exposes WEBHOOK_TIMEOUT for use by webhook delivery tasks. |
| app/eventyay/api/webhooks.py | Fixes loop control in notify_webhooks, adds timeout + streamed/limited response reading in send_webhook, and explicitly marks failure records. |
| try: | ||
| resp = requests.post(webhook.target_url, json=payload, allow_redirects=False) | ||
| max_body_size = settings.MAX_SIZE_CONFIG[SizeKey.RESPONSE_SIZE_WEBHOOK] | ||
| timeout = getattr(settings, "WEBHOOK_TIMEOUT", 30) |
| resp = requests.post( | ||
| webhook.target_url, | ||
| json=payload, | ||
| allow_redirects=False, | ||
| timeout=timeout, | ||
| stream=True, | ||
| ) | ||
| # Read only up to max_body_size bytes to prevent OOM | ||
| # from oversized responses before truncation | ||
| response_body = resp.raw.read(max_body_size).decode( | ||
| 'utf-8', errors='replace' | ||
| ) | ||
| resp.close() |
| # Upload size limit in MB, needs to to in accordance with SizeKey | ||
| upload_size_csv: int = 1 | ||
| upload_size_image: int = 10 | ||
| upload_size_pdf: int = 10 | ||
| upload_size_xlsx: int = 2 | ||
| upload_size_favicon: int = 1 | ||
| upload_size_attachment: int = 10 | ||
| upload_size_mail: int = 4 | ||
| upload_size_question: int = 20 | ||
| upload_size_other: int = 10 | ||
| response_size_webhook: int = 1 | ||
| webhook_timeout: int = 30 |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
Done with review and made the necessary changes as suggested by copilot. |
Summary
Fixes webhook reliability issues in
app/eventyay/api/webhooks.py.Changes
1. Replace
breakwithcontinueinnotify_webhooks(fixes #3400)breakwould exit the entire loop, silently dropping all remaining entries in the batchcontinueto skip only the problematic entry and process the restlogger.debug()calls for skipped entries to aid troubleshooting2. Add 30-second timeout to
requests.post()insend_webhook(fixes #3398)WEBHOOK_TIMEOUT = 30seconds constant3. Add explicit
success=Falsein error pathRequestExceptionhandler was creatingWebHookCallrecords without explicitly settingsuccess=False, relying on the model defaultsuccessfield for consistencyFiles Changed
app/eventyay/api/webhooks.py— 22 insertions, 3 deletionsRelated Issues
notify_webhooksUsesbreakInstead ofcontinue, Silently Dropping Webhook Notifications #3400Summary by Sourcery
Improve reliability and observability of outbound webhook delivery.
Bug Fixes: