Fix critical SSH connection leak and task deduplication false-positive#1353
Fix critical SSH connection leak and task deduplication false-positive#1353kartikbhartiya wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThis PR contains two targeted fixes to the connection management system. The first change ensures that device connections are properly disconnected even when configuration updates fail, by moving the disconnect call into a finally block. The second change improves task detection logic to correctly identify concurrent update_config tasks by comparing stringified device identifiers against stringified task arguments, rather than relying on direct containment checks. Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR addresses two bugs in the connection module: an SSH connection leak when update_config() fails, and a brittleness in the concurrent-task deduplication check.
Changes:
- In
AbstractDeviceConnection.update_config(), moveself.disconnect()from theelsebranch to afinallybranch so the connection is released even when the connector raises. - In
_is_update_in_progress(), replacestr(device_id) in task["args"]with an explicit element-wise comparison against stringified args, also defensively usingtask.get("args", []).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| openwisp_controller/connection/base/models.py | Switch else to finally to ensure disconnect() always runs after update_config() attempts. |
| openwisp_controller/connection/tasks.py | Use list membership of stringified args instead of a containment check on task["args"] to avoid false positives. |
Notes on the tasks.py change: the existing tests in tests/test_tasks.py mock args as a list (e.g. "args": [device_id]), so the previous str(device_id) in task["args"] was already doing list-membership (not substring matching) under tests. The PR description's example of substring matching on a string-repr only applies if Celery returns args as a string in some configurations. The new code is still strictly safer and remains compatible with the existing tests, but the description slightly overstates the original bug for the in-repo behavior.
The models.py change is a clear, well-scoped fix. Both changes are minimal and backward compatible.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Dependency Conflict in
|
Code Review SummaryStatus: 2 Issues Found | Recommendation: Address before merge Overview
Both code fixes are correct and important, but this PR is missing required regression tests per project policy. Issue Details (click to expand)CRITICAL
Code Quality AssessmentBug 1 Fix (else → finally): ✅ Correct
Bug 2 Fix (substring → exact match): ✅ Correct
Policy Compliance: ❌ Missing regression tests
Files Reviewed (2 files)
Fix these issues in Kilo Cloud Reviewed by kimi-k2.5-0127 · 172,175 tokens |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openwisp_controller/connection/tasks.py (1)
19-34: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueConsider documenting the task signature assumption.
The check
device_id_str in [str(arg) for arg in task.get("args", [])]works correctly for the current single-argument task signature (update_config(self, device_id)). If the task signature ever changes to include additional arguments, this check may need refinement to compare against a specific argument position (e.g.,task.get("args", [])[0]) rather than usinginon the entire args list.Adding a brief inline comment noting this assumption would improve future maintainability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openwisp_controller/connection/tasks.py` around lines 19 - 34, The _is_update_in_progress function assumes the Celery task args list contains the device_id as a positional argument (currently the only/first arg); add a concise inline comment near the line with device_id_str in [str(arg) for arg in task.get("args", [])] stating this assumption and advising that if the task signature changes (e.g., extra args), the check should be updated to compare the specific positional index (task.get("args", [])[0]) or use a named-arg inspection instead; reference _is_update_in_progress and _TASK_NAME in the comment so future maintainers can find and update the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@openwisp_controller/connection/tasks.py`:
- Around line 19-34: The _is_update_in_progress function assumes the Celery task
args list contains the device_id as a positional argument (currently the
only/first arg); add a concise inline comment near the line with device_id_str
in [str(arg) for arg in task.get("args", [])] stating this assumption and
advising that if the task signature changes (e.g., extra args), the check should
be updated to compare the specific positional index (task.get("args", [])[0]) or
use a named-arg inspection instead; reference _is_update_in_progress and
_TASK_NAME in the comment so future maintainers can find and update the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 264a444f-f8c5-4bc5-a17d-772345e25ba3
📒 Files selected for processing (2)
openwisp_controller/connection/base/models.pyopenwisp_controller/connection/tasks.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Agent
- GitHub Check: Python==3.10 | django~=5.1.0
- GitHub Check: Python==3.13 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=4.2.0
- GitHub Check: Python==3.12 | django~=4.2.0
- GitHub Check: Python==3.11 | django~=5.2.0
- GitHub Check: Python==3.13 | django~=5.2.0
- GitHub Check: Python==3.12 | django~=5.1.0
- GitHub Check: Python==3.10 | django~=5.2.0
- GitHub Check: Python==3.11 | django~=5.1.0
- GitHub Check: Python==3.11 | django~=4.2.0
- GitHub Check: Kilo Code Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}
📄 CodeRabbit inference engine (Custom checks)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}: Flag potential security vulnerabilities in code
Avoid unnecessary comments or docstrings for code that is already clear
Code formatting is compact and readable. Do not add excessive blank lines, especially inside function or method bodies
Flag unused or redundant code
Ensure variables, functions, classes, and files have descriptive and consistent names
New code must handle errors properly: log errors that cannot be resolved by the user with error level, log unusual conditions with warning level, log important background actions with info level, and provide user-facing messages for errors that the user can solve autonomously
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sql}
📄 CodeRabbit inference engine (Custom checks)
Flag obvious performance regressions, such as heavy loops, repeated I/O, or unoptimized queries
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sh,bash,sql}
📄 CodeRabbit inference engine (Custom checks)
Cryptic or non-obvious code (regex, complex bash commands, or hard-to-read code) must include a concise comment explaining why it is needed and why the complexity is acceptable
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
**/*.{py,html}
📄 CodeRabbit inference engine (Custom checks)
For Django pull requests, ensure all user-facing strings are marked as translatable using the Django i18n framework
Files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
🧠 Learnings (3)
📚 Learning: 2026-01-15T15:05:49.557Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/management/commands/clear_last_ip.py:38-42
Timestamp: 2026-01-15T15:05:49.557Z
Learning: In Django projects, when using select_related() to traverse relations (for example, select_related("organization__config_settings")), the traversed relation must not be deferred. If you also use .only() in the same query, include the relation name or FK field (e.g., "organization" or "organization_id") in the .only() list to avoid the error "Field X cannot be both deferred and traversed using select_related at the same time." Apply this guideline to Django code in openwisp_controller/config/management/commands/clear_last_ip.py and similar modules by ensuring any select_related with an accompanying only() includes the related field names to prevent deferred/traversed conflicts.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
📚 Learning: 2026-02-17T19:13:10.088Z
Learnt from: nemesifier
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/whois/commands.py:0-0
Timestamp: 2026-02-17T19:13:10.088Z
Learning: In reviews for the openwisp/openwisp-controller repository, do not propose changes based on Ruff warnings. The project does not use Ruff as its linter; ignore Ruff-related suggestions and follow the repository’s established linting and configuration rules. This guidance applies to all Python files under the openwisp_controller directory.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
📚 Learning: 2026-01-15T15:07:17.354Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/geo/estimated_location/tests/tests.py:172-175
Timestamp: 2026-01-15T15:07:17.354Z
Learning: In this repository, flake8 enforces E501 (line too long) via setup.cfg (max-line-length = 88) while ruff ignores E501 via ruff.toml. Therefore, use '# noqa: E501' on lines that intentionally exceed 88 characters to satisfy flake8 without affecting ruff checks. This applies to Python files across the project (any .py) and is relevant for tests as well. Use sparingly and only where breaking lines is not feasible without hurting readability or functionality.
Applied to files:
openwisp_controller/connection/tasks.pyopenwisp_controller/connection/base/models.py
🔇 Additional comments (2)
openwisp_controller/connection/base/models.py (1)
365-373: LGTM!openwisp_controller/connection/tasks.py (1)
23-23: LGTM!Also applies to: 30-30
PR: Fix critical SSH connection leak and task deduplication false-positive
Summary
This PR fixes two critical bugs in the
connectionmodule that can cause SSH connection exhaustion on managed devices and silently dropped configuration updates.Bug 1: SSH connection leak in
DeviceConnection.update_config()File:
openwisp_controller/connection/base/models.pySeverity: 🔴 Critical — causes resource exhaustion on managed devices
Problem
When
connector_instance.update_config()raises an exception (e.g. SSH command failure, network timeout),disconnect()is never called because it was placed in theelsebranch of thetry/except:Every failed config push leaks an open SSH connection. Over time, this exhausts the SSH connection limit (
MaxSessions/MaxStartups) on managed OpenWrt devices, causing all subsequent SSH operations to fail — including manual access by administrators.Fix
Move
disconnect()fromelsetofinallyso the connection is always cleaned up:Bug 2: False-positive in concurrent
update_configtask deduplicationFile:
openwisp_controller/connection/tasks.pySeverity: 🔴 Critical — silently drops legitimate configuration updates
Problem
The
_is_update_in_progress()function checks whether another Celery task is already updating a device's config. It uses a substring match to find the device ID in the task's args:task["args"]is a string representation like"('550e8400-e29b-41d4-a716-446655440000',)". Theinoperator performs a substring search, meaning:"4466"would match a task running for device"550e8400-e29b-41d4-a716-446655440000"because"4466"appears as a substring_is_update_in_progress()to returnTrueincorrectlyThis is especially dangerous with UUID-based IDs where partial hex collisions are plausible at scale.
Fix
Parse the args list and compare device IDs with strict equality instead of substring matching:
Files Changed
else→finallyfordisconnect()inupdate_config()_is_update_in_progress()Testing
Both fixes are backward-compatible and require no migration. Existing tests should continue to pass since:
disconnect()call now runs in more scenarios (superset of previous behavior)