Skip to content

Fix critical SSH connection leak and task deduplication false-positive#1353

Open
kartikbhartiya wants to merge 1 commit into
openwisp:masterfrom
kartikbhartiya:master
Open

Fix critical SSH connection leak and task deduplication false-positive#1353
kartikbhartiya wants to merge 1 commit into
openwisp:masterfrom
kartikbhartiya:master

Conversation

@kartikbhartiya
Copy link
Copy Markdown

PR: Fix critical SSH connection leak and task deduplication false-positive

Summary

This PR fixes two critical bugs in the connection module that can cause SSH connection exhaustion on managed devices and silently dropped configuration updates.


Bug 1: SSH connection leak in DeviceConnection.update_config()

File: openwisp_controller/connection/base/models.py

Severity: 🔴 Critical — causes resource exhaustion on managed devices

Problem

When connector_instance.update_config() raises an exception (e.g. SSH command failure, network timeout), disconnect() is never called because it was placed in the else branch of the try/except:

# BEFORE (broken)
def update_config(self):
    self.connect()
    if self.is_working:
        try:
            self.connector_instance.update_config()
        except Exception as e:
            logger.exception(e)
        else:                    # ← only runs on SUCCESS
            self.disconnect()    # ← connection leaked on failure

Every failed config push leaks an open SSH connection. Over time, this exhausts the SSH connection limit (MaxSessions / MaxStartups) on managed OpenWrt devices, causing all subsequent SSH operations to fail — including manual access by administrators.

Fix

Move disconnect() from else to finally so the connection is always cleaned up:

-            else:
+            finally:
                 self.disconnect()

Bug 2: False-positive in concurrent update_config task deduplication

File: openwisp_controller/connection/tasks.py

Severity: 🔴 Critical — silently drops legitimate configuration updates

Problem

The _is_update_in_progress() function checks whether another Celery task is already updating a device's config. It uses a substring match to find the device ID in the task's args:

# BEFORE (broken)
str(device_id) in task["args"]   # ← substring match on string repr!

task["args"] is a string representation like "('550e8400-e29b-41d4-a716-446655440000',)". The in operator performs a substring search, meaning:

  • Device "4466" would match a task running for device "550e8400-e29b-41d4-a716-446655440000" because "4466" appears as a substring
  • This causes _is_update_in_progress() to return True incorrectly
  • The config update for the wrong device is silently dropped

This is especially dangerous with UUID-based IDs where partial hex collisions are plausible at scale.

Fix

Parse the args list and compare device IDs with strict equality instead of substring matching:

+    device_id_str = str(device_id)
     ...
-                and str(device_id) in task["args"]
+                and device_id_str in [str(arg) for arg in task.get("args", [])]

Files Changed

File Change
models.py elsefinally for disconnect() in update_config()
tasks.py Substring match → exact element match in _is_update_in_progress()

Testing

Both fixes are backward-compatible and require no migration. Existing tests should continue to pass since:

  • The disconnect() call now runs in more scenarios (superset of previous behavior)
  • The task deduplication check is now more precise (fewer false positives)

Copilot AI review requested due to automatic review settings May 14, 2026 01:53
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR contains two targeted fixes to the connection management system. The first change ensures that device connections are properly disconnected even when configuration updates fail, by moving the disconnect call into a finally block. The second change improves task detection logic to correctly identify concurrent update_config tasks by comparing stringified device identifiers against stringified task arguments, rather than relying on direct containment checks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • openwisp/openwisp-controller#1292: Both PRs modify openwisp_controller/connection/tasks.py, specifically the _is_update_in_progress duplicate-detection logic around Celery update_config tasks.

Suggested reviewers

  • nemesifier

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error, 1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Bug Fixes ❌ Error Regression tests for both bugs are insufficient. Bug 1 lacks any test validating disconnect is called on exception. Bug 2's tests don't reproduce the UUID substring collision scenario. Add test mocking connector_instance.update_config raising exception and verify disconnect is called. Add test for _is_update_in_progress with UUID substring collision (one ID substring of another).
Title check ⚠️ Warning The pull request title lacks the required [type] prefix (e.g., [fix], [feature]) as specified in the repository's title requirements. Update the title to follow the format: [fix] Fix critical SSH connection leak and task deduplication false-positive
Description check ❓ Inconclusive The pull request description is comprehensive and well-structured, covering the problem, fix, and testing approach, though it does not reference a specific issue number as required by the template. Add a 'Reference to Existing Issue' section with 'Closes #' to complete the required template checklist format.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses two bugs in the connection module: an SSH connection leak when update_config() fails, and a brittleness in the concurrent-task deduplication check.

Changes:

  • In AbstractDeviceConnection.update_config(), move self.disconnect() from the else branch to a finally branch so the connection is released even when the connector raises.
  • In _is_update_in_progress(), replace str(device_id) in task["args"] with an explicit element-wise comparison against stringified args, also defensively using task.get("args", []).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
openwisp_controller/connection/base/models.py Switch else to finally to ensure disconnect() always runs after update_config() attempts.
openwisp_controller/connection/tasks.py Use list membership of stringified args instead of a containment check on task["args"] to avoid false positives.

Notes on the tasks.py change: the existing tests in tests/test_tasks.py mock args as a list (e.g. "args": [device_id]), so the previous str(device_id) in task["args"] was already doing list-membership (not substring matching) under tests. The PR description's example of substring matching on a string-repr only applies if Celery returns args as a string in some configurations. The new code is still strictly safer and remains compatible with the existing tests, but the description slightly overstates the original bug for the in-repo behavior.

The models.py change is a clear, well-scoped fix. Both changes are minimal and backward compatible.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@openwisp-companion
Copy link
Copy Markdown

Dependency Conflict in openwisp-controller

Hello @kartikbhartiya,
(Analysis for commit 5b0138a)

There is a dependency conflict between openwisp-controller and openwisp-ipam due to incompatible versions of django-reversion.

  • openwisp-controller requires django-reversion~=6.0.0.
  • openwisp-ipam requires django-reversion~=6.1.0.

To resolve this, please update the version specifier for django-reversion in your setup.py or requirements.txt to a version that satisfies both dependencies, or remove the specific version constraints to allow pip to find a compatible version.

@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented May 14, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 2
WARNING 0
SUGGESTION 0

Both code fixes are correct and important, but this PR is missing required regression tests per project policy.

Issue Details (click to expand)

CRITICAL

  1. Missing regression test for connection leak fix

    • File: openwisp_controller/connection/base/models.py
    • Line: 372 (the finally: change)
    • Issue: Bug fixes must include a regression test that fails without the patch and passes with it. There is no test verifying that disconnect() is called when update_config() raises an exception.
  2. Missing regression test for substring matching fix

    • File: openwisp_controller/connection/tasks.py
    • Line: 30 (the list comprehension change)
    • Issue: No test verifies that partial device ID strings don't cause false positives in _is_update_in_progress(). A test should demonstrate that device ID "4466" does NOT match a task with args containing "550e8400-e29b-41d4-a716-446655440000".
Code Quality Assessment

Bug 1 Fix (else → finally): ✅ Correct

  • Moving disconnect() from else to finally ensures connections are always cleaned up, preventing SSH connection exhaustion on managed devices.

Bug 2 Fix (substring → exact match): ✅ Correct

  • Changing from str(device_id) in task["args"] to exact element matching prevents false positives when device IDs share substrings (e.g., UUID partial matches).

Policy Compliance: ❌ Missing regression tests

  • Per the project policy, bug fixes must include regression tests.
  • The existing tests in test_tasks.py and test_models.py do not cover these specific failure scenarios.
Files Reviewed (2 files)
  • openwisp_controller/connection/base/models.py - Connection leak fix, missing regression test
  • openwisp_controller/connection/tasks.py - Substring matching fix, missing regression test
  • openwisp_controller/connection/tests/test_tasks.py - Reviewed existing test coverage
  • openwisp_controller/connection/tests/test_models.py - Reviewed existing test coverage

Fix these issues in Kilo Cloud


Reviewed by kimi-k2.5-0127 · 172,175 tokens

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
openwisp_controller/connection/tasks.py (1)

19-34: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider documenting the task signature assumption.

The check device_id_str in [str(arg) for arg in task.get("args", [])] works correctly for the current single-argument task signature (update_config(self, device_id)). If the task signature ever changes to include additional arguments, this check may need refinement to compare against a specific argument position (e.g., task.get("args", [])[0]) rather than using in on the entire args list.

Adding a brief inline comment noting this assumption would improve future maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openwisp_controller/connection/tasks.py` around lines 19 - 34, The
_is_update_in_progress function assumes the Celery task args list contains the
device_id as a positional argument (currently the only/first arg); add a concise
inline comment near the line with device_id_str in [str(arg) for arg in
task.get("args", [])] stating this assumption and advising that if the task
signature changes (e.g., extra args), the check should be updated to compare the
specific positional index (task.get("args", [])[0]) or use a named-arg
inspection instead; reference _is_update_in_progress and _TASK_NAME in the
comment so future maintainers can find and update the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@openwisp_controller/connection/tasks.py`:
- Around line 19-34: The _is_update_in_progress function assumes the Celery task
args list contains the device_id as a positional argument (currently the
only/first arg); add a concise inline comment near the line with device_id_str
in [str(arg) for arg in task.get("args", [])] stating this assumption and
advising that if the task signature changes (e.g., extra args), the check should
be updated to compare the specific positional index (task.get("args", [])[0]) or
use a named-arg inspection instead; reference _is_update_in_progress and
_TASK_NAME in the comment so future maintainers can find and update the logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 264a444f-f8c5-4bc5-a17d-772345e25ba3

📥 Commits

Reviewing files that changed from the base of the PR and between 868f1cd and 5b0138a.

📒 Files selected for processing (2)
  • openwisp_controller/connection/base/models.py
  • openwisp_controller/connection/tasks.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Agent
  • GitHub Check: Python==3.10 | django~=5.1.0
  • GitHub Check: Python==3.13 | django~=5.1.0
  • GitHub Check: Python==3.12 | django~=5.2.0
  • GitHub Check: Python==3.10 | django~=4.2.0
  • GitHub Check: Python==3.12 | django~=4.2.0
  • GitHub Check: Python==3.11 | django~=5.2.0
  • GitHub Check: Python==3.13 | django~=5.2.0
  • GitHub Check: Python==3.12 | django~=5.1.0
  • GitHub Check: Python==3.10 | django~=5.2.0
  • GitHub Check: Python==3.11 | django~=5.1.0
  • GitHub Check: Python==3.11 | django~=4.2.0
  • GitHub Check: Kilo Code Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}

📄 CodeRabbit inference engine (Custom checks)

**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp}: Flag potential security vulnerabilities in code
Avoid unnecessary comments or docstrings for code that is already clear
Code formatting is compact and readable. Do not add excessive blank lines, especially inside function or method bodies
Flag unused or redundant code
Ensure variables, functions, classes, and files have descriptive and consistent names
New code must handle errors properly: log errors that cannot be resolved by the user with error level, log unusual conditions with warning level, log important background actions with info level, and provide user-facing messages for errors that the user can solve autonomously

Files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sql}

📄 CodeRabbit inference engine (Custom checks)

Flag obvious performance regressions, such as heavy loops, repeated I/O, or unoptimized queries

Files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
**/*.{js,ts,tsx,jsx,py,java,go,cs,rb,php,c,cpp,h,hpp,sh,bash,sql}

📄 CodeRabbit inference engine (Custom checks)

Cryptic or non-obvious code (regex, complex bash commands, or hard-to-read code) must include a concise comment explaining why it is needed and why the complexity is acceptable

Files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
**/*.{py,html}

📄 CodeRabbit inference engine (Custom checks)

For Django pull requests, ensure all user-facing strings are marked as translatable using the Django i18n framework

Files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
🧠 Learnings (3)
📚 Learning: 2026-01-15T15:05:49.557Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/management/commands/clear_last_ip.py:38-42
Timestamp: 2026-01-15T15:05:49.557Z
Learning: In Django projects, when using select_related() to traverse relations (for example, select_related("organization__config_settings")), the traversed relation must not be deferred. If you also use .only() in the same query, include the relation name or FK field (e.g., "organization" or "organization_id") in the .only() list to avoid the error "Field X cannot be both deferred and traversed using select_related at the same time." Apply this guideline to Django code in openwisp_controller/config/management/commands/clear_last_ip.py and similar modules by ensuring any select_related with an accompanying only() includes the related field names to prevent deferred/traversed conflicts.

Applied to files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
📚 Learning: 2026-02-17T19:13:10.088Z
Learnt from: nemesifier
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/config/whois/commands.py:0-0
Timestamp: 2026-02-17T19:13:10.088Z
Learning: In reviews for the openwisp/openwisp-controller repository, do not propose changes based on Ruff warnings. The project does not use Ruff as its linter; ignore Ruff-related suggestions and follow the repository’s established linting and configuration rules. This guidance applies to all Python files under the openwisp_controller directory.

Applied to files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
📚 Learning: 2026-01-15T15:07:17.354Z
Learnt from: DragnEmperor
Repo: openwisp/openwisp-controller PR: 1175
File: openwisp_controller/geo/estimated_location/tests/tests.py:172-175
Timestamp: 2026-01-15T15:07:17.354Z
Learning: In this repository, flake8 enforces E501 (line too long) via setup.cfg (max-line-length = 88) while ruff ignores E501 via ruff.toml. Therefore, use '# noqa: E501' on lines that intentionally exceed 88 characters to satisfy flake8 without affecting ruff checks. This applies to Python files across the project (any .py) and is relevant for tests as well. Use sparingly and only where breaking lines is not feasible without hurting readability or functionality.

Applied to files:

  • openwisp_controller/connection/tasks.py
  • openwisp_controller/connection/base/models.py
🔇 Additional comments (2)
openwisp_controller/connection/base/models.py (1)

365-373: LGTM!

openwisp_controller/connection/tasks.py (1)

23-23: LGTM!

Also applies to: 30-30

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants