Skip to content

feat!: Gate pipeline deserialization through a module allowlist#11432

Draft
bogdankostic wants to merge 5 commits into
v3from
yaml_deserialization
Draft

feat!: Gate pipeline deserialization through a module allowlist#11432
bogdankostic wants to merge 5 commits into
v3from
yaml_deserialization

Conversation

@bogdankostic
Copy link
Copy Markdown
Contributor

Related Issues

Proposed Changes:

Pipeline.load / Pipeline.loads / Pipeline.from_dict used to dynamically import any class referenced in the YAML via importlib.import_module, which made a crafted YAML capable of causing arbitrary classes to be imported and instantiated (e.g. subprocess.Popen). This PR gates every import-by-name through an allowlist of trusted module namespaces.

Default allowlist: haystack, haystack_integrations, haystack_experimental, builtins, typing, collections.

Four ways to extend it, in increasing scope:

Scope API
Per call Pipeline.load(fp, allowed_modules=["mypkg.*"])
Per call (bypass) Pipeline.load(fp, unsafe=True)
Process-wide from haystack.core.serialization import allow_deserialization_module
Environment HAYSTACK_DESERIALIZATION_ALLOWLIST="mypkg.*,otherpkg.*"

The gate is wired into every string-to-class entry point:

  • import_class_by_name (used by default_from_dict for nested types)
  • deserialize_type (type annotations)
  • deserialize_callable (function references)
  • Pipeline.from_dict's component-type lookup

The per-call kwargs are propagated to all functions in the deserialization chain via a ContextVar (_DeserializationContext), so existing signatures don't change.

Defense in depth — parameter-name check: default_from_dict now refuses to recurse into nested {"type": "..."} dictionaries whose key is not an __init__ parameter of the parent class. This blocks YAML that smuggles classes into unused parameter slots — even classes on the allowlist can't be instantiated.

How did you test it?

New test file test/core/test_serialization_security.py (39 tests) covering:

  • Pattern-matching semantics for _module_matches
  • The default allowlist (haystack, typing, collections, builtins) and rejection of common attack-surface modules (subprocess, os).
  • All four extension mechanisms (per-call kwarg, process-wide, env var, unsafe=True bypass) — both that they enable the right modules and that they reset cleanly.
  • Propagation through Pipeline.load / loads / from_dict

Updated existing tests in test/core/test_serialization.py, test/core/pipeline/test_pipeline_base.py, four test_*_nonexisting_docstore tests in test/components/, and test/utils/test_callable_serialization.py to use modules that pass the allowlist where the original intent was to test the
import-failure path.

Test infrastructure (test/conftest.py): the autouse fixture extends the process-wide allowlist with test_*, *.test_*, test.*, pydantic, and httpx — i.e. only the modules existing tests legitimately reference.

Notes for the reviewer

  • MIGRATION.md has a copy-pasteable entry covering the four extension paths.

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
haystack-docs Ignored Ignored Preview May 30, 2026 10:31am

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  haystack/core
  serialization.py 343-344
  serialization_security.py
  haystack/core/pipeline
  base.py
  haystack/utils
  callable_serialization.py
  type_serialization.py 195
Project Total  

This report was generated by python-coverage-comment-action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant