feat!: Gate pipeline deserialization through a module allowlist #11432
Draft
bogdankostic wants to merge 5 commits into
Related Issues
Proposed Changes:
`Pipeline.load` / `Pipeline.loads` / `Pipeline.from_dict` used to dynamically import any class referenced in the YAML via `importlib.import_module`, so a crafted YAML file could cause arbitrary classes to be imported and instantiated (e.g. `subprocess.Popen`). This PR gates every import-by-name through an allowlist of trusted module namespaces.

Default allowlist:
`haystack`, `haystack_integrations`, `haystack_experimental`, `builtins`, `typing`, `collections`.

Four ways to extend it, in increasing scope:
- `Pipeline.load(fp, allowed_modules=["mypkg.*"])`
- `Pipeline.load(fp, unsafe=True)`
- `from haystack.core.serialization import allow_deserialization_module`
- `HAYSTACK_DESERIALIZATION_ALLOWLIST="mypkg.*,otherpkg.*"`

The gate is wired into every string-to-class entry point:
- `import_class_by_name` (used by `default_from_dict` for nested types)
- `deserialize_type` (type annotations)
- `deserialize_callable` (function references)
- `Pipeline.from_dict`'s component-type lookup

The per-call kwargs are propagated to all functions in the deserialization chain via a `ContextVar` (`_DeserializationContext`), so existing signatures don't change.

Defense in depth (parameter-name check):
`default_from_dict` now refuses to recurse into nested `{"type": "..."}` dictionaries whose key is not an `__init__` parameter of the parent class. This blocks YAML that smuggles classes into unused parameter slots: even classes on the allowlist can't be instantiated through such a slot.

How did you test it?
New test file `test/core/test_serialization_security.py` (39 tests) covering:

- `_module_matches`
- The default allowlist (`haystack`, `typing`, `collections`, `builtins`) and rejection of common attack-surface modules (`subprocess`, `os`).
- The extension paths (including the `unsafe=True` bypass): both that they enable the right modules and that they reset cleanly.
- `Pipeline.load` / `loads` / `from_dict`

Updated existing tests in `test/core/test_serialization.py`, `test/core/pipeline/test_pipeline_base.py`, four `test_*_nonexisting_docstore` tests in `test/components/`, and `test/utils/test_callable_serialization.py` to use modules that pass the allowlist where the original intent was to test the import-failure path.
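
To make the tested behaviour concrete, here is a minimal sketch of an allowlist gate with a per-call extension that resets cleanly. It is not the code in this PR: the names `module_matches`, `module_is_allowed`, `allow_modules`, and `_extra_patterns` are hypothetical, and the matching rule (a bare name matches the module and its submodules, anything else is treated as a glob) is an assumption about what `_module_matches` does.

```python
import fnmatch
from contextlib import contextmanager
from contextvars import ContextVar

# Default namespaces as listed in the PR description.
DEFAULT_ALLOWLIST = (
    "haystack",
    "haystack_integrations",
    "haystack_experimental",
    "builtins",
    "typing",
    "collections",
)

# Hypothetical per-call state. The PR mentions a ContextVar called
# _DeserializationContext; its real shape is not shown in the description.
_extra_patterns = ContextVar("_extra_patterns", default=())


def module_matches(module_name: str, pattern: str) -> bool:
    """Assumed rule: a bare name matches itself and its submodules, else glob match."""
    if module_name == pattern or module_name.startswith(pattern + "."):
        return True
    return fnmatch.fnmatchcase(module_name, pattern)


def module_is_allowed(module_name: str) -> bool:
    """Check a module name against the defaults plus any per-call patterns."""
    patterns = DEFAULT_ALLOWLIST + _extra_patterns.get()
    return any(module_matches(module_name, p) for p in patterns)


@contextmanager
def allow_modules(*patterns: str):
    """Extend the allowlist for the duration of one call, then restore it."""
    token = _extra_patterns.set(_extra_patterns.get() + tuple(patterns))
    try:
        yield
    finally:
        # reset() restores the previous value even if deserialization raised.
        _extra_patterns.reset(token)
```

In this sketch, `module_is_allowed("subprocess")` is false, `module_is_allowed("haystack.components.builders")` is true, and `mypkg.components` is accepted only inside `with allow_modules("mypkg.*"):`. Keeping the extra patterns in a `ContextVar` is what lets a per-call override reach nested helpers without changing their signatures.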
Test infrastructure (`test/conftest.py`): the autouse fixture extends the process-wide allowlist with `test_*`, `*.test_*`, `test.*`, `pydantic`, and `httpx`, i.e. only the modules existing tests legitimately reference.

Notes for the reviewer
`MIGRATION.md` has a copy-pasteable entry covering the four extension paths.

Checklist
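- I have used one of the conventional commit types for my PR title: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:` and added `!` in case the PR includes breaking changes.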
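
For reviewers, the parameter-name check described under Proposed Changes can be pictured roughly as follows. This is a sketch under assumptions, not the logic in `default_from_dict`: the helper `unexpected_nested_types`, the `Retriever` class, and the `haystack.some.AllowlistedClass` path are hypothetical and exist only for illustration.

```python
import inspect


def unexpected_nested_types(cls: type, init_parameters: dict) -> list:
    """Return keys that hold a nested {"type": ...} dict but are not __init__ parameters of cls."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    return [
        key
        for key, value in init_parameters.items()
        if isinstance(value, dict) and "type" in value and key not in accepted
    ]


class Retriever:
    # Hypothetical component used only for this illustration.
    def __init__(self, document_store, top_k: int = 10):
        self.document_store = document_store
        self.top_k = top_k


payload = {
    # Legitimate nested type: document_store is an __init__ parameter.
    "document_store": {"type": "haystack.document_stores.in_memory.InMemoryDocumentStore"},
    # Smuggled nested type: Retriever.__init__ has no parameter with this name.
    "smuggled": {"type": "haystack.some.AllowlistedClass"},
    "top_k": 5,
}

rejected = unexpected_nested_types(Retriever, payload)
```

Here `rejected` contains only `"smuggled"`, so a deserializer following this idea would refuse to recurse into that entry even though its module is on the allowlist. The sketch does not handle classes that accept `**kwargs`.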