
Add disallow_doctype() configurator (no-DOCTYPE policy guard)#107

Merged
veewee merged 1 commit into 4.x from feature/disallow-doctype-4.x on Jun 26, 2026

veewee (Owner) commented Jun 26, 2026

What

Adds a disallow_doctype() DOM configurator that rejects any document carrying a <!DOCTYPE ...> declaration, throwing a dedicated VeeWee\Xml\Exception\DoctypeNotAllowedException. It runs on the parsed document, so it applies uniformly to every loader — string, file and node alike.

use VeeWee\Xml\Dom\Document;
use function VeeWee\Xml\Dom\Configurator\disallow_doctype;

Document::fromXmlString($untrustedXml, disallow_doctype());

What it protects against — and what it does not

Being precise here, because libxml already does most of the work. With the default options this library passes to Dom\XMLDocument (PHP 8.4, no extra libxml flags), libxml already:

  • does not read external file entities (file:///… is never resolved);
  • does not fetch external DTDs over the network (SYSTEM "http://…" triggers no request);
  • aborts entity-amplification ("billion-laughs") attacks before expanding them.

So on the default path you are already protected against the active XXE vectors, with or without this PR. The only things libxml still expands are small, non-amplifying internal entities, and that is not an attack when the attacker already authored the whole document.

What disallow_doctype() adds is therefore policy + defence-in-depth, not a libxml-gap fix:

  • a uniform, typed "no DOCTYPE allowed" rule across all loaders (useful where DOCTYPE is forbidden outright — SOAP / WS-Security / SAML);
  • protection if libxml's defaults ever change across versions, builds or php.ini.

What it does NOT protect against: it runs after parsing, so it cannot undo anything done during parsing. If the loader is given unsafe options — LIBXML_NOENT or LIBXML_DTDLOAD — libxml reads the file / fetches the DTD while parsing, before this configurator runs. The rejection is then too late. Verified on PHP 8.4: LIBXML_NOENT substitutes a file:/// entity into the DOM during parse.

Takeaway: keeping the default options is what actually keeps you safe; do not enable LIBXML_NOENT / LIBXML_DTDLOAD for untrusted input. This configurator is the policy layer on top.
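
To make the option-hygiene point concrete, here is a minimal sketch of a caller-side check on the libxml option bitmask. The function name is hypothetical and not part of this library; it only assumes the standard LIBXML_NOENT and LIBXML_DTDLOAD constants named above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of veewee/xml): returns true when the given
 * libxml option bitmask leaves entity substitution and DTD loading disabled.
 */
function sketch_options_are_safe_for_untrusted_input(int $options): bool
{
    // The two flags this PR warns against for untrusted input.
    $unsafe = LIBXML_NOENT | LIBXML_DTDLOAD;

    return ($options & $unsafe) === 0;
}

$defaultsAreSafe = sketch_options_are_safe_for_untrusted_input(0);
$noentIsSafe = sketch_options_are_safe_for_untrusted_input(LIBXML_NOENT);
```

A check like this would have to run before parsing, which is exactly where disallow_doctype() cannot help.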

How

A configurator that throws when $document->doctype !== null, composed with the existing loaders (fromXmlString, fromXmlFile, …). It has the smallest surface: one reusable primitive that works for every input source. A pre-parse string guard or options-owning "secure loader" was intentionally not added: it would only cover the string input path and would not generalise to files, streams or already-parsed nodes. The option-hygiene point above is the real lever, and it is universal.
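
The shape of such a configurator can be sketched roughly as follows. This is not the merged implementation: the function and exception names are hypothetical stand-ins, and it assumes a configurator is a callable that receives the parsed Dom\XMLDocument and returns it. It needs PHP 8.4 for Dom\XMLDocument.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's DoctypeNotAllowedException.
final class SketchDoctypeNotAllowed extends \RuntimeException
{
}

/**
 * Sketch of a post-parse guard: hands the document back unchanged
 * unless it carries a DOCTYPE node.
 *
 * @return \Closure(\Dom\XMLDocument): \Dom\XMLDocument
 */
function sketch_disallow_doctype(): \Closure
{
    return static function (\Dom\XMLDocument $document): \Dom\XMLDocument {
        if ($document->doctype !== null) {
            throw new SketchDoctypeNotAllowed('DOCTYPE declarations are not allowed.');
        }

        return $document;
    };
}

$guard = sketch_disallow_doctype();

// A document without a DOCTYPE passes through untouched.
$ok = $guard(\Dom\XMLDocument::createFromString('<r/>'));

// A bare DOCTYPE is parsed by libxml first, then rejected by the guard.
try {
    $guard(\Dom\XMLDocument::createFromString('<!DOCTYPE r><r/>'));
    $rejected = false;
} catch (SketchDoctypeNotAllowed $e) {
    $rejected = true;
}
```

Because the check only looks at the parsed document, the same closure works regardless of whether the document came from a string, a file or an existing node.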

Tests

  • bare <!DOCTYPE r>
  • external file entity (file:///…)
  • internal entity substitution (non-amplifying — libxml parses it, configurator rejects it)
  • external DTD over network (SYSTEM "http://…")
  • well-formed doc without a DOCTYPE still loads

Notes

  • Exception design: DoctypeNotAllowedException extends global \RuntimeException and implements VeeWee\Xml\Exception\ExceptionInterface, mirroring the existing EncodingException precedent. It can't extend VeeWee\Xml\Exception\RuntimeException (that class is final); the shared catchable contract is ExceptionInterface.
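
The exception layout described in the note above could look roughly like this. The sketch uses hypothetical Sketch-prefixed names instead of the real VeeWee\Xml\Exception classes, and the assumption that the shared interface extends \Throwable is mine, not something stated in this PR.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for VeeWee\Xml\Exception\ExceptionInterface.
interface SketchExceptionInterface extends \Throwable
{
}

// Extends the global \RuntimeException and opts into the shared contract,
// since the library's own RuntimeException is final and cannot be extended.
final class SketchDoctypeNotAllowedException extends \RuntimeException implements SketchExceptionInterface
{
    public static function create(): self
    {
        return new self('The XML document contains a DOCTYPE, which is not allowed.');
    }
}

// Callers can catch either the dedicated type or the shared interface.
$caught = null;
try {
    throw SketchDoctypeNotAllowedException::create();
} catch (SketchExceptionInterface $e) {
    $caught = $e;
}
```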

✅ Full suite (693 tests) · Psalm clean (100% inferred) · php-cs-fixer clean

veewee force-pushed the feature/disallow-doctype-4.x branch from 7c5b187 to 5d9ce28 on June 26, 2026 06:34
Loading untrusted XML through the Dom layer offered no policy against
DOCTYPE-bearing documents. On PHP 8.4, Dom\XMLDocument::createFromString
does not fetch external entities by default and libxml already aborts
entity-amplification (billion-laughs) attacks itself, but it does parse a
DOCTYPE and expand small internal entities.

This adds a disallow_doctype() configurator that rejects any document whose
doctype is present, throwing a dedicated DoctypeNotAllowedException. It is a
defence-in-depth policy (refuse the DOCTYPE surface wholesale) and catches
the internal-entity / external-DTD-reference cases libxml permits, before
downstream code uses the document:

    Document::fromXmlString($untrustedXml, disallow_doctype());

It is not a DoS mitigation: a post-parse check cannot prevent parse-time
expansion, and libxml already handles amplification.
veewee force-pushed the feature/disallow-doctype-4.x branch from 5d9ce28 to 9dfb089 on June 26, 2026 06:51
veewee changed the title from "Add disallow_doctype() configurator to harden against XXE" to "Add disallow_doctype() configurator (no-DOCTYPE policy guard)" on Jun 26, 2026
veewee merged commit 1ea4b37 into 4.x on Jun 26, 2026
12 checks passed