From ccf504656860688e5ce5ffab3be2493905cd5c19 Mon Sep 17 00:00:00 2001
From: Jarek Potiuk
Date: Tue, 2 Jun 2026 19:42:39 +0200
Subject: [PATCH 1/7] docs: add draft threat model + SECURITY.md/AGENTS.md discoverability

Generated-by: Claude Code
---
 AGENTS.md       |   7 ++
 SECURITY.md     |  29 ++++++++
 THREAT_MODEL.md | 179 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 215 insertions(+)
 create mode 100644 SECURITY.md
 create mode 100644 THREAT_MODEL.md

diff --git a/AGENTS.md b/AGENTS.md
index 02e44f69a3..1effbddaca 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -177,3 +177,10 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th
 - PR titles must follow Conventional Commits; `.github/workflows/pr-lint.yml` enforces this.
 - Performance changes should use the `perf` type and include benchmark data.
 - See `.agents/ci-and-pr.md` for GitHub CLI triage commands and commit message examples.
+
+## Security
+
+Security model: [SECURITY.md](./SECURITY.md)
+
+Agents that scan this repository should consult `SECURITY.md` and the
+threat model it links before reporting issues.
diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 0000000000..bef2df1b35
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,29 @@
+
+
+# Security Policy
+
+## Reporting a Vulnerability
+
+`apache/fory` follows the [Apache Software Foundation security process](https://www.apache.org/security/). Please report suspected
+vulnerabilities privately to `security@apache.org`; do not open public
+GitHub issues or pull requests for security reports.
+
+## Threat Model
+
+What the project treats as in scope and out of scope, the security
+properties it provides and disclaims, the adversary model, and how
+findings are triaged are documented in [THREAT_MODEL.md](./THREAT_MODEL.md).
diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md
new file mode 100644
index 0000000000..8f936266fd
--- /dev/null
+++ b/THREAT_MODEL.md
@@ -0,0 +1,179 @@
+
+
+# Apache Fory — Threat Model (v0 draft)
+
+## §1 Header
+
+- **Project:** Apache Fory (`apache/fory`), `main`, against which this draft was written. Fory is a multi-language serialization framework (Java, C++, Python, Go, Rust, JavaScript, Kotlin, Scala, Swift, Dart, C#).
+- **Date:** 2026-06-02. **Status:** draft — for Apache Fory PMC review. **Author:** ASF Security team (drafted via the Scovetta threat-model rubric), for PMC ratification.
+- **Version binding:** versioned with the project; a report against Fory version *N* is triaged against the model as it stood at *N*, not at HEAD.
+- **Reporting cross-reference:** findings that violate a §8 property should be reported privately per the ASF process (`security@apache.org` → `private@fory.apache.org`); findings under §3 or §9 are closed citing this document.
+- **Provenance legend:** *(documented)* = stated in Fory's own docs/repo; *(maintainer)* = confirmed by a Fory PMC member through this process; *(inferred)* = reasoned from architecture/domain knowledge, not yet confirmed — every *(inferred)* claim has a matching §14 open question.
+- **Draft confidence:** ~20 documented / 0 maintainer / ~26 inferred.
+- **What Fory is:** Apache Fory is a high-performance, multi-language object/data serialization framework. An application uses it in-process to serialize its objects to bytes and deserialize bytes back into objects, either within one language ("native" mode) or across languages ("xlang" mode), with optional zero-copy and a row format. *(documented — README, docs/guide)*
+
+## §2 Scope and intended use
+
+- **Primary use:** an **in-process library** linked into a host application that calls `serialize()` / `deserialize()` on its own data types. *(documented — guides)*
+- **It is not a network service or daemon.** It has no listening surface, no auth, no users — the embedding application owns where the bytes come from and go. *(inferred)*
+- **Caller / trust level:** a single caller — the embedding application — which is **trusted** (it links the library and registers its types). The security-relevant question is not "who calls Fory" but **"where do the bytes handed to `deserialize()` come from"** — trusted producer, or attacker-controlled. *(inferred; the registration guidance is documented)*
+
+**Component-family table** *(in/out of this model):*
+
+| Family | Entry point | Notes | In model? |
+| --- | --- | --- | --- |
+| Object-graph serialization (native, per language) | `fory.serialize` / `deserialize` | the core; instantiates registered types from bytes | **In** *(documented)* |
+| Cross-language (xlang) serialization | xlang `serialize`/`deserialize` | type mapping across languages | **In** *(documented)* |
+| Row format / zero-copy | row encoders | reads fields in place from a buffer | **In** *(documented)* |
+| Class/type registration + "secure mode" | `requireClassRegistration`, `register(...)` | the primary defense | **In** *(documented)* |
+| Per-language implementations | `java/`, `cpp/`, `python/`, `go/`, `rust/`, `javascript/`, `kotlin/`, `scala/`, `swift/`, `dart/`, `csharp/` | each is a separate impl of the same model | **In** — but memory-safety profile differs by language (see §5/§8) *(documented: dirs exist)* |
+| `examples/`, `benchmarks/`, `integration_tests/` | demo/bench/test | not production surface | **Out** *(see §3)* |
+
+## §3 Out of scope (explicit non-goals)
+
+- **The integrity / authenticity / confidentiality of the serialized bytes.** Fory deserializes what it is given; it does not authenticate, MAC, or encrypt payloads. If bytes can be tampered with in transit/at rest, that is the application's problem to solve (sign/encrypt before handing to Fory). *(inferred)*
+- **Anything when the caller disables class registration on an untrusted payload source.** `requireClassRegistration(false)` is a documented, deliberately-available footgun; using it against attacker-controlled bytes is out of the model's protection (see §5a/§9). *(documented — config: "Disabling may allow unknown classes to be deserialized, potentially causing security risks")*
+- **The behaviour of the application's own registered classes.** Fory instantiates and populates registered types; if a registered class has dangerous side effects in its constructors/setters/finalizers, that is the application's design, not Fory's. *(inferred)*
+- **`examples/`, `benchmarks/`, `integration_tests/`** — shipped but not a production trust surface. *(inferred)*
+
+## §4 Trust boundaries and data flow
+
+- **The trust boundary is the byte buffer passed to `deserialize()`** (and the row-format buffer). Everything Fory does on the serialize side operates on the application's own in-memory objects (trusted); the deserialize side is where attacker-controlled bytes, if any, enter. *(inferred)*
+- **Data flow:** untrusted bytes → format/header parse → (class id / type resolution → **registration check**) → field decode → object graph construction → returned to caller. The registration check is the gate that decides whether an arbitrary type may be instantiated. *(inferred; registration mechanism documented)*
+- **Reachability precondition:** a deserialize-side finding is **in-model** only if it is reachable from the byte buffer under the **default secure configuration** (`requireClassRegistration(true)`). A finding that requires `requireClassRegistration(false)`, or that requires the *serialize* side to be fed attacker-controlled live objects, is out-of-model (§5a / trusted-input). *(inferred)*
+
+## §5 Assumptions about the environment
+
+- **In-process, no ambient I/O.** Fory does not (by design) open sockets, spawn processes, or read the network; it operates on in-memory buffers handed to it. *(inferred — high-priority confirmation; negative claim)*
+- **Per-language memory model differs.** In managed runtimes (Java, Python, Go, JS, …) memory safety is the runtime's; in the **C++** (and unsafe-Rust FFI) paths, malformed input reaching the decoder is a memory-safety surface in a way it is not on the JVM. The model's "memory safety on malformed input" property is therefore language-conditional (see §8). *(inferred)*
+- **Codegen / JIT:** on ordinary JVMs Fory generates serializer code at runtime (`codeGenEnabled` default true); disabled on Android / GraalVM native image. This is a performance mechanism over the application's own registered types, not a path for executing attacker bytes. *(documented — config table)*
+
+## §5a Build-time and configuration variants
+
+The security envelope is set by runtime configuration, not build flags. The load-bearing knobs *(documented — docs/guide/java/configuration.md)*:
+
+| Knob | Default | Effect on the model |
+| --- | --- | --- |
+| `requireClassRegistration` | **`true`** (secure) | When true, only registered types are deserialized — the primary defense against deserializing arbitrary/gadget classes. Disabling "may allow unknown classes to be deserialized, potentially causing security risks." |
+| `maxDepth` | **`50`** | Bounds deserialization recursion depth; "can be used to refuse deserialization DDOS attack." |
+| `deserializeUnknownClass` | `true` in compatible mode, else `false` | Whether data for unknown/non-existent classes is skipped/deserialized. |
+| `compatible` | xlang: `true`; native: `false` | Schema forward/backward compatibility. |
+| `suppressClassRegistrationWarnings` | `true` | Registration warnings are useful for security audit but suppressed by default. |
+
+**The default is the *secure* posture here** (registration required) — the inverse of the usual insecure-default case. The model's §8 properties hold *under the defaults*; a report that only manifests under `requireClassRegistration(false)` is `OUT-OF-MODEL: non-default-build`. Confirm this framing with the PMC (§14).
+
+## §6 Assumptions about inputs
+
+Per-entry-point trust table *(registration mechanism + defaults documented; trust framing inferred):*
+
+| Entry point | Input | Attacker-controllable? | Caller must enforce |
+| --- | --- | --- | --- |
+| `deserialize(bytes)` / `deserialize(bytes, Class)` | serialized byte buffer | **yes, if the application sources bytes from an untrusted producer** | keep `requireClassRegistration(true)`; register only safe types; integrity-check bytes upstream |
+| row-format readers | buffer | **yes** (same as above) | same |
+| `serialize(obj)` | a live application object | no — the app's own trusted object | n/a |
+| `register(Class, …)` | type registered at setup | no — controlled by the app developer | register only types safe to instantiate from untrusted data |
+
+- **Size/shape/rate:** `maxDepth` (default 50) bounds nesting; whether total allocation / output size is otherwise bounded against a hostile payload is open (see §8 resource line). *(maxDepth documented; broader bound inferred)*
+
+## §7 Adversary model
+
+- **Primary adversary:** a party who controls the **serialized bytes** an application later passes to `deserialize()` (e.g. data arriving over a network the app feeds to Fory, or persisted data an attacker can tamper with). Goal: instantiate dangerous types (gadget-chain RCE), corrupt memory in the native paths, or exhaust CPU/memory. *(inferred — the canonical serialization-framework adversary)*
+- **Capabilities:** can craft arbitrary/malformed byte buffers; cannot change the application's Fory configuration or its registered-type set (those are set by the trusted app at startup). *(inferred)*
+- **Out of scope:** an attacker who controls the embedding application, its configuration, or the objects passed to `serialize()` — already trusted; an attacker who has set `requireClassRegistration(false)` themselves. *(inferred)*
+
+## §8 Security properties the project provides
+
+*(Registration + depth defenses documented; the guarantees framed below are for PMC confirmation.)*
+
+- **Registered-type-only instantiation (default).** With `requireClassRegistration(true)` (the default), deserialization instantiates only types the application registered, so attacker bytes cannot drive Fory to construct an arbitrary class. *Violation symptom:* an unregistered/unexpected type is instantiated from input under the default config. *Severity:* security-critical (this is the deserialization-RCE defense). *(documented that registration is required by default + that disabling causes risk; the unbypassability guarantee is the claim to confirm)*
+- **Bounded recursion depth.** Deserialization beyond `maxDepth` (default 50) throws rather than recursing unbounded. *Violation symptom:* stack overflow / unbounded recursion from crafted nesting under the default. *Severity:* security-critical (DoS). *(documented — config table)*
+- **Memory safety on malformed input — language-conditional.** In managed-runtime implementations, malformed bytes yield an exception, not memory corruption. For the **C++** implementation this is the load-bearing property to confirm (malformed-input fuzzing of the C++ decoder). *Violation symptom:* OOB read/write, crash. *Severity:* security-critical. *(inferred — confirm per language)*
+- **Resource bounds beyond depth — UNSPECIFIED.** Whether a crafted payload can force large allocation / CPU blowup within the depth limit (e.g. huge declared collection sizes) is a bug or expected is open; the model needs a line (§14). *(inferred; maxDepth documented)*
+
+## §9 Security properties the project does *not* provide
+
+*(Highest-value section for integrators.)*
+
+- **No protection when class registration is disabled.** `requireClassRegistration(false)` deliberately allows deserializing unknown classes — using it on untrusted input re-opens the classic deserialization-gadget RCE surface. This is the caller's choice, documented as risky. *(documented — config)*
+- **No payload authentication or confidentiality.** Fory does not verify that bytes came from a trusted producer or that they are unmodified; it is not a MAC, signature, or cipher. *(inferred)* **False friend:** a successful round-trip / schema-compatibility check is *not* an integrity guarantee against a malicious producer.
+- **Not a sandbox for registered types.** Registering a class authorizes Fory to instantiate it from bytes; if that class's construction has side effects, Fory does not contain them. *(inferred)*
+- **Cross-language type-confusion is the integrator's concern** in xlang mode — relying on the peer to send a compatible schema is a trust assumption between the two ends, not something Fory enforces against a hostile peer. *(inferred)*
+- **Well-known classes left to the caller:** deserialization-gadget attacks (defended by registration, *if left on*), decompression/allocation bombs (partially bounded by `maxDepth`), and integrity attacks on the byte stream. *(inferred)*
+
+## §10 Downstream responsibilities (the embedding application)
+
+- **Keep `requireClassRegistration(true)`** whenever any deserialized bytes could be attacker-influenced (the documented production guidance). *(documented)*
+- **Register only types that are safe to instantiate from untrusted data**; do not register types with dangerous construction side effects. *(inferred)*
+- **Authenticate / integrity-check / decrypt** untrusted bytes *before* handing them to `deserialize()` — Fory will not. *(inferred)*
+- **Tune `maxDepth`** to the application's real object depth rather than disabling it. *(inferred)*
+- **In xlang mode, treat the peer's schema as a trust relationship** you control, not something Fory polices. *(inferred)*
+
+## §11 Known misuse patterns
+
+*(Draft one-liners — expand before publishing.)*
+
+- Setting `requireClassRegistration(false)` for convenience, then deserializing network/user data. *(documented as risky)*
+- Treating Fory deserialization of untrusted bytes as safe without integrity-checking the bytes first. *(inferred)*
+- Registering broad/dangerous types (or whole packages) to "make it work", widening the gadget surface. *(inferred)*
+- Assuming the C++ decoder is as forgiving of malformed input as the JVM one. *(inferred)*
+
+## §11a Known non-findings (recurring false positives)
+
+*(Seed list — confirmations here are the highest-leverage scan-suppression input.)*
+
+- "Fory can deserialize arbitrary classes → RCE" — **only** with `requireClassRegistration(false)`; under the default (`true`) it cannot. A report that assumes registration is off is `OUT-OF-MODEL: non-default-build` unless the PMC says otherwise. *(documented)*
+- "No signature/MAC/encryption on the serialized format" — by-design; integrity/confidentiality is the caller's (§9/§10). *(inferred)*
+- "Unbounded recursion on nested input" — bounded by `maxDepth` (default 50). *(documented)*
+- "Registered class X does something dangerous when constructed" — the application's registration choice (§3/§10), not a Fory bug. *(inferred)*
+- "Reflection / dynamic codegen used at runtime" — `codeGenEnabled` operates over the app's own registered types, not attacker bytes (§5). *(documented config; framing inferred)*
+
+## §12 Conditions that would change this model
+
+- A change to the **default** of `requireClassRegistration` or `maxDepth`. *(documented knobs)*
+- A new deserialization entry point or a new language implementation with a different memory-safety profile. *(inferred)*
+- Fory gaining any I/O / network surface (it would stop being a pure in-process library). *(inferred)*
+- A report that cannot be routed to a single §13 disposition → revise the model.
+
+## §13 Triage dispositions
+
+| Disposition | Meaning | Licensed by |
+| --- | --- | --- |
+| `VALID` | Violates a §8 property under the **default** config via attacker-controlled bytes (e.g. unregistered-type instantiation with registration on; unbounded recursion within maxDepth; C++ memory corruption on malformed input). | §8, §6, §7 |
+| `VALID-HARDENING` | No §8 property broken, but a §11 misuse is easy enough to harden (e.g. a safer default, a louder warning). | §11 |
+| `OUT-OF-MODEL: trusted-input` | Requires attacker control of the serialize-side objects, the registered-type set, or the Fory config. | §6, §7 |
+| `OUT-OF-MODEL: non-default-build` | Only manifests with `requireClassRegistration(false)` or another discouraged §5a setting. | §5a |
+| `OUT-OF-MODEL: unsupported-component` | Lands in `examples/`, `benchmarks/`, `integration_tests/`. | §3 |
+| `BY-DESIGN: property-disclaimed` | Concerns a §9-disclaimed property (no payload auth/encryption, not a sandbox for registered types, xlang peer trust). | §9 |
+| `KNOWN-NON-FINDING` | Matches a §11a entry. | §11a |
+| `MODEL-GAP` | Cannot be cleanly routed — triggers a §12 revision. | §12 |
+
+## §14 Open questions for the maintainers
+
+**Wave 1 — scope & the registration framing:**
+1. Confirm Fory is modeled as an **in-process library** with no ambient I/O (no sockets/processes/network) — the negative-side-effects inventory in §5. Proposed: yes. → §2/§5.
+2. **The core ruling:** with `requireClassRegistration(true)` (default), is "only registered types are instantiated from untrusted bytes" a property Fory **commits to** (so a bypass is `VALID`/security-critical)? And is a finding that requires `requireClassRegistration(false)` correctly `OUT-OF-MODEL: non-default-build`? Proposed: yes to both. → §8/§5a/§13.
+3. Confirm `examples/`/`benchmarks/`/`integration_tests/` are out of scope. → §3.
+
+**Wave 2 — language profiles & inputs:**
+4. **Per-language memory safety:** for which implementations does Fory claim "malformed input → clean error, not memory corruption"? Is the **C++** decoder the primary memory-safety surface to fuzz, and does it carry the same guarantee? → §5/§8.
+5. Beyond `maxDepth`, are there bounds on total allocation / declared collection sizes / output size against a hostile payload, or is that explicitly the caller's concern? Where is the resource/DoS line? → §8/§11a.
+6. In **xlang** mode, what does Fory assume about the peer — is a hostile/malformed peer schema in scope, or is the peer a trusted endpoint? Proposed: peer trusted; type-confusion is the integrator's concern. → §7/§9.
+
+**Wave 3 — disclaimers & non-findings:**
+7. Confirm Fory disclaims payload integrity/authenticity/confidentiality (no MAC/sig/encryption) and is not a sandbox for registered types' own logic. → §9.
+8. Any other recurring scanner/fuzzer false positives the PMC already knows about, to seed §11a (e.g. reflection/Unsafe usage, codegen)? → §11a.
+9. **Meta:** Fory has no in-repo `SECURITY.md` and an `AGENTS.md` that is a developer/agent guide. This engagement adds `SECURITY.md` + `THREAT_MODEL.md` and wires `AGENTS.md → SECURITY.md → THREAT_MODEL.md`. Confirm the model should live in-repo (as proposed) vs. on the website, and who owns revisions. The existing config-guide "Security" section becomes a pointer to this model. → §1.
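The two defenses the draft model above leans on, the registration check in the §4 data flow and the `maxDepth` bound in §5a/§8, can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not Fory's implementation or API: the wire format (a one-byte tag followed by an int64, a length-prefixed list, or a 16-bit registered type id), the names `ToyDecoder`, `DeserializationError`, and `Point`, and the depth-counting convention are all hypothetical. It shows both checks running before anything is constructed: an unregistered type id is rejected instead of resolved, and nesting past `max_depth` raises instead of recursing further.

```python
import struct


class DeserializationError(Exception):
    """Raised on malformed input or a policy violation (hypothetical)."""


# Hypothetical one-byte tags for a toy wire format (not Fory's format).
INT, LIST, OBJ = 0, 1, 2


class ToyDecoder:
    """Toy decoder illustrating a registration allowlist and a depth bound.

    Not Fory's API: every name and the byte layout here are assumptions.
    """

    def __init__(self, registered, max_depth=50):
        # type id -> constructor; only these may be built from untrusted bytes
        self.registered = dict(registered)
        self.max_depth = max_depth

    def decode(self, data):
        self.buf, self.pos = memoryview(data), 0
        value = self._read_value(depth=0)
        if self.pos != len(self.buf):
            raise DeserializationError("trailing bytes after root value")
        return value

    def _take(self, n):
        # Bounds-check every read so truncated input raises instead of misreading
        if self.pos + n > len(self.buf):
            raise DeserializationError("truncated input")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def _read_value(self, depth):
        # Depth bound: refuse to recurse past max_depth
        if depth > self.max_depth:
            raise DeserializationError("max depth exceeded")
        tag = self._take(1)[0]
        if tag == INT:
            return struct.unpack("<q", self._take(8))[0]
        if tag == LIST:
            (count,) = struct.unpack("<I", self._take(4))
            return [self._read_value(depth + 1) for _ in range(count)]
        if tag == OBJ:
            (type_id,) = struct.unpack("<H", self._take(2))
            ctor = self.registered.get(type_id)
            if ctor is None:
                # Registration gate: an unknown id is rejected, never resolved
                raise DeserializationError(f"unregistered type id {type_id}")
            return ctor(self._read_value(depth + 1))
        raise DeserializationError(f"unknown tag {tag}")


# Encoders for the same toy format, used to build example payloads.
def enc_int(v):
    return bytes([INT]) + struct.pack("<q", v)


def enc_list(items):
    return bytes([LIST]) + struct.pack("<I", len(items)) + b"".join(items)


def enc_obj(type_id, payload):
    return bytes([OBJ]) + struct.pack("<H", type_id) + payload


class Point:
    """A hypothetical application type the caller chose to register."""

    def __init__(self, coords):
        self.coords = coords


decoder = ToyDecoder({100: Point}, max_depth=50)
point = decoder.decode(enc_obj(100, enc_list([enc_int(1), enc_int(2)])))
print(point.coords)  # [1, 2]

try:
    decoder.decode(enc_obj(999, enc_int(0)))  # id 999 was never registered
except DeserializationError as exc:
    print(exc)  # unregistered type id 999
```

Because the payload can only select among ids the application placed in `registered`, nothing in the bytes can name a new type to load; that is the property §8 asks the PMC to confirm for the real resolver. The sketch does not show how Fory itself encodes type ids, how it counts depth against `maxDepth`, or whether it bounds allocation (the open question in §14, item 5).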
From 259936d433ceef32844285f5459dadc6a75b7815 Mon Sep 17 00:00:00 2001
From: Jarek Potiuk
Date: Tue, 2 Jun 2026 21:17:48 +0200
Subject: [PATCH 2/7] Fix markdownlint MD032: blank line before the open-questions lists

Generated-by: Claude Code
---
 THREAT_MODEL.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md
index 8f936266fd..d388228c93 100644
--- a/THREAT_MODEL.md
+++ b/THREAT_MODEL.md
@@ -164,16 +164,19 @@ Per-entry-point trust table *(registration mechanism + defaults documented; trus
 ## §14 Open questions for the maintainers
 
 **Wave 1 — scope & the registration framing:**
+
 1. Confirm Fory is modeled as an **in-process library** with no ambient I/O (no sockets/processes/network) — the negative-side-effects inventory in §5. Proposed: yes. → §2/§5.
 2. **The core ruling:** with `requireClassRegistration(true)` (default), is "only registered types are instantiated from untrusted bytes" a property Fory **commits to** (so a bypass is `VALID`/security-critical)? And is a finding that requires `requireClassRegistration(false)` correctly `OUT-OF-MODEL: non-default-build`? Proposed: yes to both. → §8/§5a/§13.
 3. Confirm `examples/`/`benchmarks/`/`integration_tests/` are out of scope. → §3.
 
 **Wave 2 — language profiles & inputs:**
+
 4. **Per-language memory safety:** for which implementations does Fory claim "malformed input → clean error, not memory corruption"? Is the **C++** decoder the primary memory-safety surface to fuzz, and does it carry the same guarantee? → §5/§8.
 5. Beyond `maxDepth`, are there bounds on total allocation / declared collection sizes / output size against a hostile payload, or is that explicitly the caller's concern? Where is the resource/DoS line? → §8/§11a.
 6. In **xlang** mode, what does Fory assume about the peer — is a hostile/malformed peer schema in scope, or is the peer a trusted endpoint? Proposed: peer trusted; type-confusion is the integrator's concern. → §7/§9.
 
 **Wave 3 — disclaimers & non-findings:**
+
 7. Confirm Fory disclaims payload integrity/authenticity/confidentiality (no MAC/sig/encryption) and is not a sandbox for registered types' own logic. → §9.
 8. Any other recurring scanner/fuzzer false positives the PMC already knows about, to seed §11a (e.g. reflection/Unsafe usage, codegen)? → §11a.
 9. **Meta:** Fory has no in-repo `SECURITY.md` and an `AGENTS.md` that is a developer/agent guide. This engagement adds `SECURITY.md` + `THREAT_MODEL.md` and wires `AGENTS.md → SECURITY.md → THREAT_MODEL.md`. Confirm the model should live in-repo (as proposed) vs. on the website, and who owns revisions. The existing config-guide "Security" section becomes a pointer to this model. → §1.
From 78ea160fb26b091f89c114122dbd969da42c850f Mon Sep 17 00:00:00 2001
From: chaokunyang
Date: Wed, 17 Jun 2026 14:50:07 +0800
Subject: [PATCH 3/7] docs: align security threat model

---
 AGENTS.md                        |   9 +-
 SECURITY.md                      |  10 +-
 THREAT_MODEL.md                  | 182 -------------------------------
 docs/security/deserialization.md |  37 ++++++-
 docs/security/index.md           |  14 ++-
 docs/security/threat-model.md    |  75 +++++++++++++
 6 files changed, 132 insertions(+), 195 deletions(-)
 delete mode 100644 THREAT_MODEL.md
 create mode 100644 docs/security/threat-model.md

diff --git a/AGENTS.md b/AGENTS.md
index 1effbddaca..04cc5ead39 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -9,6 +9,7 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th
 - `.agents/docs-and-formatting.md`: documentation, specification, and markdown rules.
 - `.agents/ci-and-pr.md`: CI triage, PR expectations, and commit conventions.
 - `.agents/testing/integration-tests.md`: `integration_tests/` prerequisites, regeneration rules, and commands.
+- `docs/security/index.md`: security model index and threat model routing.
 - `docs/security/deserialization.md`: security boundaries for untrusted deserialization classification.
 - `.agents/languages/java.md`
 - `.agents/languages/csharp.md`
@@ -180,7 +181,7 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th
 
 ## Security
 
-Security model: [SECURITY.md](./SECURITY.md)
-
-Agents that scan this repository should consult `SECURITY.md` and the
-threat model it links before reporting issues.
+Security models start at `docs/security/index.md`. For untrusted
+deserialization, read `docs/security/deserialization.md` before reporting or
+changing allocation, stream filling, skip, reference, metadata, or policy
+validation behavior.
diff --git a/SECURITY.md b/SECURITY.md
index bef2df1b35..2f663349a8 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -22,8 +22,10 @@ limitations under the License.
 vulnerabilities privately to `security@apache.org`; do not open public
 GitHub issues or pull requests for security reports.
 
-## Threat Model
+## Security Models
 
-What the project treats as in scope and out of scope, the security
-properties it provides and disclaims, the adversary model, and how
-findings are triaged are documented in [THREAT_MODEL.md](./THREAT_MODEL.md).
+Apache Fory security models are documented under
+[docs/security](docs/security/). Start with the
+[project threat model](docs/security/threat-model.md); for detailed untrusted
+deserialization classification rules, see the
+[deserialization security model](docs/security/deserialization.md).
diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md
deleted file mode 100644
index d388228c93..0000000000
--- a/THREAT_MODEL.md
+++ /dev/null
@@ -1,182 +0,0 @@
-
-
-# Apache Fory — Threat Model (v0 draft)
-
-## §1 Header
-
-- **Project:** Apache Fory (`apache/fory`), `main`, against which this draft was written. Fory is a multi-language serialization framework (Java, C++, Python, Go, Rust, JavaScript, Kotlin, Scala, Swift, Dart, C#).
-- **Date:** 2026-06-02. **Status:** draft — for Apache Fory PMC review. **Author:** ASF Security team (drafted via the Scovetta threat-model rubric), for PMC ratification.
-- **Version binding:** versioned with the project; a report against Fory version *N* is triaged against the model as it stood at *N*, not at HEAD.
-- **Reporting cross-reference:** findings that violate a §8 property should be reported privately per the ASF process (`security@apache.org` → `private@fory.apache.org`); findings under §3 or §9 are closed citing this document.
-- **Provenance legend:** *(documented)* = stated in Fory's own docs/repo; *(maintainer)* = confirmed by a Fory PMC member through this process; *(inferred)* = reasoned from architecture/domain knowledge, not yet confirmed — every *(inferred)* claim has a matching §14 open question.
-- **Draft confidence:** ~20 documented / 0 maintainer / ~26 inferred.
-- **What Fory is:** Apache Fory is a high-performance, multi-language object/data serialization framework. An application uses it in-process to serialize its objects to bytes and deserialize bytes back into objects, either within one language ("native" mode) or across languages ("xlang" mode), with optional zero-copy and a row format. *(documented — README, docs/guide)*
-
-## §2 Scope and intended use
-
-- **Primary use:** an **in-process library** linked into a host application that calls `serialize()` / `deserialize()` on its own data types. *(documented — guides)*
-- **It is not a network service or daemon.** It has no listening surface, no auth, no users — the embedding application owns where the bytes come from and go. *(inferred)*
-- **Caller / trust level:** a single caller — the embedding application — which is **trusted** (it links the library and registers its types). The security-relevant question is not "who calls Fory" but **"where do the bytes handed to `deserialize()` come from"** — trusted producer, or attacker-controlled. *(inferred; the registration guidance is documented)*
-
-**Component-family table** *(in/out of this model):*
-
-| Family | Entry point | Notes | In model? |
-| --- | --- | --- | --- |
-| Object-graph serialization (native, per language) | `fory.serialize` / `deserialize` | the core; instantiates registered types from bytes | **In** *(documented)* |
-| Cross-language (xlang) serialization | xlang `serialize`/`deserialize` | type mapping across languages | **In** *(documented)* |
-| Row format / zero-copy | row encoders | reads fields in place from a buffer | **In** *(documented)* |
-| Class/type registration + "secure mode" | `requireClassRegistration`, `register(...)` | the primary defense | **In** *(documented)* |
-| Per-language implementations | `java/`, `cpp/`, `python/`, `go/`, `rust/`, `javascript/`, `kotlin/`, `scala/`, `swift/`, `dart/`, `csharp/` | each is a separate impl of the same model | **In** — but memory-safety profile differs by language (see §5/§8) *(documented: dirs exist)* |
-| `examples/`, `benchmarks/`, `integration_tests/` | demo/bench/test | not production surface | **Out** *(see §3)* |
-
-## §3 Out of scope (explicit non-goals)
-
-- **The integrity / authenticity / confidentiality of the serialized bytes.** Fory deserializes what it is given; it does not authenticate, MAC, or encrypt payloads. If bytes can be tampered with in transit/at rest, that is the application's problem to solve (sign/encrypt before handing to Fory). *(inferred)*
-- **Anything when the caller disables class registration on an untrusted payload source.** `requireClassRegistration(false)` is a documented, deliberately-available footgun; using it against attacker-controlled bytes is out of the model's protection (see §5a/§9). *(documented — config: "Disabling may allow unknown classes to be deserialized, potentially causing security risks")*
-- **The behaviour of the application's own registered classes.** Fory instantiates and populates registered types; if a registered class has dangerous side effects in its constructors/setters/finalizers, that is the application's design, not Fory's. *(inferred)*
-- **`examples/`, `benchmarks/`, `integration_tests/`** — shipped but not a production trust surface. *(inferred)*
-
-## §4 Trust boundaries and data flow
-
-- **The trust boundary is the byte buffer passed to `deserialize()`** (and the row-format buffer). Everything Fory does on the serialize side operates on the application's own in-memory objects (trusted); the deserialize side is where attacker-controlled bytes, if any, enter. *(inferred)*
-- **Data flow:** untrusted bytes → format/header parse → (class id / type resolution → **registration check**) → field decode → object graph construction → returned to caller. The registration check is the gate that decides whether an arbitrary type may be instantiated. *(inferred; registration mechanism documented)*
-- **Reachability precondition:** a deserialize-side finding is **in-model** only if it is reachable from the byte buffer under the **default secure configuration** (`requireClassRegistration(true)`). A finding that requires `requireClassRegistration(false)`, or that requires the *serialize* side to be fed attacker-controlled live objects, is out-of-model (§5a / trusted-input). *(inferred)*
-
-## §5 Assumptions about the environment
-
-- **In-process, no ambient I/O.** Fory does not (by design) open sockets, spawn processes, or read the network; it operates on in-memory buffers handed to it. *(inferred — high-priority confirmation; negative claim)*
-- **Per-language memory model differs.** In managed runtimes (Java, Python, Go, JS, …) memory safety is the runtime's; in the **C++** (and unsafe-Rust FFI) paths, malformed input reaching the decoder is a memory-safety surface in a way it is not on the JVM. The model's "memory safety on malformed input" property is therefore language-conditional (see §8). *(inferred)*
-- **Codegen / JIT:** on ordinary JVMs Fory generates serializer code at runtime (`codeGenEnabled` default true); disabled on Android / GraalVM native image. This is a performance mechanism over the application's own registered types, not a path for executing attacker bytes. *(documented — config table)*
-
-## §5a Build-time and configuration variants
-
-The security envelope is set by runtime configuration, not build flags. The load-bearing knobs *(documented — docs/guide/java/configuration.md)*:
-
-| Knob | Default | Effect on the model |
-| --- | --- | --- |
-| `requireClassRegistration` | **`true`** (secure) | When true, only registered types are deserialized — the primary defense against deserializing arbitrary/gadget classes. Disabling "may allow unknown classes to be deserialized, potentially causing security risks." |
-| `maxDepth` | **`50`** | Bounds deserialization recursion depth; "can be used to refuse deserialization DDOS attack." |
-| `deserializeUnknownClass` | `true` in compatible mode, else `false` | Whether data for unknown/non-existent classes is skipped/deserialized. |
-| `compatible` | xlang: `true`; native: `false` | Schema forward/backward compatibility. |
-| `suppressClassRegistrationWarnings` | `true` | Registration warnings are useful for security audit but suppressed by default. |
-
-**The default is the *secure* posture here** (registration required) — the inverse of the usual insecure-default case. The model's §8 properties hold *under the defaults*; a report that only manifests under `requireClassRegistration(false)` is `OUT-OF-MODEL: non-default-build`. Confirm this framing with the PMC (§14).
-
-## §6 Assumptions about inputs
-
-Per-entry-point trust table *(registration mechanism + defaults documented; trust framing inferred):*
-
-| Entry point | Input | Attacker-controllable? | Caller must enforce |
-| --- | --- | --- | --- |
-| `deserialize(bytes)` / `deserialize(bytes, Class)` | serialized byte buffer | **yes, if the application sources bytes from an untrusted producer** | keep `requireClassRegistration(true)`; register only safe types; integrity-check bytes upstream |
-| row-format readers | buffer | **yes** (same as above) | same |
-| `serialize(obj)` | a live application object | no — the app's own trusted object | n/a |
-| `register(Class, …)` | type registered at setup | no — controlled by the app developer | register only types safe to instantiate from untrusted data |
-
-- **Size/shape/rate:** `maxDepth` (default 50) bounds nesting; whether total allocation / output size is otherwise bounded against a hostile payload is open (see §8 resource line). *(maxDepth documented; broader bound inferred)*
-
-## §7 Adversary model
-
-- **Primary adversary:** a party who controls the **serialized bytes** an application later passes to `deserialize()` (e.g. data arriving over a network the app feeds to Fory, or persisted data an attacker can tamper with). Goal: instantiate dangerous types (gadget-chain RCE), corrupt memory in the native paths, or exhaust CPU/memory. *(inferred — the canonical serialization-framework adversary)*
-- **Capabilities:** can craft arbitrary/malformed byte buffers; cannot change the application's Fory configuration or its registered-type set (those are set by the trusted app at startup).
*(inferred)* -- **Out of scope:** an attacker who controls the embedding application, its configuration, or the objects passed to `serialize()` — already trusted; an attacker who can set `requireClassRegistration(false)` (that is, one who already controls the configuration). *(inferred)* - -## §8 Security properties the project provides - -*(Registration + depth defenses documented; the guarantees framed below are for PMC confirmation.)* - -- **Registered-type-only instantiation (default).** With `requireClassRegistration(true)` (the default), deserialization instantiates only types the application registered, so attacker bytes cannot drive Fory to construct an arbitrary class. *Violation symptom:* an unregistered/unexpected type is instantiated from input under the default config. *Severity:* security-critical (this is the deserialization-RCE defense). *(documented that registration is required by default + that disabling causes risk; the unbypassability guarantee is the claim to confirm)* -- **Bounded recursion depth.** Deserialization that nests deeper than `maxDepth` (default 50) throws instead of recursing without bound. *Violation symptom:* stack overflow / unbounded recursion from crafted nesting under the default. *Severity:* security-critical (DoS). *(documented — config table)* -- **Memory safety on malformed input — language-conditional.** In managed-runtime implementations, malformed bytes yield an exception, not memory corruption. For the **C++** implementation this is the load-bearing property to confirm (malformed-input fuzzing of the C++ decoder). *Violation symptom:* OOB read/write, crash. *Severity:* security-critical. *(inferred — confirm per language)* -- **Resource bounds beyond depth — UNSPECIFIED.** It is open whether a crafted payload that forces large allocation or CPU blowup within the depth limit (e.g. huge declared collection sizes) is a bug or expected behaviour; the model needs to draw that line (§14).
*(inferred; maxDepth documented)* - -## §9 Security properties the project does *not* provide - -*(Highest-value section for integrators.)* - -- **No protection when class registration is disabled.** `requireClassRegistration(false)` deliberately allows deserializing unknown classes — using it on untrusted input re-opens the classic deserialization-gadget RCE surface. This is the caller's choice, documented as risky. *(documented — config)* -- **No payload authentication or confidentiality.** Fory does not verify that bytes came from a trusted producer or that they are unmodified; it is not a MAC, signature, or cipher. *(inferred)* **False friend:** a successful round-trip / schema-compatibility check is *not* an integrity guarantee against a malicious producer. -- **Not a sandbox for registered types.** Registering a class authorizes Fory to instantiate it from bytes; if that class's construction has side effects, Fory does not contain them. *(inferred)* -- **Cross-language type confusion is the integrator's concern** in xlang mode — relying on the peer to send a compatible schema is a trust assumption between the two ends, not something Fory enforces against a hostile peer. *(inferred)* -- **Well-known attack classes left to the caller:** deserialization-gadget attacks (defended by registration, *if left on*), decompression/allocation bombs (only their nesting depth is bounded, by `maxDepth`), and integrity attacks on the byte stream. *(inferred)* - -## §10 Downstream responsibilities (the embedding application) - -- **Keep `requireClassRegistration(true)`** whenever any deserialized bytes could be attacker-influenced (the documented production guidance). *(documented)* -- **Register only types that are safe to instantiate from untrusted data**; do not register types with dangerous construction side effects. *(inferred)* -- **Authenticate / integrity-check / decrypt** untrusted bytes *before* handing them to `deserialize()` — Fory will not.
*(inferred)* -- **Tune `maxDepth`** to the application's real object depth rather than disabling it. *(inferred)* -- **In xlang mode, treat the peer's schema as a trust relationship** you control, not something Fory polices. *(inferred)* - -## §11 Known misuse patterns - -*(Draft one-liners — expand before publishing.)* - -- Setting `requireClassRegistration(false)` for convenience, then deserializing network/user data. *(documented as risky)* -- Treating Fory deserialization of untrusted bytes as safe without integrity-checking the bytes first. *(inferred)* -- Registering broad/dangerous types (or whole packages) to "make it work", widening the gadget surface. *(inferred)* -- Assuming the C++ decoder is as forgiving of malformed input as the JVM one. *(inferred)* - -## §11a Known non-findings (recurring false positives) - -*(Seed list — confirmations here are the highest-leverage scan-suppression input.)* - -- "Fory can deserialize arbitrary classes → RCE" — **only** with `requireClassRegistration(false)`; under the default (`true`) it cannot. A report that assumes registration is off is `OUT-OF-MODEL: non-default-build` unless the PMC says otherwise. *(documented)* -- "No signature/MAC/encryption on the serialized format" — by-design; integrity/confidentiality is the caller's (§9/§10). *(inferred)* -- "Unbounded recursion on nested input" — bounded by `maxDepth` (default 50). *(documented)* -- "Registered class X does something dangerous when constructed" — the application's registration choice (§3/§10), not a Fory bug. *(inferred)* -- "Reflection / dynamic codegen used at runtime" — `codeGenEnabled` operates over the app's own registered types, not attacker bytes (§5). *(documented config; framing inferred)* - -## §12 Conditions that would change this model - -- A change to the **default** of `requireClassRegistration` or `maxDepth`. *(documented knobs)* -- A new deserialization entry point or a new language implementation with a different memory-safety profile. 
*(inferred)* -- Fory gaining any I/O / network surface (it would stop being a pure in-process library). *(inferred)* -- A report that cannot be routed to a single §13 disposition → revise the model. - -## §13 Triage dispositions - -| Disposition | Meaning | Licensed by | -| --- | --- | --- | -| `VALID` | Violates a §8 property under the **default** config via attacker-controlled bytes (e.g. unregistered-type instantiation with registration on; recursion that escapes the `maxDepth` bound; C++ memory corruption on malformed input). | §8, §6, §7 | -| `VALID-HARDENING` | No §8 property broken, but a §11 misuse is cheap enough to harden against (e.g. a safer default, a louder warning). | §11 | -| `OUT-OF-MODEL: trusted-input` | Requires attacker control of the serialize-side objects, the registered-type set, or the Fory config. | §6, §7 | -| `OUT-OF-MODEL: non-default-build` | Only manifests with `requireClassRegistration(false)` or another discouraged §5a setting. | §5a | -| `OUT-OF-MODEL: unsupported-component` | Lands in `examples/`, `benchmarks/`, `integration_tests/`. | §3 | -| `BY-DESIGN: property-disclaimed` | Concerns a §9-disclaimed property (no payload auth/encryption, not a sandbox for registered types, xlang peer trust). | §9 | -| `KNOWN-NON-FINDING` | Matches a §11a entry. | §11a | -| `MODEL-GAP` | Cannot be cleanly routed — triggers a §12 revision. | §12 | - -## §14 Open questions for the maintainers - -**Wave 1 — scope & the registration framing:** - -1. Confirm Fory is modeled as an **in-process library** with no ambient I/O (no sockets/processes/network) — the negative-side-effects inventory in §5. Proposed: yes. → §2/§5. -2. **The core ruling:** with `requireClassRegistration(true)` (default), is "only registered types are instantiated from untrusted bytes" a property Fory **commits to** (so a bypass is `VALID`/security-critical)? And is a finding that requires `requireClassRegistration(false)` correctly `OUT-OF-MODEL: non-default-build`? Proposed: yes to both.
→ §8/§5a/§13. -3. Confirm `examples/`/`benchmarks/`/`integration_tests/` are out of scope. → §3. - -**Wave 2 — language profiles & inputs:** - -4. **Per-language memory safety:** for which implementations does Fory claim "malformed input → clean error, not memory corruption"? Is the **C++** decoder the primary memory-safety surface to fuzz, and does it carry the same guarantee? → §5/§8. -5. Beyond `maxDepth`, are there bounds on total allocation / declared collection sizes / output size against a hostile payload, or is that explicitly the caller's concern? Where is the resource/DoS line? → §8/§11a. -6. In **xlang** mode, what does Fory assume about the peer — is a hostile/malformed peer schema in scope, or is the peer a trusted endpoint? Proposed: peer trusted; type-confusion is the integrator's concern. → §7/§9. - -**Wave 3 — disclaimers & non-findings:** - -7. Confirm Fory disclaims payload integrity/authenticity/confidentiality (no MAC/sig/encryption) and is not a sandbox for registered types' own logic. → §9. -8. Any other recurring scanner/fuzzer false positives the PMC already knows about, to seed §11a (e.g. reflection/Unsafe usage, codegen)? → §11a. -9. **Meta:** Fory has no in-repo `SECURITY.md` and an `AGENTS.md` that is a developer/agent guide. This engagement adds `SECURITY.md` + `THREAT_MODEL.md` and wires `AGENTS.md → SECURITY.md → THREAT_MODEL.md`. Confirm the model should live in-repo (as proposed) vs. on the website, and who owns revisions. The existing config-guide "Security" section becomes a pointer to this model. → §1. diff --git a/docs/security/deserialization.md b/docs/security/deserialization.md index 390b1855b4..8ad54789e7 100644 --- a/docs/security/deserialization.md +++ b/docs/security/deserialization.md @@ -1,6 +1,6 @@ --- title: Deserialization Security Model -sidebar_position: 2 +sidebar_position: 3 --- This document defines the security model for Apache Fory deserialization. 
It is @@ -52,6 +52,41 @@ Fory security boundaries do not include: shape, unless rejecting other shapes is an explicit owner policy or protects one of the boundaries above. +## Type And Class Policy + +Type, class, function, method, registration, and deserialization policies are +security boundaries when they are intended to restrict what untrusted bytes may +materialize. + +For untrusted data, a bypass is security-relevant when encoded bytes can +materialize a type, function, method, class, or dynamic object that the active +Fory policy should reject. This includes bypasses of class or type +registration, allow-list checkers, strict-mode checks, or language-specific +deserialization policies. + +Disabling registration or dynamic-type checks for trusted data is a caller +configuration choice. That choice only removes the arbitrary-type materialization +claim provided by that policy; it does not remove Fory's runtime-safety, +resource, cleanup, retained-state, or no-progress-loop requirements for +untrusted deserialization paths. + +Fory is not a sandbox for application-owned types. If a registered type or +serializer is allowed by the active policy, the application owns whether that +type's construction, hooks, setters, finalizers, or other logic is safe for the +application's trust boundary. + +## Depth And Progress + +Deserialization paths that recurse through objects, metadata, containers, or +references should enforce the runtime's configured depth limit before crafted +nesting can exhaust the call stack or bypass cleanup. A malformed input that +exceeds the configured depth should fail the root operation instead of +continuing unbounded recursion. + +Loops that consume encoded data should guarantee byte progress, logical +progress, or a terminal error. Inputs that can keep a reader in a no-progress +loop are security-relevant even when they do not allocate memory. 
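As an illustration of the depth and progress rules above, the following is a minimal Python sketch of a recursive decoder for a hypothetical toy format (not Fory's wire format). The names `Reader`, `decode`, and `DecodeError` are illustrative assumptions, not Fory APIs. The sketch checks the configured depth before each recursive call, rejects a declared element count that the remaining bytes cannot hold, and makes every loop iteration either consume input or fail.

```python
class DecodeError(Exception):
    """Malformed input: the root decode fails instead of recursing or looping."""


# Hypothetical toy format, NOT Fory's wire format:
#   0x00 <u8 value>           -> small int
#   0x01 <u8 count> <items>   -> list of `count` nested values
MIN_VALUE_SIZE = 2  # every encoded value is at least a tag byte plus one more byte


class Reader:
    def __init__(self, data: bytes, max_depth: int = 50):
        self.data = data
        self.pos = 0
        self.max_depth = max_depth  # 50 mirrors the documented Java `maxDepth` default

    def read_u8(self) -> int:
        if self.pos >= len(self.data):
            raise DecodeError("truncated input")
        value = self.data[self.pos]
        self.pos += 1  # each successful read advances the cursor (byte progress)
        return value

    def read_value(self, depth: int) -> object:
        # Check the limit before recursing, so crafted nesting fails cleanly.
        if depth >= self.max_depth:
            raise DecodeError(f"depth limit {self.max_depth} exceeded")
        tag = self.read_u8()
        if tag == 0x00:
            return self.read_u8()
        if tag == 0x01:
            count = self.read_u8()
            # Reject counts the remaining bytes cannot possibly hold before allocating.
            if count * MIN_VALUE_SIZE > len(self.data) - self.pos:
                raise DecodeError("declared count exceeds remaining input")
            items = []
            for _ in range(count):
                start = self.pos
                items.append(self.read_value(depth + 1))
                if self.pos <= start:  # defensive: every iteration must consume bytes
                    raise DecodeError("no progress")
            return items
        raise DecodeError(f"unknown tag {tag:#04x}")


def decode(data: bytes, max_depth: int = 50) -> object:
    reader = Reader(data, max_depth)
    value = reader.read_value(depth=0)
    if reader.pos != len(data):
        raise DecodeError("trailing bytes")
    return value
```

The depth check runs before the nested read rather than after it, so a crafted payload fails at the limit instead of first consuming stack frames; the same pattern would apply to metadata or reference decoding.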
+ ## Security Invariants Deserialization code must prevent the following outcomes for untrusted input: diff --git a/docs/security/index.md b/docs/security/index.md index c75b583538..66f8cc7e59 100644 --- a/docs/security/index.md +++ b/docs/security/index.md @@ -3,9 +3,9 @@ title: Security sidebar_position: 1 --- -This directory documents Apache Fory security models and security invariants. -It is not a vulnerability disclosure area and does not list CVE details, -exploit samples, issue timelines, or implementation history. +This directory documents Apache Fory security models and security invariants. It +is not a vulnerability disclosure area and does not list CVE details, exploit +samples, issue timelines, or implementation history. Security model documents describe how Fory should classify and prevent security risks while preserving the performance characteristics expected from Fory @@ -13,4 +13,10 @@ serialization runtimes. ## Models -- [Deserialization Security Model](deserialization.md) +- [Threat Model](threat-model.md): project-level trust boundaries, non-goals, + and downstream responsibilities. +- [Deserialization Security Model](deserialization.md): concrete rules for + classifying and preventing untrusted deserialization risks. + +For vulnerability reporting, see the repository +[security policy](../../SECURITY.md). diff --git a/docs/security/threat-model.md b/docs/security/threat-model.md new file mode 100644 index 0000000000..461c516425 --- /dev/null +++ b/docs/security/threat-model.md @@ -0,0 +1,75 @@ +--- +title: Threat Model +sidebar_position: 2 +--- + +This document describes Apache Fory's project-level security boundaries and +non-goals. It is the high-level entry point for Fory security models; concrete +untrusted deserialization classification rules live in the +[deserialization security model](deserialization.md). + +Fory is an in-process serialization library. 
Applications link Fory into their +own process, configure serializers and type policies, and call Fory APIs to +serialize application-owned objects or deserialize encoded Fory data. Fory does +not provide a network service, daemon, authentication system, or transport +protocol. + +## Trust Boundaries + +Fory's primary security boundary is encoded bytes or streams passed to +deserialization APIs from untrusted or partially trusted sources. The embedding +application owns where those bytes come from and which Fory configuration, +registered types, schemas, and policies are used to read them. + +Fory security boundaries include: + +- Runtime safety, including avoiding crashes, panics, undefined behavior, and + out-of-bounds memory access. +- Resource ownership, including memory, CPU progress, stream buffers, native + allocations, callbacks, and retained read-side state. +- Explicit Fory policy checks, such as class, type, function, method, + registration, or deserialization policies that restrict what may be + materialized. +- Cleanup boundaries, where state created during a failed root operation must + not leak into later operations. + +The [deserialization security model](deserialization.md) defines how to +classify these boundaries for untrusted deserialization paths. + +## Non-Goals + +Fory does not provide: + +- Encoded-data authenticity, integrity, confidentiality, signing, MACs, or + encryption. +- Transport security or protection for bytes while they are stored or moved + outside Fory. +- Application-level authorization or validation for the business meaning of a + successfully deserialized value. +- A sandbox for user-registered classes, functions, constructors, setters, + finalizers, or other application-owned logic. + +Applications that receive Fory data from untrusted sources should authenticate +or integrity-check those bytes before passing them to Fory when authenticity or +tamper resistance matters. 
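A minimal sketch of the upstream integrity check described above, using Python's standard `hmac` module. It assumes the producer and consumer share a secret key and that `deserialize` is whatever callable the application already uses (for example, a configured Fory instance's `deserialize` method). The `seal` and `open_and_deserialize` helpers and the tag-then-payload framing are hypothetical, not part of Fory.

```python
import hashlib
import hmac
from typing import Callable, TypeVar

T = TypeVar("T")
TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def seal(key: bytes, payload: bytes) -> bytes:
    """Producer side: prepend an HMAC-SHA256 tag to the serialized payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def open_and_deserialize(key: bytes, blob: bytes, deserialize: Callable[[bytes], T]) -> T:
    """Consumer side: verify the tag first; only authenticated bytes reach the deserializer."""
    if len(blob) < TAG_LEN:
        raise ValueError("message too short")
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed; payload not deserialized")
    return deserialize(payload)
```

HMAC only provides authenticity and integrity; it does not hide the payload or prevent replay. Bytes carrying a valid tag from a compromised producer are still attacker-controlled, so class registration or an allow-list policy should stay enabled as well.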
+ +## Downstream Responsibilities + +Applications are responsible for: + +- Choosing whether a byte source is trusted enough for the configured + deserialization mode. +- Keeping class or type registration enabled for untrusted data unless another + explicit Fory policy owns the accepted type surface. +- Registering only types and serializers that are safe for the application's + trust boundary. +- Configuring depth and resource limits for the largest data shape the + application intends to accept. +- Treating cross-language peers and schemas as part of the application's trust + relationship. + +Disabling registration or using dynamic deserialization on trusted data is a +configuration choice. For untrusted data, bypassing an explicit Fory policy, +crashing, leaking resources, retaining attacker-controlled state, or allocating +disproportionately remains security-relevant as described in the +[deserialization security model](deserialization.md). From f20e99cbe7543baf59b987cf1ba43bc0bcefac72 Mon Sep 17 00:00:00 2001 From: chaokunyang Date: Wed, 17 Jun 2026 14:58:06 +0800 Subject: [PATCH 4/7] docs: clarify security model guidance --- AGENTS.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 04cc5ead39..f7d31fe162 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -9,7 +9,9 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th - `.agents/docs-and-formatting.md`: documentation, specification, and markdown rules. - `.agents/ci-and-pr.md`: CI triage, PR expectations, and commit conventions. - `.agents/testing/integration-tests.md`: `integration_tests/` prerequisites, regeneration rules, and commands. -- `docs/security/index.md`: security model index and threat model routing. +- `docs/security/index.md`: security model index. +- `docs/security/threat-model.md`: project-level trust boundaries, non-goals, + and downstream responsibilities. 
- `docs/security/deserialization.md`: security boundaries for untrusted deserialization classification. - `.agents/languages/java.md` - `.agents/languages/csharp.md` @@ -181,7 +183,8 @@ This is the entry point for AI guidance in Apache Fory. Read this file first, th ## Security -Security models start at `docs/security/index.md`. For untrusted -deserialization, read `docs/security/deserialization.md` before reporting or -changing allocation, stream filling, skip, reference, metadata, or policy -validation behavior. +Security models start at `docs/security/index.md`. Read +`docs/security/threat-model.md` for project-level trust boundaries, non-goals, +and downstream responsibilities. For untrusted deserialization, read +`docs/security/deserialization.md` before reporting or changing allocation, +stream filling, skip, reference, metadata, or policy validation behavior. From 2d5580571b08fe364b269145c21930048a7a935b Mon Sep 17 00:00:00 2001 From: chaokunyang Date: Wed, 17 Jun 2026 15:18:56 +0800 Subject: [PATCH 5/7] docs: clarify security transport scope --- docs/README.md | 3 ++- docs/security/threat-model.md | 11 ++++++++--- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/docs/README.md b/docs/README.md index 27500ed942..356a3ab844 100644 --- a/docs/README.md +++ b/docs/README.md @@ -10,7 +10,8 @@ [Kotlin](guide/kotlin/index.md) guides. - For row format, see the [row format spec](specification/row_format_spec.md). - For using Apache Fory™ with GraalVM native image, see [graalvm support](guide/java/graalvm-support.md) doc. -- For deserialization security boundaries, see the [security model](security/deserialization.md). +- For security models and deserialization boundaries, see the + [security docs](security/). 
## Fory IDL Schema diff --git a/docs/security/threat-model.md b/docs/security/threat-model.md index 461c516425..f2fc8de662 100644 --- a/docs/security/threat-model.md +++ b/docs/security/threat-model.md @@ -11,8 +11,13 @@ untrusted deserialization classification rules live in the Fory is an in-process serialization library. Applications link Fory into their own process, configure serializers and type policies, and call Fory APIs to serialize application-owned objects or deserialize encoded Fory data. Fory does -not provide a network service, daemon, authentication system, or transport -protocol. +not provide a standalone network service, daemon, authentication system, or +transport protocol. + +Fory can generate service companions for application-provided gRPC runtimes. +Those companions provide Fory serialization for request and response objects; +the application and gRPC stack still own listeners, channels, credentials, +authentication, authorization, deadlines, retries, and transport lifecycle. ## Trust Boundaries @@ -43,7 +48,7 @@ Fory does not provide: - Encoded-data authenticity, integrity, confidentiality, signing, MACs, or encryption. - Transport security or protection for bytes while they are stored or moved - outside Fory. + outside Fory, including transport security for generated service companions. - Application-level authorization or validation for the business meaning of a successfully deserialized value. - A sandbox for user-registered classes, functions, constructors, setters, From 01617040ee7ff253e43e43df2ca98629463856f7 Mon Sep 17 00:00:00 2001 From: chaokunyang Date: Wed, 17 Jun 2026 15:28:12 +0800 Subject: [PATCH 6/7] docs: link security index directly --- SECURITY.md | 2 +- docs/README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/SECURITY.md b/SECURITY.md index 2f663349a8..321a1673ae 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -25,7 +25,7 @@ GitHub issues or pull requests for security reports. 
## Security Models Apache Fory security models are documented under -[docs/security](docs/security/). Start with the +[docs/security](docs/security/index.md). Start with the [project threat model](docs/security/threat-model.md); for detailed untrusted deserialization classification rules, see the [deserialization security model](docs/security/deserialization.md). diff --git a/docs/README.md b/docs/README.md index 356a3ab844..15d0393362 100644 --- a/docs/README.md +++ b/docs/README.md @@ -11,7 +11,7 @@ - For row format, see the [row format spec](specification/row_format_spec.md). - For using Apache Fory™ with GraalVM native image, see [graalvm support](guide/java/graalvm-support.md) doc. - For security models and deserialization boundaries, see the - [security docs](security/). + [security docs](security/index.md). ## Fory IDL Schema From 4daa8ba736f9096e752691c96dd37b60b2cc303e Mon Sep 17 00:00:00 2001 From: chaokunyang Date: Wed, 17 Jun 2026 15:52:23 +0800 Subject: [PATCH 7/7] docs: clarify threat model gates --- docs/security/threat-model.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/docs/security/threat-model.md b/docs/security/threat-model.md index f2fc8de662..5fea135056 100644 --- a/docs/security/threat-model.md +++ b/docs/security/threat-model.md @@ -26,6 +26,13 @@ deserialization APIs from untrusted or partially trusted sources. The embedding application owns where those bytes come from and which Fory configuration, registered types, schemas, and policies are used to read them. +The adversary model for untrusted deserialization is a sender that can craft +encoded bytes or stream behavior presented to a Fory read API. It does not assume +the sender can change the embedding application's Fory configuration, registered +type set, `TypeChecker` or equivalent allow-list policy, schema definitions, +classloader, or other active policy objects unless the application itself exposes +those controls. 
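The following Python sketch models the gate this adversary model relies on: the application fixes the registered type set and an optional allow-list checker at setup, and encoded bytes can only name a type to look up. `TypeResolver`, `LOADABLE`, `Point`, and `Gadget` are hypothetical names for illustration; they do not mirror Fory's `TypeChecker` or registration API.

```python
from typing import Callable, Dict, Optional


class UnregisteredTypeError(Exception):
    """Encoded bytes named a type that the active policy does not accept."""


class TypeResolver:
    """Hypothetical sketch of a registration gate; not Fory's actual API.

    The application configures this at startup. Encoded bytes only supply a
    type name; they cannot add registrations or replace the checker.
    """

    def __init__(self, loadable: Dict[str, type], require_registration: bool = True,
                 type_checker: Optional[Callable[[str], bool]] = None):
        self._loadable = loadable  # stands in for a classloader / module namespace
        self._registered: Dict[str, type] = {}
        self._require_registration = require_registration
        self._type_checker = type_checker  # allow-list consulted when registration is off

    def register(self, cls: type) -> None:
        self._registered[cls.__name__] = cls

    def resolve(self, name_from_bytes: str) -> type:
        if name_from_bytes in self._registered:
            return self._registered[name_from_bytes]
        if self._require_registration:
            # Default posture: anything not registered is rejected outright.
            raise UnregisteredTypeError(f"{name_from_bytes!r} is not registered")
        if self._type_checker is not None and not self._type_checker(name_from_bytes):
            raise UnregisteredTypeError(f"{name_from_bytes!r} rejected by type checker")
        if name_from_bytes not in self._loadable:
            raise UnregisteredTypeError(f"{name_from_bytes!r} is not loadable")
        # Registration off and checker absent or accepting: the bytes choose the type.
        return self._loadable[name_from_bytes]


class Point:
    pass


class Gadget:  # stands in for any loadable type with dangerous side effects
    pass


LOADABLE = {"Point": Point, "Gadget": Gadget}
```

With the defaults, `Gadget` is rejected because it was never registered. Only a resolver built with registration disabled and no checker lets the bytes select it, which corresponds to the caller configuration choice the model places outside its type-policy claim.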
+ Fory security boundaries include: - Runtime safety, including avoiding crashes, panics, undefined behavior, and @@ -38,6 +45,14 @@ Fory security boundaries include: - Cleanup boundaries, where state created during a failed root operation must not leak into later operations. +Runtime serializer code generation and JIT compilation are not paths for +executing encoded input. They operate on types and schemas after the active +registration check, `TypeChecker`, schema check, or policy check has accepted the +type surface. When class registration is disabled, `TypeChecker` or an +equivalent allow-list policy is the relevant gate. Generated serializer code is +derived from checked type descriptors rather than from attacker-controlled byte +contents. + The [deserialization security model](deserialization.md) defines how to classify these boundaries for untrusted deserialization paths.
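To illustrate the code-generation point about checked type descriptors, here is a minimal Python sketch in which a closure stands in for a generated serializer. The reader is built only from a type's declared fields, and only after the type passes an allow-list check; the input bytes supply field values and never shape the generated code. `ReaderCache`, `generate_reader`, and the fixed-width little-endian field encoding are assumptions for illustration, not Fory's codegen.

```python
import dataclasses
import struct
from typing import Callable, Dict, Set

# Hypothetical fixed-width encodings for the field types this sketch supports.
_FIELD_FORMATS = {int: "<q", float: "<d"}


class RejectedTypeError(Exception):
    """The type did not pass the allow-list check, so no reader is generated."""


def generate_reader(cls: type) -> Callable[[bytes], object]:
    # The read plan comes from the checked class's declared fields, never from input bytes.
    plan = []
    for field in dataclasses.fields(cls):
        if field.type not in _FIELD_FORMATS:
            raise TypeError(f"unsupported field type for {cls.__name__}.{field.name}")
        fmt = _FIELD_FORMATS[field.type]
        plan.append((field.name, fmt, struct.calcsize(fmt)))

    def read(data: bytes) -> object:
        offset, values = 0, {}
        for name, fmt, size in plan:
            if offset + size > len(data):
                raise ValueError("truncated input")
            (values[name],) = struct.unpack_from(fmt, data, offset)
            offset += size
        if offset != len(data):
            raise ValueError("trailing bytes")
        return cls(**values)  # bytes only supply field values

    return read


class ReaderCache:
    def __init__(self, allowed: Set[type]):
        self._allowed = set(allowed)  # fixed by the application at setup
        self._readers: Dict[type, Callable[[bytes], object]] = {}

    def reader_for(self, cls: type) -> Callable[[bytes], object]:
        # The policy check runs before any generation happens.
        if cls not in self._allowed:
            raise RejectedTypeError(cls.__name__)
        if cls not in self._readers:
            self._readers[cls] = generate_reader(cls)
        return self._readers[cls]


@dataclasses.dataclass
class Point:
    x: int
    y: int
```

Caching the generated reader per accepted type keeps generation off the per-message path, and because the cache is keyed by checked types, a payload cannot cause a reader to be generated for a type outside the allow-list.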