Merged
.github/workflows/ci.yml (11 changes: 5 additions & 6 deletions)
@@ -16,12 +16,11 @@ concurrency:
cancel-in-progress: ${{ github.event_name == 'pull_request' }}

env:
# Pin the sibling C++ checkout to a specific commit. cpp@v0.70.0
# predates the __int128 → checked_arith refactor (MSVC), the
# protobuf-API-skew shim, and the MSVC source-charset fix; 9af2ec0
# is the first commit with all of those. Bump to a v0.70.x tag
# once cpp cuts one that includes them.
PROTOWIRE_CPP_REF: 9af2ec04918a417933848de1577cd61f83a710b0
# Pin the sibling C++ checkout to a specific tag. v0.75.0 carries the
# PXF v0.72-series feature set (@<name> / @entry / @table directive
# grammar, schema validator, Result accessors, TableReader streaming)
# the Python port wraps. Bump in lockstep with cpp release cuts.
PROTOWIRE_CPP_REF: v0.75.0

jobs:
# ---------------------------------------------------------------------
.github/workflows/codeql.yml (4 changes: 2 additions & 2 deletions)
@@ -16,8 +16,8 @@ permissions:
security-events: write

env:
# See ci.yml for why this is a SHA, not v0.70.0.
PROTOWIRE_CPP_REF: 9af2ec04918a417933848de1577cd61f83a710b0
# See ci.yml for the rationale on this pin.
PROTOWIRE_CPP_REF: v0.75.0

jobs:
analyze:
.github/workflows/publish.yml (12 changes: 5 additions & 7 deletions)
@@ -24,13 +24,11 @@ env:
# frozen FFI surface, so it must be an immutable ref — never a
# branch.
#
# Using a SHA (not the v0.70.0 tag) because cpp@v0.70.0 predates
# the f1d3eb0 __int128 → checked_arith refactor needed for MSVC,
# plus the protobuf-API-skew shim and MSVC source-charset fixes.
# 9af2ec0 (cpp main, ci: MSVC source-charset + skip pxf_escapes)
# is the first commit with all of those. Bump to v0.70.x once cpp
# cuts a tag that includes them.
PROTOWIRE_CPP_REF: 9af2ec04918a417933848de1577cd61f83a710b0
# Pinned to a tagged C++ release. v0.75.0 ships the PXF v0.72-series
# feature set (@<name> / @entry / @table grammar, schema validator,
# Result accessors, TableReader streaming) that this Python port
# wraps. Bump in lockstep with cpp release cuts.
PROTOWIRE_CPP_REF: v0.75.0

jobs:
# ---------------------------------------------------------------------
CHANGELOG.md (33 changes: 33 additions & 0 deletions)
@@ -11,6 +11,39 @@ format changes.

## [Unreleased]

### Changed

- **CI pin to protowire-cpp v0.75.0.** The cpp sibling now ships the
PXF v0.72-series feature set (directive grammar, schema validator,
Result accessors, TableReader streaming). The pin moves from the
pre-v0.72 commit `9af2ec0` to the `v0.75.0` tag so the Python
wrapper exposes the new surface.

### Added

- **`pxf.Result.directives` / `pxf.Result.tables`** — the document-root
directives the decoder saw at `unmarshal_full` time, exposed as
immutable dataclasses:
- `pxf.Directive(name, prefixes, type, body, has_body, line, column)`
for generic `@<name> *(prefix) [{ ... }]` blocks. `body` is the
raw bytes between `{` and `}` (verbatim), suitable for handing to
a follow-up `pxf.unmarshal` against the consumer's message type.
    `type` preserves the v0.72.0 single-prefix shape for backward compatibility.
- `pxf.TableDirective(type, columns, rows)` for `@table` directives,
with cells modeled as `None` (absent) or a `(kind, value)` 2-tuple
where kind ∈ {`"null"`, `"string"`, `"int"`, `"float"`, `"bool"`,
`"bytes"`, `"ident"`, `"timestamp"`, `"duration"`} — faithful to
the three-state cell grammar (absent / present-but-null /
present-with-value, draft §3.4.4).
- **`pxf.validate_descriptor(msg)` + `pxf.Violation`** — schema
reserved-name check (draft §3.13). Returns the list of fields,
oneofs, and enum values whose names case-sensitively match a PXF
  value keyword (`null` / `true` / `false`), sorted by element FQN.
- **`skip_validate` keyword** on `pxf.unmarshal` and
`pxf.unmarshal_full` (and the `_bytes` variants) — opt-out of the
per-call schema validator when the caller has already validated the
descriptor at registry-load time.

## [0.70.0]

Initial public release. The version number aligns this port with the rest
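The three-state cell model in the `pxf.TableDirective` entry above (absent / present-but-null / present-with-value) is what `CellToPyTuple` in `module.cc` produces: `None` for an absent cell, otherwise a `(kind, value)` pair. The standalone C++ sketch below reproduces that `std::visit` + `if constexpr` dispatch so it can be compiled without protowire-cpp or nanobind. `NullVal`, `StringVal`, `IntVal`, `BoolVal`, the `ValuePtr` variant of `std::shared_ptr`s, and `EncodeCell` are hypothetical stand-ins, not the real AST; the payload is also simplified to `std::optional<std::string>`, so `bool` is rendered as text here, whereas the real binding returns a Python `bool`.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical stand-ins for protowire::pxf AST nodes. Member names follow
// the ones CellToPyTuple reads (value / raw); everything else is omitted.
struct NullVal {};
struct StringVal { std::string value; };
struct IntVal { std::string raw; };  // raw integer text, not parsed here
struct BoolVal { bool value; };

// Assumption: a value is a variant of owning pointers, one per node kind.
using ValuePtr = std::variant<std::shared_ptr<NullVal>, std::shared_ptr<StringVal>,
                              std::shared_ptr<IntVal>, std::shared_ptr<BoolVal>>;

// std::nullopt = absent cell; otherwise (kind, payload), where an empty
// payload plays the role of Python's None for an explicit null.
using Cell = std::optional<std::pair<std::string, std::optional<std::string>>>;

Cell EncodeCell(const std::optional<ValuePtr>& cell) {
  if (!cell.has_value()) return std::nullopt;  // absent: nothing was written
  return std::visit(
      [](const auto& p) -> Cell {
        using T = std::decay_t<decltype(*p)>;
        using Payload = std::optional<std::string>;
        if constexpr (std::is_same_v<T, NullVal>) {
          return std::make_pair(std::string("null"), Payload{});
        } else if constexpr (std::is_same_v<T, StringVal>) {
          return std::make_pair(std::string("string"), Payload{p->value});
        } else if constexpr (std::is_same_v<T, IntVal>) {
          return std::make_pair(std::string("int"), Payload{p->raw});
        } else {
          // Only BoolVal is left; the static_assert keeps the dispatch exhaustive.
          static_assert(std::is_same_v<T, BoolVal>);
          return std::make_pair(std::string("bool"),
                                Payload{p->value ? "true" : "false"});
        }
      },
      *cell);
}
```

Keeping the absent case outside the variant (as `std::nullopt`) is what keeps an explicit `null` cell and a missing cell distinguishable all the way to the caller.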
src/_protowire/module.cc (126 changes: 119 additions & 7 deletions)
@@ -17,10 +17,12 @@

#include <cstdint>
#include <memory>
#include <optional>
#include <span>
#include <string>
#include <string_view>
#include <utility>
#include <variant>
#include <vector>

#include <google/protobuf/descriptor.h>
@@ -77,16 +79,67 @@ const pbuf::Descriptor* FindDescriptor(const SchemaBundle& s,

// --- pxf bindings ---------------------------------------------------------

// CellToPyTuple converts a single AST cell value (or std::nullopt for an
// absent cell) into the FFI shape consumed by pxf.py — `None` for absent,
// `(kind, value)` otherwise. Used by PxfUnmarshalFull for @table rows.
//
// kind values mirror the AST variant tags:
// "null" → nb::none()
// "string" → str (already-unescaped UTF-8)
// "int" → str (raw integer text; the Python wrapper decides how to parse it)
// "float" → str (raw float text)
// "bool" → bool
// "bytes" → bytes
// "ident" → str
// "timestamp" → str (raw RFC3339)
// "duration" → str (raw duration)
nb::object CellToPyTuple(const std::optional<protowire::pxf::ValuePtr>& cell) {
if (!cell.has_value()) return nb::none();
using namespace protowire::pxf;
return std::visit(
[](const auto& p) -> nb::object {
using T = std::decay_t<decltype(*p)>;
if constexpr (std::is_same_v<T, NullVal>) {
return nb::make_tuple(std::string("null"), nb::none());
} else if constexpr (std::is_same_v<T, StringVal>) {
return nb::make_tuple(std::string("string"), p->value);
} else if constexpr (std::is_same_v<T, IntVal>) {
return nb::make_tuple(std::string("int"), p->raw);
} else if constexpr (std::is_same_v<T, FloatVal>) {
return nb::make_tuple(std::string("float"), p->raw);
} else if constexpr (std::is_same_v<T, BoolVal>) {
return nb::make_tuple(std::string("bool"), p->value);
} else if constexpr (std::is_same_v<T, BytesVal>) {
return nb::make_tuple(
std::string("bytes"),
nb::bytes(reinterpret_cast<const char*>(p->value.data()), p->value.size()));
} else if constexpr (std::is_same_v<T, IdentVal>) {
return nb::make_tuple(std::string("ident"), p->name);
} else if constexpr (std::is_same_v<T, TimestampVal>) {
return nb::make_tuple(std::string("timestamp"), p->raw);
} else if constexpr (std::is_same_v<T, DurationVal>) {
return nb::make_tuple(std::string("duration"), p->raw);
} else {
// List / Block are rejected at @table cell-parse time, so this
// branch is unreachable for cells; return an ("unknown", None) sentinel instead.
return nb::make_tuple(std::string("unknown"), nb::none());
}
},
*cell);
}

// PXF text -> binary proto bytes.
nb::bytes PxfUnmarshal(nb::bytes text, nb::bytes fds_bytes,
const std::string& full_name, bool discard_unknown) {
const std::string& full_name, bool discard_unknown,
bool skip_validate) {
auto schema = BuildSchema(std::string_view(fds_bytes.c_str(), fds_bytes.size()));
const auto* desc = FindDescriptor(schema, full_name);
std::unique_ptr<pbuf::Message> msg(
schema.factory->GetPrototype(desc)->New());

protowire::pxf::UnmarshalOptions opts;
opts.discard_unknown = discard_unknown;
opts.skip_validate = skip_validate;
auto st = protowire::pxf::Unmarshal(
std::string_view(text.c_str(), text.size()), msg.get(), opts);
if (!st.ok()) {
@@ -99,17 +152,28 @@ nb::bytes PxfUnmarshal(nb::bytes text, nb::bytes fds_bytes,
return nb::bytes(out.data(), out.size());
}

// PXF text -> (binary proto bytes, set_paths, null_paths).
std::tuple<nb::bytes, std::vector<std::string>, std::vector<std::string>>
// Directive FFI shape: (name, prefixes, type, body, has_body, line, column).
using PyDirective = std::tuple<std::string, std::vector<std::string>, std::string,
nb::bytes, bool, int, int>;
// TableDirective FFI shape: (type, columns, rows) where rows is a list of
// lists of cells (each cell None or (kind, value); see CellToPyTuple).
using PyTableDirective = std::tuple<std::string, std::vector<std::string>,
std::vector<std::vector<nb::object>>>;

// PXF text -> (binary proto bytes, set_paths, null_paths, directives, tables).
std::tuple<nb::bytes, std::vector<std::string>, std::vector<std::string>,
std::vector<PyDirective>, std::vector<PyTableDirective>>
PxfUnmarshalFull(nb::bytes text, nb::bytes fds_bytes,
const std::string& full_name, bool discard_unknown) {
const std::string& full_name, bool discard_unknown,
bool skip_validate) {
auto schema = BuildSchema(std::string_view(fds_bytes.c_str(), fds_bytes.size()));
const auto* desc = FindDescriptor(schema, full_name);
std::unique_ptr<pbuf::Message> msg(
schema.factory->GetPrototype(desc)->New());

protowire::pxf::UnmarshalOptions opts;
opts.discard_unknown = discard_unknown;
opts.skip_validate = skip_validate;
auto r = protowire::pxf::UnmarshalFull(
std::string_view(text.c_str(), text.size()), msg.get(), opts);
if (!r.ok()) {
@@ -119,9 +183,56 @@ PxfUnmarshalFull(nb::bytes text, nb::bytes fds_bytes,
if (!msg->SerializeToString(&out)) {
throw nb::value_error("pxf.unmarshal_full: proto serialization failed");
}
// Marshal directives.
std::vector<PyDirective> py_dirs;
py_dirs.reserve(r->Directives().size());
for (const auto& d : r->Directives()) {
py_dirs.emplace_back(
d.name, d.prefixes, d.type,
nb::bytes(d.body.data(), d.body.size()),
d.has_body, d.pos.line, d.pos.column);
}
// Marshal tables.
std::vector<PyTableDirective> py_tables;
py_tables.reserve(r->Tables().size());
for (const auto& t : r->Tables()) {
std::vector<std::vector<nb::object>> py_rows;
py_rows.reserve(t.rows.size());
for (const auto& row : t.rows) {
std::vector<nb::object> py_cells;
py_cells.reserve(row.cells.size());
for (const auto& cell : row.cells) py_cells.push_back(CellToPyTuple(cell));
py_rows.push_back(std::move(py_cells));
}
py_tables.emplace_back(t.type, t.columns, std::move(py_rows));
}
return {nb::bytes(out.data(), out.size()),
r->SetFields(),
r->NullFields()};
r->NullFields(),
std::move(py_dirs),
std::move(py_tables)};
}

// PXF schema reserved-name check (draft §3.13). Returns a list of
// (kind, element, name, file) tuples. Empty list ⇒ conformant schema.
// kind values: "field" / "oneof" / "enum_value".
std::vector<std::tuple<std::string, std::string, std::string, std::string>>
PxfValidateDescriptor(nb::bytes fds_bytes, const std::string& full_name) {
auto schema = BuildSchema(std::string_view(fds_bytes.c_str(), fds_bytes.size()));
const auto* desc = FindDescriptor(schema, full_name);
auto vs = protowire::pxf::ValidateDescriptor(desc);
std::vector<std::tuple<std::string, std::string, std::string, std::string>> out;
out.reserve(vs.size());
for (const auto& v : vs) {
std::string kind;
switch (v.kind) {
case protowire::pxf::ViolationKind::kField: kind = "field"; break;
case protowire::pxf::ViolationKind::kOneof: kind = "oneof"; break;
case protowire::pxf::ViolationKind::kEnumValue: kind = "enum_value"; break;
}
out.emplace_back(std::move(kind), v.element, v.name, v.file);
}
return out;
}

// Binary proto bytes -> PXF text.
@@ -301,10 +412,11 @@ NB_MODULE(_protowire, m) {
m.doc() = "protowire native extension (nanobind shim around protowire-cpp)";

m.def("pxf_unmarshal", &PxfUnmarshal, "text"_a, "fds"_a, "full_name"_a,
"discard_unknown"_a = false);
"discard_unknown"_a = false, "skip_validate"_a = false);
m.def("pxf_unmarshal_full", &PxfUnmarshalFull, "text"_a, "fds"_a,
"full_name"_a, "discard_unknown"_a = false);
"full_name"_a, "discard_unknown"_a = false, "skip_validate"_a = false);
m.def("pxf_marshal", &PxfMarshal, "msg_bytes"_a, "fds"_a, "full_name"_a);
m.def("pxf_validate_descriptor", &PxfValidateDescriptor, "fds"_a, "full_name"_a);

nb::class_<SbeCodec>(m, "SbeCodec")
.def_static("create", &SbeCodec::Create, "fds"_a, "file_names"_a)
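`PxfValidateDescriptor` only marshals the results of protowire-cpp's `ValidateDescriptor`; the check itself is not part of this diff. As a rough, standalone sketch of the rule stated in the CHANGELOG (an element is flagged when its declared name case-sensitively equals `null`, `true`, or `false`, and results are sorted by element FQN), the C++ below runs over a hypothetical flattened list of schema elements. `Element`, `Violation`, and `CheckReservedNames` are illustrative names, not the protowire-cpp API, and the `file` column that the real binding returns is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical flattened view of one schema element. A real validator would
// walk google::protobuf descriptors; this sketch takes a prebuilt list instead.
struct Element {
  std::string kind;  // "field", "oneof", or "enum_value"
  std::string fqn;   // fully qualified name, e.g. "pkg.Msg.null"
  std::string name;  // short name as declared in the .proto
};

struct Violation {
  std::string kind;
  std::string element;  // FQN of the offending element
  std::string name;
};

// Flags every element whose short name is exactly (case-sensitively) one of
// the PXF value keywords, then orders the result by element FQN.
std::vector<Violation> CheckReservedNames(const std::vector<Element>& elems) {
  static constexpr std::array<std::string_view, 3> kKeywords = {"null", "true", "false"};
  std::vector<Violation> out;
  for (const auto& e : elems) {
    // string_view == string compares bytes exactly, so "NULL" does not match.
    if (std::find(kKeywords.begin(), kKeywords.end(), e.name) != kKeywords.end()) {
      out.push_back({e.kind, e.fqn, e.name});
    }
  }
  std::sort(out.begin(), out.end(),
            [](const Violation& a, const Violation& b) { return a.element < b.element; });
  return out;
}
```

Sorting by FQN makes the output order independent of how the schema happens to be traversed, which keeps reported violations stable from run to run.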