pxf: TableReader streaming + bind_row per-row binding #7
Merged
Second PR of the Python v0.72-v0.75 catch-up. Wraps protowire-cpp's TableReader (PR cpp#13) so consumers can stream @table rows from an in-memory buffer instead of materializing the whole row sequence up front.

API surface in src/protowire/pxf.py:

pxf.TableReader.from_bytes(data) → TableReader
  .type: row message type, e.g. "trades.v1.Trade"
  .columns: tuple of column field names
  .directives: side-channel @<name> directives before the header
  .done: True once the row sequence is exhausted
  .next_or_none(): next row, or None at EOF
  __iter__/__next__: standard Python iterator protocol
  .tail(): bytes for chaining a second TableReader
  .scan(msg): Next + bind_row in one call

pxf.bind_row(msg, columns, row, *, skip_validate=True) → None
  Free function for callers iterating Result.tables[i].rows from the materializing path.
  Strategy: format-and-reparse. Cells are rendered as a synthetic PXF body and run through unmarshal. This matches protowire-cpp's BindRow and reuses every branch of the existing decoder (WKT, wrappers, enums, oneof, pxf.required / pxf.default).

FFI (src/_protowire/module.cc):
- PyTableReader class wraps protowire::pxf::TableReader. The wrapped istringstream is held alongside the reader so its lifetime is bound to the Python object (the cpp TableReader takes a non-owning std::istream*).
- Cells marshal through the same CellToPyTuple helper used by PxfUnmarshalFull's Result.tables.
- __iter__ returns self; __next__ raises StopIteration at EOF.

Tests (22 new in tests/test_pxf_table_reader.py, 84 total):
- Header parsing: happy path, str input, no-@table error, empty input, leading directives, 64 KiB header cap
- Iteration: ordered rows, zero-rows-stops, next_or_none EOF, three-state cells, sticky arity error, parens-in-strings, comments between rows
- Tail chaining to a second @table
- bind_row + scan: column-by-name binding, absent-leaves-default, null-clears-wrapper (StringValue), bytes cell round-trip, mismatched-columns error, unknown-column error, string escape
Summary
Second of three PRs in the Python v0.72–v0.75 catch-up. Wraps `protowire-cpp`'s `TableReader` (cpp #13) so consumers can stream `@table` rows from an in-memory buffer instead of materializing the whole row sequence up front.
Python surface (in `src/protowire/pxf.py`):
```python
from protowire import pxf

reader = pxf.TableReader.from_bytes(data)
print(reader.type, reader.columns)
for row in reader:
    msg = TradeMsg()  # caller's message class matching reader.type
    pxf.bind_row(msg, reader.columns, row)
    handle(msg)

# Optional: chain a second @table from the same input
tr2 = pxf.TableReader.from_bytes(reader.tail())
```
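The loop above relies on the reader being its own iterator, with `next_or_none()` and `.done` available for callers that prefer explicit control. A minimal pure-Python sketch of that contract follows. `ToyTableReader` and its comma-separated, blank-line-terminated input format are hypothetical stand-ins for illustration only. They are not the real `pxf.TableReader` or PXF syntax, and the draining behaviour of `tail()` here is a simplification chosen for the toy.

```python
from typing import Optional, Tuple


class ToyTableReader:
    """Toy stand-in for a streaming row reader; NOT the real pxf.TableReader."""

    def __init__(self, data: bytes) -> None:
        # Hypothetical format: first line is a comma-separated header, each
        # following line is one row, and a blank line ends the table.
        self._lines = data.decode("utf-8").split("\n")
        self.columns: Tuple[str, ...] = tuple(self._lines[0].split(","))
        self._pos = 1
        self.done = False

    def next_or_none(self) -> Optional[Tuple[str, ...]]:
        """Return the next row, or None once the row sequence is exhausted."""
        if self.done:
            return None
        if self._pos >= len(self._lines) or self._lines[self._pos] == "":
            self.done = True
            return None
        row = tuple(self._lines[self._pos].split(","))
        self._pos += 1
        return row

    def __iter__(self) -> "ToyTableReader":
        # The reader is its own iterator, so `for row in reader` works.
        return self

    def __next__(self) -> Tuple[str, ...]:
        row = self.next_or_none()
        if row is None:
            raise StopIteration
        return row

    def tail(self) -> bytes:
        """Drain any remaining rows, then return the bytes after the table."""
        while self.next_or_none() is not None:
            pass
        return "\n".join(self._lines[self._pos + 1:]).encode("utf-8")


first = ToyTableReader(b"id,px\n1,10\n2,11\n\nsym\nA\n")
rows = list(first)                      # consumes rows one at a time
second = ToyTableReader(first.tail())   # chain a second table from the rest
```

Rows are produced one at a time rather than collected up front, which is the property the streaming path is meant to provide. Once exhausted, the toy reader keeps returning `None` and `done` stays `True`.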
`pxf.bind_row(msg, columns, row, *, skip_validate=True)` is exported as a free function for callers iterating `Result.tables[i].rows` from the materializing path. The strategy is format-and-reparse, matching the cpp port's `BindRow`: cells render as a synthetic PXF body and run through `unmarshal`, reusing every branch of the existing decoder (WKT timestamps / durations, wrapper-nullability, enum-by-name, `pxf.required` / `pxf.default`, oneof).
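To illustrate the format-and-reparse idea in isolation, here is a small self-contained sketch. Everything in it is hypothetical: `toy_bind_row`, `toy_unmarshal`, the `ABSENT` marker, the `Trade` dataclass, and the `name = <JSON literal>` body format are stand-ins, not the real `pxf.bind_row` or PXF syntax. The point is only that the binder does no per-type decoding itself; it renders the row as text and hands it to a decoder that already exists.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Sequence

ABSENT = object()  # hypothetical marker for a cell with no value in the row


@dataclass
class Trade:
    symbol: str = ""
    qty: int = 0
    note: Optional[str] = "n/a"


def toy_unmarshal(body: str, msg: Any) -> None:
    """Stand-in decoder: parses 'name = <JSON literal>' lines into msg."""
    for line in body.splitlines():
        name, _, literal = line.partition(" = ")
        if not hasattr(msg, name):
            raise ValueError(f"unknown column: {name}")
        setattr(msg, name, json.loads(literal))


def toy_bind_row(msg: Any, columns: Sequence[str], row: Sequence[Any]) -> None:
    """Bind one row by rendering it as text and reparsing it."""
    if len(columns) != len(row):
        raise ValueError("row arity does not match column count")
    lines = []
    for name, cell in zip(columns, row):
        if cell is ABSENT:
            continue  # absent cell: emit nothing, so the field keeps its value
        # json.dumps handles string escaping; None renders as null.
        lines.append(f"{name} = {json.dumps(cell)}")
    toy_unmarshal("\n".join(lines), msg)


trade = Trade()
toy_bind_row(trade, ("symbol", "qty", "note"), ('AC"ME', 5, ABSENT))
```

The trade-off is an extra render and parse per row in exchange for a single decoding path, so any behaviour the decoder has for defaults, nulls, and escapes applies to bound rows without a second implementation.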
FFI (`src/_protowire/module.cc`):
Test plan