Add vertex and edge composite types with direct field access optimization#2303
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces vertex and edge as PostgreSQL composite types to optimize graph query performance. The changes enable direct field access for properties and accessor functions instead of rebuilding agtype structures.
Changes:
- Introduces vertex (id, label, properties) and edge (id, label, start_id, end_id, properties) composite types
- Optimizes property access to use direct field selection instead of agtype rebuild
- Changes _label_name to return agtype instead of cstring for type consistency
- Adds implicit casts from vertex/edge to agtype and json
- Fixes MERGE clause type mismatches with volatile wrapper
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/include/utils/agtype.h | Added OID accessors for VERTEXOID and EDGEOID composite types |
| src/include/parser/cypher_expr.h | Added helper functions for entity type checking and field extraction |
| src/include/parser/cypher_clause.h | Added field access helper functions for composite types |
| src/include/catalog/ag_label.h | Added get_label_name function signature |
| src/backend/utils/adt/agtype.c | Implemented composite type OID caching, vertex/edge to agtype casts, label parameter type change |
| src/backend/parser/cypher_expr.c | Implemented accessor function optimization and entity type coercion |
| src/backend/parser/cypher_clause.c | Updated entity expression builders to use RowExpr, added field selection helpers |
| src/backend/executor/*.c | Updated executors to use agtype labels instead of cstrings |
| src/backend/catalog/ag_label.c | Changed _label_name to return agtype, added cache-based get_label_name |
| sql/agtype_graphid.sql | Defined vertex/edge composite types and cast functions |
| sql/age_scalar.sql | Added _get_vertex_by_graphid helper function |
| sql/age_main.sql | Removed duplicate _label_name definition |
| regress/sql/*.sql | Added comprehensive tests for accessor optimizations and composite types |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4dab54d to
9f4904e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/backend/utils/adt/agtype.c:2394
- Potential memory leak: pnstrdup allocates memory for label at line 2377, but if an error occurs after this point (e.g., at line 2391 when properties validation fails), the allocated memory for label is not freed before the error is thrown. Consider using a PG_TRY/PG_CATCH block or allocating in the current memory context that will be cleaned up on error.
label = pnstrdup(label_value->val.string.val, label_value->val.string.len);
if (fcinfo->args[2].isnull)
{
bstate = init_agtype_build_state(0, AGT_FOBJECT);
properties = build_agtype(bstate);
pfree_agtype_build_state(bstate);
}
else
{
properties = AG_GET_ARG_AGTYPE_P(2);
if (!AGT_ROOT_IS_OBJECT(properties))
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("_agtype_build_vertex() properties argument must be an object")));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@MuhammadTahaNaveed Cursory thoughts so far -
|
|
@MuhammadTahaNaveed Ambiguous operator added - Current master branch PR 2303 Edge too |
9f4904e to
9d6a08b
Compare
Updated
The changed functions (_label_name, _agtype_build_vertex, _agtype_build_edge) are intended for internal use only. They are invoked by the parser/executor and are not meant to be called directly by users. The upgrade script in age--1.6.0--y.y.y.sql manages the transition by dropping the old function signatures and creating the new ones. No data migration is required, as the underlying table structure remains unchanged. From a user perspective, existing queries that return agtype continue to work without any changes. The composite types are introduced to improve internal query transformation and serve as an additional output option, rather than a breaking change.
The composite types don’t increase storage, as they are only a query-time representation. The underlying tables continue to store the same data as before. The agtype representation is still required for paths (which are agtype arrays of vertices/edges), for storing entities in agtype containers, and for compatibility with existing agtype operators. When agtype is needed, the implicit casts(backed by functions vertex_to_agtype and edge_to_agtype) handle the conversion.
Good catch. Fixed it. |
6c4402a to
001a349
Compare
Review: Logic soundness + regression-test validationVerdict: logic is sound, and all regression-test changes are legitimate (no masked correctness regressions). This PR replaces the Logic soundnessThe one structurally risky element — the
This is empirically confirmed by the new EXPLAIN tests: Other verified items:
Two non-blocking findings:
Regression-test classification
Reorder vetting: every reordering-only change preserves the complete row set (all ids present before and after) and value-bearing rows retain their values. The reordering is a benign consequence of vertices/edges now being composite Error-message modifications all keep the same outcome (the query still errors with the same semantics), e.g. SummaryThe core C logic is correct — including the bug-prone |
04dc881 to
708291b
Compare
Addressed both.
|
|
@MuhammadTahaNaveed One last Copilot review for sanity. Unless it has anything worth fixing, I will merge it. |
…tion - Introduce vertex and edge as pg composite types vertex: (id, label, properties) edge: (id, label, start_id, end_id, properties) - end_id precedes start_id intentionally, it matches the agtype edge's canonical key order (keys sorted by length, end_id=6 < start_id=8), keeping the composite positionally consistent with the agtype representation. - Property access (a.name) now directly uses a.properties for agtype_access_operator instead of rebuilding via _agtype_build_vertex/edge - Optimize accessor functions (id, properties, label, type, start_id, end_id) to use direct FieldSelect on composite types instead of agtype functions - Add casts: vertex/edge to agtype, vertex/edge to json/jsonb - Add eq and not eq ops for vertex/edge composite types. - Fix label_name specific routine to use cache instead of ag_label scan - Write/update clauses have executors strictly tied to agtype, due to which the variables after any write/update clause are carried forward as agtype. - Allows users to completely skip agtype build functions and return vertex/edge for pure read queries. - Change _label_name to return agtype since record comparisons are not allowed with cstring. Consequently, _agtype_build_vertex/edge now accept agtype as label. - Fix MERGE clause type mismatch when accessing properties from previous MATCH clauses by wrapping columns with agtype_volatile_wrapper before namespace lookup. - Update expression index in pgvector.sql, since now it uses raw properties column instead of _agtype_build_vertex/edge. - Add regression tests Assisted-by AI
708291b to
8d50194
Compare
vertex: (id, label, properties)
edge: (id, label, end_id, start_id, properties)
Assisted-by AI