Skip to content

Add vertex and edge composite types with direct field access optimization#2303

Merged
jrgemignani merged 1 commit into
apache:masterfrom
MuhammadTahaNaveed:columns-opt
Jun 1, 2026
Merged

Add vertex and edge composite types with direct field access optimization#2303
jrgemignani merged 1 commit into
apache:masterfrom
MuhammadTahaNaveed:columns-opt

Conversation

@MuhammadTahaNaveed
Copy link
Copy Markdown
Member

@MuhammadTahaNaveed MuhammadTahaNaveed commented Jan 10, 2026

  • Introduce vertex and edge as pg composite types
    vertex: (id, label, properties)
    edge: (id, label, end_id, start_id, properties)
  • Property access (a.name) now directly uses a.properties for agtype_access_operator instead of rebuilding via _agtype_build_vertex/edge
  • Optimize accessor functions (id, properties, label, type, start_id, end_id) to use direct FieldSelect on composite types instead of agtype functions
  • Add casts: vertex/edge to agtype, vertex/edge to json/jsonb
  • Add eq and not eq ops for vertex/edge composite types.
  • Fix label_name specific routine to use cache instead of ag_label scan
  • Write/update clauses have executors strictly tied to agtype, due to which the variables after any write/update clause are carried forward as agtype.
  • Allows users to completely skip agtype build functions and return vertex/edge for pure read queries.
  • Change _label_name to return agtype since record comparisons are not allowed with cstring. Consequently, _agtype_build_vertex/edge now accept agtype as label.
  • Fix MERGE clause type mismatch when accessing properties from previous MATCH clauses by wrapping columns with agtype_volatile_wrapper before namespace lookup.
  • Update expression index in pgvector.sql, since now it uses raw properties column instead of _agtype_build_vertex/edge.
  • Add regression tests

Assisted-by AI

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces vertex and edge as PostgreSQL composite types to optimize graph query performance. The changes enable direct field access for properties and accessor functions instead of rebuilding agtype structures.

Changes:

  • Introduces vertex (id, label, properties) and edge (id, label, start_id, end_id, properties) composite types
  • Optimizes property access to use direct field selection instead of agtype rebuild
  • Changes _label_name to return agtype instead of cstring for type consistency
  • Adds implicit casts from vertex/edge to agtype and json
  • Fixes MERGE clause type mismatches with volatile wrapper

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/include/utils/agtype.h Added OID accessors for VERTEXOID and EDGEOID composite types
src/include/parser/cypher_expr.h Added helper functions for entity type checking and field extraction
src/include/parser/cypher_clause.h Added field access helper functions for composite types
src/include/catalog/ag_label.h Added get_label_name function signature
src/backend/utils/adt/agtype.c Implemented composite type OID caching, vertex/edge to agtype casts, label parameter type change
src/backend/parser/cypher_expr.c Implemented accessor function optimization and entity type coercion
src/backend/parser/cypher_clause.c Updated entity expression builders to use RowExpr, added field selection helpers
src/backend/executor/*.c Updated executors to use agtype labels instead of cstrings
src/backend/catalog/ag_label.c Changed _label_name to return agtype, added cache-based get_label_name
sql/agtype_graphid.sql Defined vertex/edge composite types and cast functions
sql/age_scalar.sql Added _get_vertex_by_graphid helper function
sql/age_main.sql Removed duplicate _label_name definition
regress/sql/*.sql Added comprehensive tests for accessor optimizations and composite types

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread regress/sql/expr.sql
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/parser/cypher_expr.c
Comment thread src/backend/parser/cypher_expr.c
Comment thread regress/sql/expr.sql Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/backend/utils/adt/agtype.c:2394

  • Potential memory leak: pnstrdup allocates memory for label at line 2377, but if an error occurs after this point (e.g., at line 2391 when properties validation fails), the allocated memory for label is not freed before the error is thrown. Consider using a PG_TRY/PG_CATCH block or allocating in the current memory context that will be cleaned up on error.
    label = pnstrdup(label_value->val.string.val, label_value->val.string.len);

    if (fcinfo->args[2].isnull)
    {
        bstate = init_agtype_build_state(0, AGT_FOBJECT);
        properties = build_agtype(bstate);
        pfree_agtype_build_state(bstate);
    }
    else
    {
        properties = AG_GET_ARG_AGTYPE_P(2);

        if (!AGT_ROOT_IS_OBJECT(properties))
        {
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("_agtype_build_vertex() properties argument must be an object")));
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/parser/cypher_expr.c
Comment thread src/backend/parser/cypher_expr.c
Comment thread sql/age_scalar.sql
@jrgemignani
Copy link
Copy Markdown
Contributor

jrgemignani commented Jan 13, 2026

@MuhammadTahaNaveed Cursory thoughts so far -

  1. There aren't any additions to the age--1.6.0--y.y.y.sql file which is needed for upgrading. I won't be able to test actual performance without this file.
  2. These changes to the types will generally make it difficult to upgrade.
  3. The composite vertex and edge types are an interesting touch, but, it doesn't appear that you removed how the vertices and edges are represented in agtype. If this is intended, won't this increase the space taken by these objects?

@jrgemignani
Copy link
Copy Markdown
Contributor

jrgemignani commented Jan 13, 2026

@MuhammadTahaNaveed Ambiguous operator added -

Current master branch

psql-17.4-5432-pgsql=#  SELECT create_graph('bug_demo');
NOTICE:  graph "bug_demo" has been created
 create_graph
--------------

(1 row)

psql-17.4-5432-pgsql=# SELECT * FROM cypher('bug_demo', $$ CREATE (:Person {name: 'Alice'}) $$) AS (v agtype);
 v
---
(0 rows)

psql-17.4-5432-pgsql=# SELECT * FROM cypher('bug_demo', $$ CREATE (:Person {name: 'Bob'}) $$) AS (v agtype);
 v
---
(0 rows)

psql-17.4-5432-pgsql=# SELECT * FROM cypher('bug_demo', $$
psql-17.4-5432-pgsql$#   MATCH (a:Person), (b:Person)
psql-17.4-5432-pgsql$#   WHERE a IN [a, b]
psql-17.4-5432-pgsql$#   RETURN a.name
psql-17.4-5432-pgsql$# $$) AS (name agtype);
  name
---------
 "Alice"
 "Alice"
 "Bob"
 "Bob"
(4 rows)

PR 2303

psql-17.7-5432-pgsql=# SELECT create_graph('bug_demo');
SELECT * FROM cypher('bug_demo', $$ CREATE (:Person {name: 'Alice'}) $$) AS (v agtype);
SELECT * FROM cypher('bug_demo', $$ CREATE (:Person {name: 'Bob'}) $$) AS (v agtype);
SELECT * FROM cypher('bug_demo', $$
  MATCH (a:Person), (b:Person)
  WHERE a IN [a, b]
  RETURN a.name
$$) AS (name agtype);
NOTICE:  graph "bug_demo" has been created
 create_graph 
--------------
 
(1 row)

 v 
---
(0 rows)

 v 
---
(0 rows)

ERROR:  operator is not unique: vertex = vertex
LINE 3:   WHERE a IN [a, b]
                  ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

Edge too

psql-17.7-5432-pgsql=# SELECT * FROM cypher('bug_demo', $$
  MATCH ()-[r1]->(), ()-[r2]->()
  WHERE r1 IN [r1, r2]
  RETURN id(r1)
$$) AS (id agtype);
ERROR:  operator is not unique: edge = edge
LINE 3:   WHERE r1 IN [r1, r2]
                   ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.
psql-17.7-5432-pgsql=#

@MuhammadTahaNaveed
Copy link
Copy Markdown
Member Author

@jrgemignani

  1. There aren't any additions to the age--1.6.0--y.y.y.sql file which is needed for upgrading. I won't be able to test actual performance without this file.

Updated age--1.6.0--y.y.y.sql

  1. These changes to the types will generally make it difficult to upgrade.

The changed functions (_label_name, _agtype_build_vertex, _agtype_build_edge) are intended for internal use only. They are invoked by the parser/executor and are not meant to be called directly by users.

The upgrade script in age--1.6.0--y.y.y.sql manages the transition by dropping the old function signatures and creating the new ones. No data migration is required, as the underlying table structure remains unchanged.

From a user perspective, existing queries that return agtype continue to work without any changes. The composite types are introduced to improve internal query transformation and serve as an additional output option, rather than a breaking change.

  1. The composite vertex and edge types are an interesting touch, but, it doesn't appear that you removed how the vertices and edges are represented in agtype. If this is intended, won't this increase the space taken by these objects?

The composite types don’t increase storage, as they are only a query-time representation. The underlying tables continue to store the same data as before.

The agtype representation is still required for paths (which are agtype arrays of vertices/edges), for storing entities in agtype containers, and for compatibility with existing agtype operators. When agtype is needed, the implicit casts(backed by functions vertex_to_agtype and edge_to_agtype) handle the conversion.

Ambiguous operator added

Good catch. Fixed it.

@MuhammadTahaNaveed MuhammadTahaNaveed force-pushed the columns-opt branch 3 times, most recently from 6c4402a to 001a349 Compare June 1, 2026 16:09
@jrgemignani
Copy link
Copy Markdown
Contributor

Review: Logic soundness + regression-test validation

Verdict: logic is sound, and all regression-test changes are legitimate (no masked correctness regressions).

This PR replaces the _agtype_build_vertex/edge(...) construction path with native composite types (vertex, edge) plus implicit casts to agtype/json/jsonb, and adds a parser-level accessor optimization that lowers id()/properties()/label()/type()/start_id()/end_id()/startNode()/endNode() to direct FieldSelect nodes instead of building and re-parsing agtype.

Logic soundness

The one structurally risky element — the edge composite field order (id, label, end_id, start_id, properties), with end_id before start_id — is handled correctly and consistently across all four sites that must agree:

Site Mapping
SQL composite definition end_id = attr 3, start_id = attr 4
edge_to_agtype / edge_to_json_string (agtype.c) reads values[2]→end_id, values[3]→start_id, calls make_edge(id, start_id, end_id, …)
get_record_field_info (cypher_clause.c) endnode/end_id→field 3, startnode/start_id→field 4 ✓
make_edge_expr RowExpr list_make5(id, label, end_id, start_id, props) + matching colnames ✓

This is empirically confirmed by the new EXPLAIN tests: start_id(r)Output: (r.start_id)::agtype and end_id(r)Output: (r.end_id)::agtype. No swap.

Other verified items:

  • _agtype_build_vertex/edge label param cstring→agtype: validates the value is a scalar string, errors otherwise; NULL handling intact; memory freed correctly.
  • get_label_name(label_id, graph_oid) reads the existing label cache instead of a fresh systable scan (the perf win); returns cache memory (callers must not pfree).
  • vertex_eq/edge_eq compare by id only — correct graph-entity identity semantics. No HASHES/MERGES declared, so no hash/merge-join hazard.
  • optimize_accessor_function: the uninitialized-result concern is not exploitable — startNode/endNode on a vertex yields InvalidAttrNumber, which hits the else ereport(ERROR) (correct "must be an edge" message), so result is never read uninitialized.
  • agtype_volatile_wrapper correctly extended for GRAPHIDOID, VERTEXOID, EDGEOID.
  • agtype_categorize_type / datum_to_agtype correctly route composites through vertex_to_agtype/edge_to_agtype when boxing into maps/lists.

Two non-blocking findings:

  1. _get_vertex_by_graphid has no REVOKE … FROM PUBLIC. Largely mitigated because the underlying get_vertex performs pg_class_aclcheck(… ACL_SELECT), so a caller lacking SELECT on the label table errors out. It still bypasses RLS row policies, though AGE label tables don't use RLS. Adding the REVOKE would be defense-in-depth.
  2. OID-dependent error text: reverse() unsupported argument type 16843 embeds the dynamic VERTEXOID. Deterministic in a fresh regress DB, but brittle if catalog OIDs shift.

Regression-test classification

File Δ Classification Verdict
cypher_match.out 6 reorder only Same 6 (and 2) rows; only tie order changed. All ids present before & after. No ORDER BY. Benign.
pgvector.out 4 intentional mod (matches .sql) HNSW index expression now indexes the raw properties column instead of _agtype_build_vertex(...). Semantically equivalent. Sound.
agtype.out 92 forced input mod (matches .sql) Labels changed $$x$$'"x"' for the cstring→agtype signature. Result rows byte-identical. No behavior change.
expr.out (early hunks) ~16 removed message mods + reorder Error text changes (still error, same semantics, now with LINE … ^ pointer); 6 case_statement hunks are reorder-only (same 6 vertices; row carrying the real computed value keeps its 1/6). Benign.
expr.out (final hunk) +1099 additive New EXPLAIN/accessor coverage demonstrating the optimization.
cypher_with.out / .sql +312 / +143 additive New feature coverage, zero deletions.

Reorder vetting: every reordering-only change preserves the complete row set (all ids present before and after) and value-bearing rows retain their values. The reordering is a benign consequence of vertices/edges now being composite RowExprs (different scan/aggregate ordering) — not a masked add, drop, or alteration of rows.

Error-message modifications all keep the same outcome (the query still errors with the same semantics), e.g. cannot cast agtype vertex to type intcannot cast type vertex to bigint for column "i" (PG's native composite-coercion rejection), and accessor errors now carry an added LINE n: … ^ position from parser_errposition.

Summary

The core C logic is correct — including the bug-prone end_id/start_id composite ordering, which is consistently respected everywhere and proven by the new EXPLAIN tests. No regression-test change masks a correctness defect: changes are either (a) forced input adaptations with identical outputs, (b) intentional documented optimizations, (c) benign output reordering with full row-set preservation, or (d) purely additive new coverage. The only non-blocking nits are the missing REVOKE on _get_vertex_by_graphid (mitigated by an existing ACL check) and one OID-dependent error string.

@MuhammadTahaNaveed MuhammadTahaNaveed force-pushed the columns-opt branch 2 times, most recently from 04dc881 to 708291b Compare June 1, 2026 16:31
@MuhammadTahaNaveed
Copy link
Copy Markdown
Member Author

@jrgemignani

  1. _get_vertex_by_graphid has no REVOKE … FROM PUBLIC. Largely mitigated because the underlying get_vertex performs pg_class_aclcheck(… ACL_SELECT), so a caller lacking SELECT on the label table errors out. It still bypasses RLS row policies, though AGE label tables don't use RLS. Adding the REVOKE would be defense-in-depth.
  2. OID-dependent error text: reverse() unsupported argument type 16843 embeds the dynamic VERTEXOID. Deterministic in a fresh regress DB, but brittle if catalog OIDs shift.

Addressed both.

The one structurally risky element — the edge composite field order (id, label, end_id, start_id, properties), with end_id before start_id

end_id precedes start_id intentionally, it matches the agtype edge's canonical key order (keys sorted by length, end_id=6 < start_id=8), keeping the composite positionally consistent with the agtype representation.

@jrgemignani
Copy link
Copy Markdown
Contributor

@MuhammadTahaNaveed One last Copilot review for sanity. Unless it has anything worth fixing, I will merge it.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 18 comments.

Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/include/utils/agtype.h
Comment thread sql/agtype_graphid.sql
Comment thread sql/agtype_graphid.sql
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/include/utils/agtype.h
Comment thread sql/agtype_graphid.sql
Comment thread sql/agtype_graphid.sql
Comment thread sql/age_scalar.sql
…tion

- Introduce vertex and edge as pg composite types
  vertex: (id, label, properties)
  edge: (id, label, start_id, end_id, properties)
- end_id precedes start_id intentionally, it matches the agtype edge's
  canonical key order (keys sorted by length, end_id=6 < start_id=8),
  keeping the composite positionally consistent with the agtype
  representation.
- Property access (a.name) now directly uses a.properties for
  agtype_access_operator instead of rebuilding via _agtype_build_vertex/edge
- Optimize accessor functions (id, properties, label, type, start_id, end_id)
  to use direct FieldSelect on composite types instead of agtype functions
- Add casts: vertex/edge to agtype, vertex/edge to json/jsonb
- Add eq and not eq ops for vertex/edge composite types.
- Fix label_name specific routine to use cache instead of ag_label scan
- Write/update clauses have executors strictly tied to agtype, due to which
  the variables after any write/update clause are carried forward as agtype.
- Allows users to completely skip agtype build functions and return vertex/edge
  for pure read queries.
- Change _label_name to return agtype since record comparisons are not
  allowed with cstring. Consequently, _agtype_build_vertex/edge now accept
  agtype as label.
- Fix MERGE clause type mismatch when accessing properties from previous
  MATCH clauses by wrapping columns with agtype_volatile_wrapper
  before namespace lookup.
- Update expression index in pgvector.sql, since now it uses raw properties column
  instead of _agtype_build_vertex/edge.
- Add regression tests

Assisted-by AI
@jrgemignani jrgemignani merged commit 5ef7d6d into apache:master Jun 1, 2026
6 checks passed
@MuhammadTahaNaveed MuhammadTahaNaveed deleted the columns-opt branch June 2, 2026 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

master override-stale To keep issues/PRs untouched from stale action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants