
feat: add from_utc_timestamp and to_utc_timestamp expressions #4308

Open
andygrove wants to merge 2 commits into apache:main from andygrove:feat/from-utc-timestamp

andygrove commented May 12, 2026

Which issue does this PR close?

Closes #2013.

Rationale for this change

from_utc_timestamp and to_utc_timestamp are commonly used Spark datetime functions. Both extend the same UTCTimestamp Catalyst trait and have matching SparkFromUtcTimestamp / SparkToUtcTimestamp implementations in the upstream datafusion-spark crate, so wiring them through gives Comet acceleration with minimal native code.

What changes are included in this PR?

  • Add CometFromUTCTimestamp and CometToUTCTimestamp serdes in spark/src/main/scala/org/apache/comet/serde/datetime.scala and register them in QueryPlanSerde's temporalExpressions map. The shared incompat reason lives on a private UTCTimestampSerde helper.
  • Register the upstream SparkFromUtcTimestamp and SparkToUtcTimestamp UDFs in native/core/src/execution/jni_api.rs::register_datafusion_spark_function.
  • Override getIncompatibleReasons on both serdes to document the one known divergence: arrow's Tz parser does not accept Spark's legacy timezone forms (GMT+1, UTC+1, PST and similar). Such timezones surface a native parse error rather than a silent wrong result. IANA names and fixed offsets (+HH:MM) are fully supported.
  • Add Comet SQL Tests at spark/src/test/resources/sql-tests/expressions/datetime/from_utc_timestamp.sql and to_utc_timestamp.sql. Both files cover column and literal arguments for the timestamp and timezone operands, IANA names, fixed offsets, summer and winter DST rows for America/Los_Angeles, and null handling. They run under ConfigMatrix: spark.sql.session.timeZone=UTC,America/Los_Angeles to verify that the result does not depend on the session timezone.
  • Update docs/source/contributor-guide/spark_expressions_support.md to mark both functions supported and record dated audit notes for Spark 3.4.3, 3.5.8, and 4.0.1.
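
For reviewers less familiar with these functions, here is a minimal, std-only Rust sketch of what both expressions compute. It assumes timestamps are microseconds since the Unix epoch and that the timezone is a fixed `+HH:MM`/`-HH:MM` offset. The helper names (`parse_fixed_offset`, `from_utc_timestamp`, `to_utc_timestamp`) are hypothetical and this is not the datafusion-spark implementation: IANA names need a timezone database and are simply rejected by this toy parser, and that rejection stands in for the parse error that legacy forms such as `GMT+1` produce on the native path.

```rust
// Hypothetical sketch, NOT the datafusion-spark or Comet implementation.
// Timestamps are modeled as microseconds since the Unix epoch.
const MICROS_PER_SECOND: i64 = 1_000_000;

/// Parse "+HH:MM" / "-HH:MM" into seconds east of UTC. Everything else
/// (IANA names, "GMT+1", "PST", ...) is rejected by this toy parser.
fn parse_fixed_offset(tz: &str) -> Result<i64, String> {
    let b = tz.as_bytes();
    let err = || format!("unsupported timezone: {tz}");
    if b.len() != 6 || b[3] != b':' {
        return Err(err());
    }
    let sign = match b[0] {
        b'+' => 1,
        b'-' => -1,
        _ => return Err(err()),
    };
    // Two ASCII digits -> their value, or None if either byte is not a digit.
    let two = |hi: u8, lo: u8| -> Option<i64> {
        (hi.is_ascii_digit() && lo.is_ascii_digit())
            .then(|| i64::from(hi - b'0') * 10 + i64::from(lo - b'0'))
    };
    match (two(b[1], b[2]), two(b[4], b[5])) {
        // Assumed bound: at most 18:00, the range java.time.ZoneOffset allows.
        (Some(hh), Some(mm)) if mm <= 59 && hh * 60 + mm <= 18 * 60 => {
            Ok(sign * (hh * 3600 + mm * 60))
        }
        _ => Err(err()),
    }
}

/// from_utc_timestamp: read `ts` as a UTC instant and return the wall-clock
/// time in `tz`. Either operand null -> null; a bad timezone is an error.
fn from_utc_timestamp(ts: Option<i64>, tz: Option<&str>) -> Result<Option<i64>, String> {
    match (ts, tz) {
        (Some(t), Some(z)) => Ok(Some(t + parse_fixed_offset(z)? * MICROS_PER_SECOND)),
        _ => Ok(None),
    }
}

/// to_utc_timestamp: read `ts` as wall-clock time in `tz` and return the UTC instant.
fn to_utc_timestamp(ts: Option<i64>, tz: Option<&str>) -> Result<Option<i64>, String> {
    match (ts, tz) {
        (Some(t), Some(z)) => Ok(Some(t - parse_fixed_offset(z)? * MICROS_PER_SECOND)),
        _ => Ok(None),
    }
}

fn main() {
    // 2016-08-31 00:00:00 UTC in microseconds.
    let ts = Some(1_472_601_600 * MICROS_PER_SECOND);
    // Ok(Some(..)) holding 2016-08-31 09:00:00
    println!("{:?}", from_utc_timestamp(ts, Some("+09:00")));
    // Ok(Some(..)) holding 2016-08-30 15:00:00
    println!("{:?}", to_utc_timestamp(ts, Some("+09:00")));
    // Err(..): legacy form rejected instead of silently mis-converted
    println!("{:?}", from_utc_timestamp(ts, Some("GMT+1")));
    // Ok(None): null timestamp propagates as null
    println!("{:?}", from_utc_timestamp(None, Some("+09:00")));
}
```

The `+09:00` case mirrors the Spark documentation examples, where `from_utc_timestamp('2016-08-31', 'Asia/Seoul')` yields `2016-08-31 09:00:00` and `to_utc_timestamp` on the same input yields `2016-08-30 15:00:00`.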

Both expressions were scaffolded with the implement-comet-expression project skill, which also drove the audit-comet-expression follow-up that produced the test matrix above.

How are these changes tested?

  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite from_utc_timestamp" -Dtest=none (passes both ConfigMatrix runs locally).
  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite to_utc_timestamp" -Dtest=none (passes both ConfigMatrix runs locally).
  • cd native && cargo clippy --all-targets --workspace -- -D warnings passes clean.

Wires Spark FromUTCTimestamp to the upstream datafusion-spark
SparkFromUtcTimestamp UDF. Adds a Scala serde with a documented
incompatibility note for legacy timezone strings (GMT+1, UTC+1,
PST) that arrow's Tz parser does not accept, a SQL file test
covering IANA names, fixed offsets, both DST branches, and null
handling under a session-timezone ConfigMatrix, and updates the
expression support doc with dated audit notes for Spark 3.4.3,
3.5.8, and 4.0.1.
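
Why the summer and winter rows matter: for a DST zone the offset depends on the instant being converted. The sketch below is a toy, std-only illustration that hard-codes the 2024 America/Los_Angeles transition instants (2024-03-10 10:00 UTC and 2024-11-03 09:00 UTC) in place of a timezone database, so it is only valid for that calendar year; the function names are hypothetical and not part of Comet or datafusion-spark.

```rust
// Toy model, valid only for calendar year 2024: the UTC instants at which
// America/Los_Angeles switched to PDT (UTC-7) and back to PST (UTC-8).
const DST_START_2024_UTC: i64 = 1_710_064_800; // 2024-03-10 10:00:00 UTC
const DST_END_2024_UTC: i64 = 1_730_624_400; // 2024-11-03 09:00:00 UTC
const MICROS_PER_SECOND: i64 = 1_000_000;

/// Offset (seconds east of UTC) of America/Los_Angeles at a UTC instant in 2024.
/// A real implementation would consult a timezone database instead.
fn la_offset_2024(utc_secs: i64) -> i64 {
    if (DST_START_2024_UTC..DST_END_2024_UTC).contains(&utc_secs) {
        -7 * 3600 // PDT
    } else {
        -8 * 3600 // PST
    }
}

/// from_utc_timestamp(ts, 'America/Los_Angeles') under the toy table above.
fn from_utc_la_2024(ts_micros: i64) -> i64 {
    let secs = ts_micros.div_euclid(MICROS_PER_SECOND);
    ts_micros + la_offset_2024(secs) * MICROS_PER_SECOND
}

/// Hour of day (0-23) of a micros-since-epoch wall-clock value.
fn hour_of_day(ts_micros: i64) -> i64 {
    ts_micros.div_euclid(MICROS_PER_SECOND).rem_euclid(86_400) / 3_600
}

fn main() {
    let summer = 1_719_835_200 * MICROS_PER_SECOND; // 2024-07-01 12:00:00 UTC
    let winter = 1_705_320_000 * MICROS_PER_SECOND; // 2024-01-15 12:00:00 UTC
    println!("summer -> {:02}:00 local", hour_of_day(from_utc_la_2024(summer)));
    println!("winter -> {:02}:00 local", hour_of_day(from_utc_la_2024(winter)));
}
```

The same 12:00 UTC wall time lands at 05:00 local in July (PDT, UTC-7) but at 04:00 local in January (PST, UTC-8), which is the divergence a single fixed-offset conversion would miss.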
andygrove marked this pull request as draft May 12, 2026 22:12
Wires Spark ToUTCTimestamp to the upstream datafusion-spark
SparkToUtcTimestamp UDF. Shares the legacy-timezone-form
incompatibility note with from_utc_timestamp via a private helper
object. Mirrors the SQL file test for from_utc_timestamp covering
IANA names, fixed offsets, both DST branches, nulls, and a
session-timezone ConfigMatrix.
andygrove changed the title from "feat: add from_utc_timestamp expression" to "feat: add from_utc_timestamp and to_utc_timestamp expressions" May 12, 2026
andygrove marked this pull request as ready for review May 13, 2026 03:30
andygrove added this to the 0.17.0 (June 2026) milestone May 13, 2026
andygrove moved this from Todo to In progress in Comet Development May 13, 2026

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

Add from_utc_timestamp support
