feat: add from_utc_timestamp and to_utc_timestamp expressions#4308
Open
andygrove wants to merge 2 commits into
Open
feat: add from_utc_timestamp and to_utc_timestamp expressions#4308andygrove wants to merge 2 commits into
andygrove wants to merge 2 commits into
Conversation
Wires Spark FromUTCTimestamp to the upstream datafusion-spark SparkFromUtcTimestamp UDF. Adds a Scala serde with a documented incompatibility note for legacy timezone strings (GMT+1, UTC+1, PST) that arrow's Tz parser does not accept, a SQL file test covering IANA names, fixed offsets, both DST branches, and null handling under a session-timezone ConfigMatrix, and updates the expression support doc with dated audit notes for Spark 3.4.3, 3.5.8, and 4.0.1.
Wires Spark ToUTCTimestamp to the upstream datafusion-spark SparkToUtcTimestamp UDF. Shares the legacy-timezone-form incompatibility note with from_utc_timestamp via a private helper object. Mirrors the SQL file test for from_utc_timestamp covering IANA names, fixed offsets, both DST branches, nulls, and a session-timezone ConfigMatrix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #2013.
Rationale for this change
from_utc_timestampandto_utc_timestampare commonly used Spark datetime functions. Both extend the sameUTCTimestampCatalyst trait and have matchingSparkFromUtcTimestamp/SparkToUtcTimestampimplementations in the upstreamdatafusion-sparkcrate, so wiring them through gives Comet acceleration with minimal native code.What changes are included in this PR?
CometFromUTCTimestampandCometToUTCTimestampserdes inspark/src/main/scala/org/apache/comet/serde/datetime.scalaand register them inQueryPlanSerde'stemporalExpressionsmap. The shared incompat reason lives on a privateUTCTimestampSerdehelper.SparkFromUtcTimestampandSparkToUtcTimestampUDFs innative/core/src/execution/jni_api.rs::register_datafusion_spark_function.getIncompatibleReasonson both serdes to document the one known divergence: arrow'sTzparser does not accept Spark's legacy timezone forms (GMT+1,UTC+1,PSTand similar). Such timezones surface a native parse error rather than a silent wrong result. IANA names and fixed offsets (+HH:MM) are fully supported.spark/src/test/resources/sql-tests/expressions/datetime/from_utc_timestamp.sqlandto_utc_timestamp.sql. Both cover column and literal arguments for the timestamp and timezone operands, IANA names, fixed offsets, summer and winter DST rows forAmerica/Los_Angeles, and null handling, under aConfigMatrix: spark.sql.session.timeZone=UTC,America/Los_Angelesto verify the result is independent of session timezone.docs/source/contributor-guide/spark_expressions_support.mdto mark both functions supported and record dated audit notes for Spark 3.4.3, 3.5.8, and 4.0.1.Both expressions were scaffolded with the
implement-comet-expressionproject skill, which also drove theaudit-comet-expressionfollow-up that produced the test matrix above.How are these changes tested?
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite from_utc_timestamp" -Dtest=none(passes bothConfigMatrixruns locally)../mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite to_utc_timestamp" -Dtest=none(passes bothConfigMatrixruns locally).cd native && cargo clippy --all-targets --workspace -- -D warningspasses clean.