[spark] Reduce paimon-spark-4.0 shadows via SparkShim copy factories#7721

Open
kerwin-zk wants to merge 1 commit into apache:master from kerwin-zk:spark-shim-cleanup
Conversation

@kerwin-zk (Contributor) commented Apr 28, 2026

Purpose

Follow-up to #7648 (Spark 4.1 module). After the reverse-shim layout landed, three of the files copied into paimon-spark-4.0/src/main only differed across versions because of case class .copy(...) calls on Spark types whose arity changed between 4.0.2 and 4.1.1:

  • DataSourceV2Relation gained Option[TimeTravelSpec] (8 → 9 fields) — relation.copy(table = ...) compiled against 4.1.1 emits copy$default$9, which crashes on 4.0 with NoSuchMethodError.
  • TableSpec gained Seq[Constraint] (8 → 9 fields) — same problem for spec.copy(location = ...) and spec.copy(properties = ...).
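The failure mode above comes from how scalac desugars named-argument `copy` calls. A toy sketch of the mechanism (these are illustrative classes, not Spark's actual types):

```scala
// Toy illustration of the copy$default$N mechanism (not Spark's classes).
// For a case class, scalac desugars a named-argument copy call into a
// positional call that fills the remaining slots with synthetic
// copy$default$N accessors.
case class Rel(table: String, output: Int) // "old version": 2 fields

object CopyDesugarDemo {
  def main(args: Array[String]): Unit = {
    val r = Rel("t1", 42)
    // This compiles to roughly: r.copy("t2", r.copy$default$2)
    val r2 = r.copy(table = "t2")
    // If a later library version adds a 3rd field, the same source line
    // recompiled against it also emits r.copy$default$3 -- and that
    // bytecode throws NoSuchMethodError when run against the older,
    // 2-field class, which has no such synthetic method.
    assert(r2 == Rel("t2", 42))
  }
}
```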

Per-version scalac is the only thing that knows the right copy$default$N to emit, so we route those three calls through new SparkShim factories (one per call site). The implementations live in Spark3Shim / Spark4Shim (plus the 4.0 override), and the cross-version source files no longer need to be physically duplicated.
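The routing can be pictured with a minimal sketch, using toy types standing in for Spark's `DataSourceV2Relation` (the real trait lives in `org.apache.spark.sql.paimon.shims.SparkShim`):

```scala
// Minimal sketch of the per-version shim pattern, with toy types.
// Call sites depend only on the trait; each concrete shim is compiled
// against its own Spark version, so the copy(...) bytecode it emits
// always matches that version's field arity.
case class Relation(table: String, options: Map[String, String])

trait Shim {
  def copyRelationTable(relation: Relation, newTable: String): Relation
}

// One implementation per Spark version; only these files call copy(...).
class Spark41Shim extends Shim {
  override def copyRelationTable(relation: Relation, newTable: String): Relation =
    relation.copy(table = newTable)
}

object ShimDemo {
  def main(args: Array[String]): Unit = {
    val shim: Shim = new Spark41Shim
    val updated = shim.copyRelationTable(Relation("t1", Map.empty), "t2")
    assert(updated.table == "t2")
  }
}
```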

Tests

CI

API and Format

No new public API. Three internal helper methods added to org.apache.spark.sql.paimon.shims.SparkShim:

  • copyDataSourceV2Relation(relation, newTable)
  • copyTableSpecLocation(spec, location)
  • copyTableSpecProperties(spec, properties)
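At the call sites the change is mechanical. A hedged sketch of the shape (toy `Relation` and `ShimLoader` stand in for `DataSourceV2Relation` and `SparkShimLoader.shim`; the actual Paimon source is not reproduced here):

```scala
// Before (per-version source copies, because copy arity differs):
//   val newRelation = relation.copy(table = newTable)
// After (shared source, shim supplies version-correct bytecode):
//   val newRelation = SparkShimLoader.shim.copyDataSourceV2Relation(relation, newTable)

case class Relation(table: String) // stands in for DataSourceV2Relation

object ShimLoader { // stands in for SparkShimLoader.shim
  def copyDataSourceV2Relation(relation: Relation, newTable: String): Relation =
    relation.copy(table = newTable)
}

object CallSiteDemo {
  def main(args: Array[String]): Unit = {
    val relation = Relation("old")
    // Shared-source call site: no direct copy(...) on the Spark type.
    val newRelation = ShimLoader.copyDataSourceV2Relation(relation, "new")
    assert(newRelation.table == "new")
  }
}
```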


Copilot AI left a comment


Pull request overview

This PR reduces Spark 4.0/4.1 source duplication by routing certain case class .copy(...) operations (on Spark types whose constructor arity changed between Spark 4.0.2 and 4.1.1) through per-version SparkShim factory methods, preventing NoSuchMethodError at runtime on Spark 4.0.

Changes:

  • Add new internal SparkShim factory methods to copy DataSourceV2Relation / TableSpec safely across Spark 4.0 vs 4.1.
  • Update shared Spark-common call sites to use SparkShimLoader.shim instead of direct .copy(...).
  • Remove now-unnecessary Spark 4.0 module source copies for the affected files.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| paimon-spark/paimon-spark4-common/src/main/scala/org/apache/spark/sql/paimon/shims/Spark4Shim.scala | Implements new shim copy factories for Spark 4.x. |
| paimon-spark/paimon-spark3-common/src/main/scala/org/apache/spark/sql/paimon/shims/Spark3Shim.scala | Implements new shim copy factories for Spark 3.x. |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/spark/sql/paimon/shims/SparkShim.scala | Adds new SparkShim abstract factory methods and documents the Spark 4.0/4.1 arity issue. |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/spark/sql/execution/shim/PaimonCreateTableAsSelectStrategy.scala | Uses shim factory to update TableSpec.properties safely. |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/spark/sql/execution/PaimonStrategyHelper.scala | Uses shim factory to update TableSpec.location safely. |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/util/ScanPlanHelper.scala | Uses shim factory to update DataSourceV2Relation.table safely. |
| paimon-spark/paimon-spark-4.0/src/main/scala/org/apache/spark/sql/paimon/shims/Spark4Shim.scala | Adds Spark 4.0 override implementations for the new shim methods. |
| paimon-spark/paimon-spark-4.0/src/main/scala/org/apache/spark/sql/execution/shim/PaimonCreateTableAsSelectStrategy.scala | Deleted: Spark 4.0-specific copy now handled in shared code + shim. |
| paimon-spark/paimon-spark-4.0/src/main/scala/org/apache/spark/sql/execution/PaimonStrategyHelper.scala | Deleted: Spark 4.0-specific copy now handled in shared code + shim. |
| paimon-spark/paimon-spark-4.0/src/main/scala/org/apache/paimon/spark/util/ScanPlanHelper.scala | Deleted: Spark 4.0-specific copy now handled in shared code + shim. |


Comment on lines +70 to +79

```scala
 * Returns a `DataSourceV2Relation` like `relation` but with `table` replaced. Spark 4.1 added
 * `Option[TimeTravelSpec]` as the 6th field of `DataSourceV2Relation`, so a `relation.copy(table
 * = ...)` call compiled against 4.1.1 emits a `copy$default$6` reference that crashes with
 * `NoSuchMethodError` on Spark 4.0 runtime. Routing through this factory lets each per-version
 * SparkShim implementation generate the matching copy bytecode.
 */
def copyDataSourceV2Relation(
    relation: DataSourceV2Relation,
    newTable: Table): DataSourceV2Relation
```

