Skip to content

PushPredicateThroughNonJoin assertion failure when pushing filter through Project into Union (Hive view) #55575

@arthurli-dotcom

Description

@arthurli-dotcom

Describe the bug

PushPredicateThroughNonJoin crashes with AssertionError when a filter references a passthrough column (not aliased) on a CTE/subquery that reads from a Hive view backed by a UNION ALL.

Reproduction

Given a Hive view my_view defined as SELECT * FROM t1 UNION ALL SELECT * FROM t2 UNION ALL SELECT * FROM t3:

-- Crashes
WITH cte AS (
  SELECT id, name, col1['key'] AS col1_alias
  FROM my_view
  WHERE ds = '2024-01-01'
)
SELECT * FROM cte WHERE name = 'foo';
-- Works (filter directly on the view, no CTE/Project in between)
SELECT * FROM my_view WHERE name = 'foo' AND ds = '2024-01-01';
-- Works (filter inside the CTE, below the Project, directly above Union)
WITH cte AS (
  SELECT id, name, col1['key'] AS col1_alias
  FROM my_view
  WHERE ds = '2024-01-01'
    AND name = 'foo'
)
SELECT * FROM cte;

Root Cause

The rule applies in two iterations via plan transform applyLocally:

Iteration 1 — Filter + Project case (Optimizer.scala ~line 1724):

case Filter(condition, project @ Project(fields, grandChild)) =>
  val aliasMap = getAliasMap(project)
  project.copy(child = Filter(replaceAlias(condition, aliasMap), grandChild))

Pushes name = 'foo' below the Project. Since name is a passthrough column (not in aliasMap), replaceAlias returns the attribute with its original exprId from the Project output context (name#X).

Result: Project(..., Filter(name#X = 'foo', Union))

Iteration 2 — Filter + Union case (~line 1783):

case filter @ Filter(condition, union: Union) =>
  val output = union.output
  val newGrandChildren = union.children.map { grandchild =>
    val newCond = pushDownCond transform {
      case e if output.exists(_.semanticEquals(e)) =>
        grandchild.output(output.indexWhere(_.semanticEquals(e)))
    }
    assert(newCond.references.subsetOf(grandchild.outputSet))  // FAILS
    Filter(newCond, grandchild)
  }

The Union's output has name#Y (derived from firstAttr.exprId in Union.output). semanticEquals compares exprIds, so name#X does not match name#Y. The attribute is never remapped, and the assertion fails.

The exprId mismatch occurs because when a Hive view is expanded, the view resolution layer assigns new exprIds to the Union's output attributes that differ from the exprIds in the Project's output references.

Workaround

Move the filter into the CTE/subquery so it sits directly above the Union, skipping the Filter → Project → Union code path.

Stack Trace

java.lang.AssertionError: assertion failed
  at scala.Predef$.assert(Predef.scala:208)
  at o.a.s.sql.catalyst.optimizer.PushPredicateThroughNonJoin$$anonfun$7.$anonfun$applyOrElse$55(Optimizer.scala:2040)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at o.a.s.sql.catalyst.optimizer.PushPredicateThroughNonJoin$$anonfun$7.applyOrElse(Optimizer.scala:2035)
  at o.a.s.sql.catalyst.optimizer.PushPredicateThroughNonJoin$$anonfun$7.applyOrElse(Optimizer.scala:1962)
  at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
  at o.a.s.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:521)
  at o.a.s.sql.catalyst.optimizer.PushDownPredicates$.apply(Optimizer.scala:1948)

Environment

  • Spark version: 3.5.5 (AMZ fork 3.5.5-amzn-1)
  • Java version: JDK 17
  • Scala version: 2.12.18
  • Running on EMR with Hive metastore and Iceberg tables behind the UNION view

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions