Describe the bug
The Dynamic Filter Pushdown optimization (#16445) is causing joins to return incorrect results when the predicate has multiple conditions.
To Reproduce
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
In the following query, sometimes only v=1 is evaluated, while in others only v=10000000 is evaluated:
select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10000000;
We can see that the DynamicFilterPhysicalExpr can change:
+---+---+---+
| k | k | v |
+---+---+---+
| 1 | 1 | 1 |
+---+---+---+
predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]
+----------+----------+----------+
| k | k | v |
+----------+----------+----------+
| 10000000 | 10000000 | 10000000 |
+----------+----------+----------+
predicate=DynamicFilterPhysicalExpr [ k@0 >= 10000000 AND k@0 <= 10000000 ]
If we use a smaller table, both v=1 and v=10000000 will be considered, so maybe the pushdown is not waiting until the filtered side is fully completed? Here is a bigger example to show how it changes:
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i % 10000 as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10;
And a few runs:
output_rows=1944
predicate=DynamicFilterPhysicalExpr [ k@0 >= 200010 AND k@0 <= 9910001 ]
output_rows=1990
predicate=DynamicFilterPhysicalExpr [ k@0 >= 30010 AND k@0 <= 9970010 ]
output_rows=1978
predicate=DynamicFilterPhysicalExpr [ k@0 >= 60010 AND k@0 <= 9920001 ]
Expected behavior
Return the correct results.
Additional context
Disabling datafusion.optimizer.enable_dynamic_filter_pushdown returns the correct results.
Describe the bug
The Dynamic Filter Pushdown optimization (#16445) is causing joins to return incorrect results when the predicate has multiple conditions.
To Reproduce
In the following query, sometimes only
v=1is evaluated, while in others onlyv=10000000is evaluated:We can see that the
DynamicFilterPhysicalExprcan change:If we use a smaller table, both
v=1andv=10000000will be considered, so maybe the pushdown is not waiting until the filtered side is fully completed? Here is a bigger example to show how it changes:And a few runs:
Expected behavior
Return the correct results.
Additional context
Disabling
datafusion.optimizer.enable_dynamic_filter_pushdownreturns the correct results.