Skip to content

Dynamic Filter Pushdown causes JOIN to return incorrect results #17188

@nuno-faria

Description

@nuno-faria

Describe the bug

The Dynamic Filter Pushdown optimization (#16445) is causing joins to return incorrect results when the predicate has multiple conditions.

To Reproduce

copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';

In the following query, sometimes only v=1 is evaluated, while in others only v=10000000 is evaluated:

select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10000000;

We can see that the DynamicFilterPhysicalExpr can change:

+---+---+---+
| k | k | v |
+---+---+---+
| 1 | 1 | 1 |
+---+---+---+

predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]
+----------+----------+----------+
| k        | k        | v        |
+----------+----------+----------+
| 10000000 | 10000000 | 10000000 |
+----------+----------+----------+

predicate=DynamicFilterPhysicalExpr [ k@0 >= 10000000 AND k@0 <= 10000000 ]

If we use a smaller table, both v=1 and v=10000000 will be considered, so maybe the pushdown is not waiting until the filtered side is fully completed? Here is a bigger example to show how it changes:

copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i % 10000 as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';

create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';

select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10;

And a few runs:

output_rows=1944
predicate=DynamicFilterPhysicalExpr [ k@0 >= 200010 AND k@0 <= 9910001 ]

output_rows=1990
predicate=DynamicFilterPhysicalExpr [ k@0 >= 30010 AND k@0 <= 9970010 ]

output_rows=1978
predicate=DynamicFilterPhysicalExpr [ k@0 >= 60010 AND k@0 <= 9920001 ]

Expected behavior

Return the correct results.

Additional context

Disabling datafusion.optimizer.enable_dynamic_filter_pushdown returns the correct results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions