Describe the bug
I have table t1 with a column called file_path
I want to obtain a list of file_paths where each element is unique and then take a random subset of those columns.
I thought that this could be achieved with the following code.
let files = ctx.sql("select file_path from t1 group by file_path").await.unwrap()
.with_column("r", random() ).unwrap()
.filter(col("r").lt_eq(lit(0.2))).unwrap();
files.show().await.unwrap();
However in the output of my query I see the following entries which contains a record that should be filtered out.
| A | 0.8023022275259943 |
| B | 0.05829777789599211 |
| C | 0.14330028518553894 |
This is the calculated logical plan
Projection: t1.file_path, random() AS r
Aggregate: groupBy=[[t1.file_path]], aggr=[[]]
Filter: random() <= Float64(0.2)
TableScan: t1 projection=[file_path]
In this case I would expect the filter to occur after the aggregate operation not before.
To Reproduce
No response
Expected behavior
No response
Additional context
No response
Describe the bug
I have table
t1with a column calledfile_pathI want to obtain a list of file_paths where each element is unique and then take a random subset of those columns.
I thought that this could be achieved with the following code.
However in the output of my query I see the following entries which contains a record that should be filtered out.
This is the calculated logical plan
In this case I would expect the filter to occur after the aggregate operation not before.
To Reproduce
No response
Expected behavior
No response
Additional context
No response