Is your feature request related to a problem or challenge?
We are looking into more advanced methods for filter pushdown, and I have been thinking about porting SimplifyWithGuarantee. The critical functionality we need is the ability to evaluate a predicate against some statistics and get back the residual expression. For example, given the predicate x = 1 AND y < 2:
- file 1 with stats 0 <= x <= 20 and 0 < y <= 1 => residual filter x = 1 (y < 2 is always satisfied) => scan this file with the filter x = 1
- file 2 with stats 3 < y <= 10 => residual filter false => don't scan this file, since it can never satisfy the predicate
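To make the two cases above concrete, here is a minimal, self-contained sketch of residual computation. It does not use DataFusion: `Pred`, `Range`, `stats_of`, and `residual` are hypothetical names invented for illustration, the predicate language only covers `=`, `<`, and `AND` over integer columns, and statistics are assumed to be inclusive integer bounds (so `0 < y <= 1` becomes `[1, 1]` and `3 < y <= 10` becomes `[4, 10]`).

```rust
use std::collections::HashMap;

/// Toy predicate language (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Eq(&'static str, i64),
    Lt(&'static str, i64),
    And(Box<Pred>, Box<Pred>),
    Const(bool),
}

/// Inclusive min/max for one column, as file statistics might provide.
#[derive(Debug, Clone, Copy)]
struct Range {
    min: i64,
    max: i64,
}

/// Build a statistics map from `(column, min, max)` triples.
fn stats_of(entries: &[(&'static str, i64, i64)]) -> HashMap<&'static str, Range> {
    entries
        .iter()
        .map(|&(col, min, max)| (col, Range { min, max }))
        .collect()
}

/// Simplify `pred` given per-column ranges; columns without stats are left alone.
fn residual(pred: &Pred, stats: &HashMap<&'static str, Range>) -> Pred {
    match pred {
        Pred::Const(_) => pred.clone(),
        Pred::Eq(c, v) => match stats.get(c) {
            // The value lies outside the range: the comparison can never hold.
            Some(r) if *v < r.min || *v > r.max => Pred::Const(false),
            // The column is constant and equal to the value: always holds.
            Some(r) if r.min == r.max && r.min == *v => Pred::Const(true),
            _ => pred.clone(),
        },
        Pred::Lt(c, v) => match stats.get(c) {
            // Every value in the file is below the bound.
            Some(r) if r.max < *v => Pred::Const(true),
            // No value in the file is below the bound.
            Some(r) if r.min >= *v => Pred::Const(false),
            _ => pred.clone(),
        },
        Pred::And(l, r) => match (residual(l, stats), residual(r, stats)) {
            // One false conjunct makes the whole conjunction false.
            (Pred::Const(false), _) | (_, Pred::Const(false)) => Pred::Const(false),
            // A true conjunct can be dropped, leaving the other side.
            (Pred::Const(true), other) | (other, Pred::Const(true)) => other,
            (l, r) => Pred::And(Box::new(l), Box::new(r)),
        },
    }
}
```

Calling `residual` on `x = 1 AND y < 2` with the file 1 statistics leaves `x = 1`, and with the file 2 statistics it collapses to `false`, which is the signal to skip the file. A real implementation would also need to handle nulls, non-integer types, exclusive bounds, and the rest of the expression language.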
Describe the solution you'd like
I think a straightforward port of that function would be useful, but if there is a design that integrates better with existing functionality, I'm open to other designs.
/// Given a guarantee expression and a predicate expression, simplify the predicate expression.
///
/// # Example
///
/// This is useful when filtering data that has statistics. For example, if the
/// statistics tell you `x > 2` (the guarantee) and you want to filter with
/// `x > 3 and y < 0`, then you can simplify the predicate to `y < 0`. Alternatively,
/// if the predicate is `x < 1 and y < 0`, then you know directly from the
/// statistics that the predicate will always be false, so your filter can
/// immediately return an empty result.
///
/// ```
/// use datafusion_expr::{lit, col, Expr};
///
/// let guarantee = col("x").gt(lit(2));
/// let predicate = col("x").gt(lit(3)).and(col("y").lt(lit(0)));
/// assert_eq!(predicate.simplify_with_guarantee(&guarantee), col("y").lt(lit(0)));
///
/// let predicate = col("x").lt(lit(1)).and(col("y").lt(lit(0)));
/// assert_eq!(predicate.simplify_with_guarantee(&guarantee), lit(false));
/// ```
pub fn simplify_with_guarantee(&self, guarantee: &Expr) -> Expr {
    todo!()
}
Describe alternatives you've considered
It seems like the current solutions with PruningPredicate don't give you the residual expression.
Additional context
This is related to #5830