Implement Expr SimplifyWithGuarantee #6171

@wjones127

Is your feature request related to a problem or challenge?

We are starting to look for more advanced methods for filter pushdown, and I have been thinking about porting SimplifyWithGuarantee. The critical functionality we need is the ability to evaluate a predicate against some statistics and get back the residual expression. For example, given the predicate x = 1 AND y < 2:

  • file 1 with stats 0 <= x <= 20 and 0 < y <= 1 => residual filter x = 1 (y < 2 is always satisfied) => scan this file with x = 1 filter
  • file 2 with stats 3 < y <= 10 => residual filter false => don't scan this file since it will never satisfy the predicate
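
To make the two cases above concrete, here is a minimal, self-contained sketch of the residual computation. It is only an illustration under stated assumptions: `Pred`, `Range`, `simplify`, and `example_predicate` are hypothetical names, not DataFusion APIs, and the statistics are modelled as inclusive integer ranges (so `0 < y <= 1` becomes `min = 1, max = 1`, and `3 < y <= 10` becomes `min = 4, max = 10`).

```rust
use std::collections::HashMap;

/// Toy predicate language (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Eq(String, i64), // column = value
    Lt(String, i64), // column < value
    And(Box<Pred>, Box<Pred>),
    Const(bool),
}

/// Inclusive integer bounds for one column, as file statistics might provide.
#[derive(Debug, Clone, Copy)]
struct Range {
    min: i64,
    max: i64,
}

/// Simplify `pred` given per-column statistics, returning the residual predicate.
/// Columns without statistics are left untouched.
fn simplify(pred: &Pred, stats: &HashMap<String, Range>) -> Pred {
    match pred {
        Pred::Eq(c, v) => match stats.get(c) {
            // Value lies outside the known range: can never match.
            Some(r) if *v < r.min || *v > r.max => Pred::Const(false),
            // Column is constant and equal to the value: always matches.
            Some(r) if r.min == r.max && r.min == *v => Pred::Const(true),
            _ => pred.clone(),
        },
        Pred::Lt(c, v) => match stats.get(c) {
            // Every value in the file is below the bound.
            Some(r) if r.max < *v => Pred::Const(true),
            // No value in the file is below the bound.
            Some(r) if r.min >= *v => Pred::Const(false),
            _ => pred.clone(),
        },
        Pred::And(l, r) => match (simplify(l, stats), simplify(r, stats)) {
            // One side is always false: the conjunction is false.
            (Pred::Const(false), _) | (_, Pred::Const(false)) => Pred::Const(false),
            // One side is always true: keep only the other side.
            (Pred::Const(true), other) | (other, Pred::Const(true)) => other,
            (l, r) => Pred::And(Box::new(l), Box::new(r)),
        },
        Pred::Const(_) => pred.clone(),
    }
}

/// Builds the predicate from the example: x = 1 AND y < 2.
fn example_predicate() -> Pred {
    Pred::And(
        Box::new(Pred::Eq("x".to_string(), 1)),
        Box::new(Pred::Lt("y".to_string(), 2)),
    )
}
```

With file 1's statistics (`x` in `[0, 20]`, `y` in `[1, 1]`), `simplify` returns `Eq("x", 1)`, so the file is scanned with the residual filter `x = 1`. With file 2's statistics (`y` in `[4, 10]`, nothing known about `x`), it returns `Const(false)`, so the file can be skipped. A real port would need to handle more operators, nulls, and non-integer types.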

Describe the solution you'd like

I think a straightforward port of that function would be useful, but if there is a design that integrates better with existing functionality, I'm open to other designs.

/// Given a guarantee expression and a predicate expression, simplify the predicate expression.
/// 
/// # Example
/// 
/// This is useful, for example, when filtering data that has statistics. If the
/// statistics tell you `x > 2` (the guarantee) and you want to filter with
/// `x > 1 and y < 0`, then you can simplify the predicate to `y < 0`. Alternatively,
/// if the predicate is `x < 1 and y < 0`, then you know directly from the
/// statistics that the predicate will always be false, so your filter can
/// immediately return an empty result.
///
/// ```
/// use datafusion_expr::{col, lit};
///
/// let guarantee = col("x").gt(lit(2));
/// let predicate = col("x").gt(lit(1)).and(col("y").lt(lit(0)));
/// assert_eq!(predicate.simplify_with_guarantee(&guarantee), col("y").lt(lit(0)));
///
/// let predicate = col("x").lt(lit(1)).and(col("y").lt(lit(0)));
/// assert_eq!(predicate.simplify_with_guarantee(&guarantee), lit(false));
/// ```
pub fn simplify_with_guarantee(&self, guarantee: &Expr) -> Expr {
    todo!()
}

Describe alternatives you've considered

It seems like the current solutions with PruningPredicate don't give you the residual expression.

Additional context

This is related to #5830

Labels: enhancement