Skip to content

Support no distinct count/max/min/sum aggregate in single_distinct_to_group_by rule #8123

@haohuaijin

Description

@haohuaijin

Is your feature request related to a problem or challenge?

In current single_distinct_to_group_by rule, we only support all aggregate function are distinct aggregate.
But if the no distinct aggregate is count/min/max/sum, we can also do same transform like in single_distinct_to_group_by
before

select a, count(distinct b), count(c)
from t
group by a

after

select a, count(alias1), sum(alias2)
from (
  select a, b as alias1, count(c) as alias2
  from t
  group by a, b
)
group by a

Describe the solution you'd like

By write, we can improve the perfmance of distinct aggregate

❯ SELECT "RegionID", SUM("AdvEngineID"), COUNT(DISTINCT "UserID") FROM '../benchmarks/data/hits.parquet' GROUP BY "RegionID" order by "RegionID" LIMIT 10;
+----------+--------------------------------------------------+--------------------------------------------------------+
| RegionID | SUM(../benchmarks/data/hits.parquet.AdvEngineID) | COUNT(DISTINCT ../benchmarks/data/hits.parquet.UserID) |
+----------+--------------------------------------------------+--------------------------------------------------------+
| 0        | 0                                                | 8                                                      |
| 1        | 147946                                           | 239380                                                 |
| 2        | 441662                                           | 1081016                                                |
| 3        | 39724                                            | 131195                                                 |
| 4        | 34557                                            | 79500                                                  |
| 5        | 13502                                            | 40914                                                  |
| 6        | 24338                                            | 55768                                                  |
| 7        | 28417                                            | 64989                                                  |
| 8        | 34483                                            | 65472                                                  |
| 9        | 38047                                            | 91576                                                  |
+----------+--------------------------------------------------+--------------------------------------------------------+
10 rows in set. Query took 1.357 seconds.

❯ SELECT "RegionID", SUM(t1),  count("UserID") from (select "UserID", "RegionID", sum("AdvEngineID") as t1 from '../benchmarks/data/hits.parquet' group by "UserID", "RegionID") group by "RegionID" order by "RegionID" limit 10;
+----------+---------+-----------------------------------------------+
| RegionID | SUM(t1) | COUNT(../benchmarks/data/hits.parquet.UserID) |
+----------+---------+-----------------------------------------------+
| 0        | 0       | 8                                             |
| 1        | 147946  | 239380                                        |
| 2        | 441662  | 1081016                                       |
| 3        | 39724   | 131195                                        |
| 4        | 34557   | 79500                                         |
| 5        | 13502   | 40914                                         |
| 6        | 24338   | 55768                                         |
| 7        | 28417   | 64989                                         |
| 8        | 34483   | 65472                                         |
| 9        | 38047   | 91576                                         |
+----------+---------+-----------------------------------------------+
10 rows in set. Query took 0.919 seconds.

Describe alternatives you've considered

No response

Additional context

https://www.querifylabs.com/blog/distinct-aggregation-optimization-in-apache-calcite-and-trino
https://github.com/apache/calcite/blob/96b05ee12f936ed057265072ff6a2de8ea0a249e/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java#L286-L298

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions