[EPIC] Unify Function Interface (remove `BuiltInScalarFunction`)

### Is your feature request related to a problem or challenge?

This is based on the wonderful writeup from @2010YOUY01 in https://github.com/apache/arrow-datafusion/issues/7977

As previously discussed in https://github.com/apache/arrow-datafusion/issues/7110 and https://github.com/apache/arrow-datafusion/pull/7752, there are a few challenges with how scalar functions are handled, notably that there are two distinct implementations: `BuiltinScalarFunction` and `ScalarUDF`.

#### Problems with `BuiltinScalarFunction`

1. As more functions are added, the total footprint of DataFusion grows, even for those who don't need the specific functions. This also limits the number of functions that can reasonably be built into DataFusion.
2. The desired semantics may differ between users (e.g. many of the built-in functions in DataFusion mirror postgres behavior, but some users wish to mimic spark behavior).
3. User defined functions are treated differently from built-in functions in some ways (e.g. they can't have aliases).
4. Adding a new built-in function requires modifications in multiple places, which makes the barrier overly high. Built-in functions are implemented as variants of the `BuiltinScalarFunction` enum, and function implementations like `return_type()` are large methods that match every enum variant.
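To make point 4 concrete, here is a minimal sketch of the enum-plus-`match` pattern. The `DataType` and `BuiltinFn` types below are simplified, hypothetical stand-ins (not DataFusion's real definitions); the point is only that every new variant forces an edit to each `match`.

```rust
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT DataFusion's actual types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BuiltinFn {
    Abs,
    Sqrt,
    Upper,
}

impl BuiltinFn {
    // Every method must list every variant, so adding a function
    // means touching each of these `match` blocks.
    fn name(self) -> &'static str {
        match self {
            BuiltinFn::Abs => "abs",
            BuiltinFn::Sqrt => "sqrt",
            BuiltinFn::Upper => "upper",
        }
    }

    // Returns None when the argument type is not supported.
    fn return_type(self, arg: DataType) -> Option<DataType> {
        match self {
            BuiltinFn::Abs => match arg {
                DataType::Int64 | DataType::Float64 => Some(arg),
                DataType::Utf8 => None,
            },
            BuiltinFn::Sqrt => match arg {
                DataType::Int64 | DataType::Float64 => Some(DataType::Float64),
                DataType::Utf8 => None,
            },
            BuiltinFn::Upper => match arg {
                DataType::Utf8 => Some(DataType::Utf8),
                _ => None,
            },
        }
    }
}
```

With only three functions this is manageable, but the same shape repeated across many methods and a large number of variants is what makes each addition touch multiple places.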

#### Problems with `ScalarUDF`
* The current implementation of `ScalarUDF` is a struct, and it does not cover all the functionality of existing built-in functions.
* Defining a new `ScalarUDF` requires constructing a struct imperatively, providing `Arc` function pointers (see examples/simple_udf.rs) for each part of the UDF. This is unfamiliar to Rust users, who more commonly see `dyn Trait` objects.
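The difference between the two styles can be sketched as follows. All names here (`StructUdf`, `ScalarFn`, `AddOne`, and the `f64`-slice signature) are hypothetical simplifications chosen for illustration; the real `ScalarUDF` operates on Arrow arrays and has a different signature.

```rust
use std::sync::Arc;

// Hypothetical, simplified types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

type ReturnTypeFn = Arc<dyn Fn(&[DataType]) -> Option<DataType> + Send + Sync>;
type EvalFn = Arc<dyn Fn(&[f64]) -> Vec<f64> + Send + Sync>;

// Style 1: a struct assembled imperatively from `Arc`'d closures.
struct StructUdf {
    name: String,
    return_type: ReturnTypeFn,
    fun: EvalFn,
}

fn make_add_one_struct() -> StructUdf {
    StructUdf {
        name: "add_one".to_string(),
        return_type: Arc::new(|_args: &[DataType]| Some(DataType::Float64)),
        fun: Arc::new(|values: &[f64]| -> Vec<f64> {
            values.iter().map(|v| v + 1.0).collect()
        }),
    }
}

// Style 2: a trait, used through `dyn ScalarFn`.
trait ScalarFn: Send + Sync {
    fn name(&self) -> &str;
    fn return_type(&self, args: &[DataType]) -> Option<DataType>;
    fn invoke(&self, values: &[f64]) -> Vec<f64>;
}

struct AddOne;

impl ScalarFn for AddOne {
    fn name(&self) -> &str {
        "add_one"
    }
    fn return_type(&self, _args: &[DataType]) -> Option<DataType> {
        Some(DataType::Float64)
    }
    fn invoke(&self, values: &[f64]) -> Vec<f64> {
        values.iter().map(|v| v + 1.0).collect()
    }
}
```

In the trait style, the compiler reports any missing required method, and optional behavior can be added later as default methods without breaking existing implementations.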


### Describe the solution you'd like

I propose moving DataFusion to **only** use `ScalarUDF`s and remove `BuiltinScalarFunction`. This will ensure:

1. `ScalarUDF`s have access to all the same functionality as "built-in" functions.
2. No function-specific code will escape the planning phase.
3. DataFusion's core can remain focused, and external libraries or packages can be used to customize its use.

We will keep the existing `ScalarUDF` interface as much as possible, while also potentially providing an easier way to define them (ideally via a trait object). 

### Describe alternatives you've considered

https://github.com/apache/arrow-datafusion/issues/7977 describes introducing a new trait and unifying both `ScalarUDF` and `BuiltinScalarFunction` with this trait.

That approach also allows existing built-in functions to be migrated gradually to the new trait, while the old UDF interface `create_udf()` can remain unchanged.

However, I think it is a bigger change for users, and it risks making the overall complexity of DataFusion worse. As demonstrated in https://github.com/apache/arrow-datafusion/pull/8046, it is also feasible to allow new `ScalarUDF`s to be defined using a trait while retaining backwards compatibility for existing `ScalarUDF` implementations.
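One way such backwards compatibility could work is an adapter: the public UDF type wraps a trait object, and the legacy closure-based constructor builds an adapter that implements the trait. This is a rough sketch under that assumption, not the code from the linked PR; `Udf`, `ScalarFnImpl`, `LegacyAdapter`, `Negate`, and the `f64`-slice signature are all hypothetical, and `create_udf` here is a simplified stand-in that only borrows the name of the real function.

```rust
use std::sync::Arc;

// Hypothetical trait that new-style functions implement.
trait ScalarFnImpl: Send + Sync {
    fn name(&self) -> &str;
    fn invoke(&self, values: &[f64]) -> Vec<f64>;
    // Default method: functions without aliases need not override it.
    fn aliases(&self) -> &[String] {
        &[]
    }
}

// The single UDF type seen by the rest of the system.
#[derive(Clone)]
struct Udf {
    inner: Arc<dyn ScalarFnImpl>,
}

impl Udf {
    fn from_impl(imp: impl ScalarFnImpl + 'static) -> Self {
        Udf { inner: Arc::new(imp) }
    }
    fn name(&self) -> &str {
        self.inner.name()
    }
    fn aliases(&self) -> &[String] {
        self.inner.aliases()
    }
    fn invoke(&self, values: &[f64]) -> Vec<f64> {
        self.inner.invoke(values)
    }
}

type EvalFn = Arc<dyn Fn(&[f64]) -> Vec<f64> + Send + Sync>;

// Adapter that lets closure-based definitions keep working.
struct LegacyAdapter {
    name: String,
    fun: EvalFn,
}

impl ScalarFnImpl for LegacyAdapter {
    fn name(&self) -> &str {
        &self.name
    }
    fn invoke(&self, values: &[f64]) -> Vec<f64> {
        (self.fun)(values)
    }
}

// Simplified stand-in for the legacy constructor.
fn create_udf(name: &str, fun: EvalFn) -> Udf {
    Udf::from_impl(LegacyAdapter { name: name.to_string(), fun })
}

// A new-style function defined directly via the trait, with an alias.
struct Negate {
    aliases: Vec<String>,
}

impl ScalarFnImpl for Negate {
    fn name(&self) -> &str {
        "negate"
    }
    fn invoke(&self, values: &[f64]) -> Vec<f64> {
        values.iter().map(|v| -v).collect()
    }
    fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```

Callers only ever see `Udf`, so whether a function was defined through the legacy constructor or through the trait is invisible to the planner in this sketch.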

### Additional context

Proposed implementation steps:

- [x] Prototype `ScalarUDF` interface changes (make the fields non `pub`): https://github.com/apache/arrow-datafusion/pull/8039
- [x] Prototype what registering external packages would look like (by making a prototype for some built-in functions): https://github.com/apache/arrow-datafusion/pull/8046
- [x] Propose `ScalarUDF` API changes for real: https://github.com/apache/arrow-datafusion/pull/8079
- [x] https://github.com/apache/arrow-datafusion/pull/8059
- [x] List additional feature gaps between built-in functions and `ScalarUDF`s and close them
- [x] https://github.com/apache/arrow-datafusion/pull/8114
- [x] https://github.com/apache/arrow-datafusion/issues/8346
- [x] https://github.com/apache/arrow-datafusion/issues/8347
- [x] https://github.com/apache/arrow-datafusion/issues/8756
- [x] https://github.com/apache/arrow-datafusion/issues/8157
- [x] https://github.com/apache/arrow-datafusion/issues/9392
- [ ] https://github.com/apache/arrow-datafusion/issues/8051
- [x] https://github.com/apache/arrow-datafusion/issues/8712
- [x] https://github.com/apache/arrow-datafusion/pull/8088
- [x] Implement aliasing for `ScalarUDF`: https://github.com/apache/arrow-datafusion/issues/8348
- [x] Implement trait-based `ScalarUDF`: https://github.com/apache/arrow-datafusion/issues/8568
- [x] https://github.com/apache/arrow-datafusion/issues/9100
- [x] Create a new `datafusion-functions` crate with an initial set of functions as a model (see https://github.com/apache/arrow-datafusion/pull/8046)
- [x] File tickets to track extracting the remaining function packages into the `datafusion-functions` crate (https://github.com/apache/arrow-datafusion/issues/9285)
- [x] https://github.com/apache/arrow-datafusion/issues/9285
- [x] https://github.com/apache/arrow-datafusion/issues/9074
- [x] File follow-on tickets for applying the same treatment to `AggregateUDF` (https://github.com/apache/arrow-datafusion/issues/8708) and `WindowUDF` (https://github.com/apache/arrow-datafusion/issues/8709)

