This is my plan this week for reviews, etc. I am putting it here to make it visible and keep myself organized.

- [x] Parquet Filter Pushdown Performance: Complete ClickBench benchmark: https://github.com/apache/arrow-rs/pull/7470
- [x] Parquet Filter Pushdown Performance: Refactor selection logic into its own structure: https://github.com/apache/arrow-rs/pull/7502
- [x] Parquet Filter Pushdown Performance: Create POC to save filtered results: https://github.com/apache/arrow-rs/pull/7513
- [x] Parquet Filter Pushdown Performance: Review additional code from ClickBench derived benchmarks with @zhuqi-lucas
- [x] DataFusion performance: Review per-file pruning with @adriangb: https://github.com/apache/datafusion/pull/16014
- [ ] DataFusion performance: Review blocked aggregate PR from @Rachelint: https://github.com/apache/datafusion/pull/15591#issuecomment-2886570394
- [ ] DataFusion performance: Projection Pushdown: review suggestion https://github.com/apache/datafusion/issues/14993#issuecomment-2880370941 from @adragomir
- [x] Parquet Variant: Create `parquet-variant` crate skeleton PR: https://github.com/apache/arrow-rs/pull/7485
- [x] Parquet Variant: Review https://github.com/apache/arrow-rs/pull/7452 from @PinkCrow007
- [ ] Parquet Variant: Get variant encoder/decoder into `parquet-variant` crate with @PinkCrow007
- [ ] Parquet Variant: Try to fix variant example files with @mapleFU: https://github.com/apache/parquet-testing/issues/82
- [x] DataFusion: Metadata Handling / extension types review with @timsaucer: https://github.com/apache/datafusion/pull/15911
- [ ] DataFusion Feature: Update example of using multiple threadpools with object store: https://github.com/apache/datafusion/pull/14286#discussion_r2086779525
- [ ] DataFusion Feature: Async user defined functions with @goldmedal: https://github.com/apache/datafusion/pull/14837
- [ ] Arrow Bug with concat'ing dictionaries from @davidhewitt: https://github.com/apache/arrow-rs/pull/7468
- [ ] DataFusion perf script from @logan-keede: https://github.com/apache/datafusion/pull/15144
- [ ] DataFusion perf script draft: https://github.com/apache/datafusion/pull/15846
- [ ] DataFusion PR about pruning ordering: https://github.com/apache/datafusion/pull/15821
- [ ] DataFusion Sorting: https://github.com/apache/datafusion/pull/15727
- [ ] Arrow Dictionary ID next steps from @brancz: https://github.com/apache/arrow-rs/pull/7467
- [ ] Arrow Variant: Expose tape decoder: https://github.com/apache/arrow-rs/pull/7442

Nice to have (really would be great to have someone help review):

- [ ] DataFusion: Aggregate UDFs in FFI: https://github.com/apache/datafusion/pull/14775
- [ ] Arrow: Avro cleanup: https://github.com/apache/arrow-rs/pull/6965
- [ ] Arrow: Avro Utf8View: https://github.com/apache/arrow-rs/pull/7434