Parquet-java sometimes produces 0-size compressed data in data page v2 #3122

@mapleFU

Description

Describe the bug

See:

  1. GH-31992: [C++][Parquet] Handling the special case when DataPageV2 values buffer is empty arrow#45252 (comment)
  2. File: https://github.com/user-attachments/files/18292070/snappy_bug.parquet.gz

V2 data pages do not compress def/rep levels, only the values. In the (not uncommon) case where all values are null, the uncompressed values buffer is empty, and Parquet-java may write 0 compressed bytes for it. However, a 0-size buffer is not a valid compressed stream, so it is not valid input for a decompressor. Decompressing the data page then fails in the C++ and Rust implementations.
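To make the failure mode concrete, here is a minimal sketch in Python. It uses the standard-library `zlib` codec as a stand-in for Snappy (an assumption made only to keep the example self-contained), and the helper names `empty_buffer_rejected` and `decompress_values` are hypothetical. It illustrates that a codec produces a non-empty stream even for empty input, that an empty buffer is rejected on decompression, and one way a reader could tolerate the special case. It is not the actual fix used by any Parquet implementation.

```python
import zlib


def empty_buffer_rejected() -> bool:
    """Return True if the codec refuses a 0-byte compressed buffer."""
    try:
        zlib.decompress(b"")
    except zlib.error:
        return True
    return False


def decompress_values(compressed: bytes, uncompressed_size: int) -> bytes:
    """Hypothetical reader-side guard for a DataPageV2 values buffer.

    If the page header declares 0 uncompressed bytes and the compressed
    buffer is also empty, skip the codec and return an empty buffer.
    """
    if uncompressed_size == 0 and len(compressed) == 0:
        return b""
    data = zlib.decompress(compressed)
    if len(data) != uncompressed_size:
        raise ValueError("uncompressed size does not match the page header")
    return data


if __name__ == "__main__":
    # Compressing empty input still yields a non-empty stream.
    print(len(zlib.compress(b"")) > 0)
    # A 0-byte buffer is not a valid compressed stream.
    print(empty_buffer_rejected())
    # With the guard, the all-null page decodes to an empty values buffer.
    print(decompress_values(b"", 0) == b"")
```

The writer-side alternative is to always run the codec, even on an empty values buffer, so that the page carries a valid (non-empty) compressed stream.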

Component(s)

Core
