
Inconsistent Signedness Of Legacy Parquet Timestamps Written By Spark #7958

@comphead

Description

Describe the bug

DataFusion (DF) reads a Parquet timestamp column as nanoseconds, whereas DuckDB and Spark treat the same timestamp as seconds.

To Reproduce

Create a Parquet file containing the timestamp value -62125747200 (seconds since the Unix epoch) and read it back.

DuckDB and Spark both read the value correctly:

0001-04-25 00:00:00

but DF reads the timestamp as nanoseconds and returns the wrong value:

❯ select * from test;
+-------------------------------+
| a                             |
+-------------------------------+
| 1754-12-22T22:43:41.128654848 |
+-------------------------------+
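
The value DF prints is consistent with a signed 64-bit overflow. An i64 nanosecond count can only represent dates from roughly 1677 to 2262, so -62125747200 seconds multiplied by 10^9 does not fit and wraps around. The sketch below is only an arithmetic check in plain Python, not DataFusion's actual code path. The helper names (`wrap_i64`, `as_seconds`, `as_wrapped_nanos`) are hypothetical, and the overflow itself is an assumption about the cause that this report does not confirm.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
SECONDS = -62125747200  # timestamp value from this report


def wrap_i64(value):
    """Reduce a Python int to a signed 64-bit value (two's complement wrap)."""
    return (value + 2**63) % 2**64 - 2**63


def as_seconds(value):
    """Interpret the value as seconds since the Unix epoch."""
    return EPOCH + timedelta(seconds=value)


def as_wrapped_nanos(seconds):
    """Convert seconds to nanoseconds in an i64 that wraps on overflow (assumed behavior)."""
    nanos = wrap_i64(seconds * 1_000_000_000)
    # divmod floors, so the nanosecond remainder is always non-negative.
    whole_seconds, nanos_of_second = divmod(nanos, 1_000_000_000)
    return EPOCH + timedelta(seconds=whole_seconds), nanos_of_second


print(as_seconds(SECONDS))  # 0001-04-25 00:00:00
moment, frac = as_wrapped_nanos(SECONDS)
print(f"{moment.isoformat()}.{frac:09d}")  # 1754-12-22T22:43:41.128654848
```

The first line matches what DuckDB and Spark return, and the second matches the DF output above digit for digit. This suggests the seconds value is correct and that precision is lost when it is widened to nanoseconds.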

Expected behavior

DF should return the same value as DuckDB and Spark: 0001-04-25 00:00:00.

Additional context

No response

Labels

bug
