Problem with a cat #2836

@asfimport

Description

$ parquet cat train-00000-of-00001-15a05aeec7726f9d.parquet
Unknown error
shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
    at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
    at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
    at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
    at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
    at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
    at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
    at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
    at org.apache.parquet.cli.Main.run(Main.java:163)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.parquet.cli.Main.main(Main.java:193)

The data set in question is: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data
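
The trace suggests that the failure happens while the Parquet schema is converted to an Avro schema, and that the column name `original-instruction` is rejected because of the hyphen. The Avro specification restricts names to a leading `[A-Za-z_]` followed by `[A-Za-z0-9_]`. Below is a minimal sketch of that rule for checking column names before running `parquet cat`. It is an approximation based on the specification, not the code in `Schema.validateName` (the Java implementation may treat non-ASCII letters differently). The helper names and the sample column list are hypothetical; only `original-instruction` is taken from the error message.

```python
import re

# Name rule from the Avro specification: first character [A-Za-z_],
# remaining characters [A-Za-z0-9_]. Assumption: this approximates, but is
# not identical to, the check done by the shaded Avro library in the trace.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if the whole name matches the Avro specification's name rule."""
    return AVRO_NAME.fullmatch(name) is not None


def invalid_avro_names(column_names):
    """Return the column names that would not pass the rule above."""
    return [name for name in column_names if not is_valid_avro_name(name)]


# Hypothetical column list; only "original-instruction" comes from the report.
columns = ["id", "original-instruction", "category"]
print(invalid_avro_names(columns))  # prints ['original-instruction']
```

Under this rule the hyphen is the only offending character, so a name such as `original_instruction` would pass the check.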

Reporter: Rémy Léone / @remyleone

Note: This issue was originally created as PARQUET-2378. Please see the migration documentation for further details.
