regression: inlist deserialization error #17225

@haohuaijin

Describe the bug

Deserializing a physical plan fails with the following error when the query has an IN list combined with another filter:

Error: Internal("PhysicalExpr Column references column 'p_size' at index 1 (zero-based) but input schema only has 1 columns: [\"p_size\"]")
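The error message describes a consistency check: a physical column reference carries a name and a zero-based index, and that index must fall inside the input schema the expression is evaluated against. The std-only sketch below models that bounds check and one way to rebind a reference by name. ColumnRef, check_column, and rebind_by_name are hypothetical names for illustration, not DataFusion's API, and the sketch makes no claim about where DataFusion produces the out-of-range index.

```rust
/// Hypothetical stand-in for a physical column reference: a name plus a
/// zero-based position in the input schema. Not DataFusion's own type.
#[derive(Debug, Clone, PartialEq)]
struct ColumnRef {
    name: String,
    index: usize,
}

/// Models the bounds check reported by the error: the index must be
/// smaller than the number of fields in the input schema.
fn check_column(col: &ColumnRef, schema: &[&str]) -> Result<(), String> {
    if col.index < schema.len() {
        Ok(())
    } else {
        Err(format!(
            "PhysicalExpr Column references column '{}' at index {} (zero-based) but input schema only has {} columns: {:?}",
            col.name,
            col.index,
            schema.len(),
            schema
        ))
    }
}

/// Rebinds a reference by looking its name up in the schema it will be
/// evaluated against; returns None if the name is absent.
fn rebind_by_name(col: &ColumnRef, schema: &[&str]) -> Option<ColumnRef> {
    schema.iter().position(|f| *f == col.name).map(|index| ColumnRef {
        name: col.name.clone(),
        index,
    })
}

fn main() {
    // Assumption for illustration: `p_size` was bound at index 1 against a
    // wider schema, but is evaluated against a one-column schema.
    let stale = ColumnRef { name: "p_size".to_string(), index: 1 };
    let projected = ["p_size"];

    match check_column(&stale, &projected) {
        Ok(()) => println!("ok"),
        Err(e) => println!("error: {e}"),
    }

    // Looking the name up again yields index 0, which passes the check.
    let fixed = rebind_by_name(&stale, &projected).expect("p_size is in the schema");
    println!("rebound index: {}", fixed.index);
    assert!(check_column(&fixed, &projected).is_ok());
}
```

Running the sketch reproduces the same message text for the stale reference, then shows that the name-based lookup gives an in-bounds index.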

The query is:

SELECT p_size FROM part WHERE p_size IN (14, 6, 5, 31) and p_partkey > 1000

To Reproduce

I added a reproducer in PR https://github.com/apache/datafusion/pull/17224/files.

Another reproducer is available at https://github.com/haohuaijin/inlist-reproduce. The code is as follows:

use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datafusion::{
    datasource::{
        file_format::parquet::ParquetFormat,
        listing::{ListingOptions, ListingTableUrl},
    },
    prelude::SessionContext,
};
use datafusion_proto::{
    physical_plan::{AsExecutionPlan, DefaultPhysicalExtensionCodec},
    protobuf::PhysicalPlanNode,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = SessionContext::new();
    let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()));
    let table_path = ListingTableUrl::parse("data.parquet")?;

    ctx.register_listing_table(
        "default",
        &table_path,
        listing_options.clone(),
        Some(get_schema()),
        None,
    )
    .await?;

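    // Plan a query that combines an IN list with a second filter.
    let plan = ctx
        .sql("select message from default where message in ('a', 'b', 'c', 'd') and timestamp >= 1")
        .await?
        .create_physical_plan()
        .await?;

    let codec = DefaultPhysicalExtensionCodec {};

    // Serialize the physical plan to its protobuf representation.
    let node = PhysicalPlanNode::try_from_physical_plan(plan, &codec)?;

    // Deserialize it again; the reported error surfaces at this step and is
    // returned from main, which prints it as "Error: Internal(...)".
    let plan = node.try_into_physical_plan(&ctx, &ctx.runtime_env(), &codec)?;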

    println!("{:?}", plan);

    Ok(())
}

fn get_schema() -> SchemaRef {
    SchemaRef::new(Schema::new(vec![
        Field::new("timestamp", DataType::Int64, false),
        Field::new("message", DataType::Utf8, true),
    ]))
}

Expected behavior

The plan deserializes successfully.

Additional context

This works fine in DataFusion v47 and v48.
It looks related to #16665 and #16744.

Labels: bug (Something isn't working), regression (Something that used to work no longer does)