
[R] Filter with regular expressions #26296

@asfimport


Hi,

Some expressions, such as substr(), grepl(), str_detect() and others, are not supported when filtering a dataset opened with open_dataset(). Specifically, the code below:

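library(dplyr)
library(arrow)
# Small test table with a single string column
data <- data.frame(a = c("a", "a2", "a3"))
# Make sure the target directory exists before writing into it
dir.create("Test_filter", showWarnings = FALSE)
write_parquet(data, "Test_filter/data.parquet")
ds <- open_dataset("Test_filter/")
# Keep rows whose first character is "a"
data_flt <- ds %>%
  filter(substr(a, 1, 1) == "a")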

gives this error:

Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
 Call collect() first to pull data into R.
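The error message itself points to a workaround. A minimal sketch, assuming the same toy data and the "Test_filter/" directory used above: call collect() to pull the dataset into an in-memory data frame, then apply substr() with ordinary dplyr semantics. The catch is that collect() materialises the whole dataset in R before any filtering happens, which is exactly what becomes costly for very large datasets.

```r
library(dplyr)
library(arrow)

# Recreate the toy dataset so this snippet runs on its own
dir.create("Test_filter", showWarnings = FALSE)
write_parquet(data.frame(a = c("a", "a2", "a3")), "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# Workaround suggested by the error message: load the data into R first,
# then filter in memory, where substr() is evaluated by base R
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")

print(data_flt)
```

Here all three values start with "a", so every row is kept; the point is only that the filter now runs in R rather than being pushed down to the Arrow Dataset scan.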

These expressions can be very helpful, if not necessary, for filtering and collecting a very large dataset. Could this feature be implemented?

Thank you.

Reporter: Pal

Related issues:

Note: This issue was originally created as ARROW-10305. Please see the migration documentation for further details.
