Skip to content

Composite Join Condition Produces Incorrect SQL. #1485

@SP-John

Description

@SP-John

Wren Engine Bug: Composite Join Condition Produces Incorrect SQL

Describe the bug

When a relationship condition uses a composite key (two or more column pairs joined with AND), the Wren Engine's query planner generates an incorrect ON clause in the expanded SQL. Both sides of the join condition are resolved to the same table alias, producing a self-referencing condition like ON model_a.col1 = model_a.col2 instead of the correct cross-table ON model_a.col1 = model_b.col1 AND model_a.col2 = model_b.col2.

Single-column join conditions work correctly. The bug is triggered specifically by the AND-separated composite condition.

To Reproduce

  1. Create orders and customers tables in PostgreSQL with a composite key (tenant_id + custkey):
CREATE TABLE customers (
    tenant_id VARCHAR(50) NOT NULL,
    custkey INTEGER NOT NULL,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (tenant_id, custkey)
);

CREATE TABLE orders (
    tenant_id VARCHAR(50) NOT NULL,
    orderkey INTEGER NOT NULL,
    custkey INTEGER NOT NULL,
    price NUMERIC(10,2),
    PRIMARY KEY (tenant_id, orderkey),
    FOREIGN KEY (tenant_id, custkey) REFERENCES customers(tenant_id, custkey)
);

INSERT INTO customers (tenant_id, custkey, name) VALUES
    ('ACME', 1, 'Alice Johnson'), ('ACME', 2, 'Bob Smith'),
    ('GLOBEX', 1, 'Carol Williams'), ('GLOBEX', 2, 'Dave Brown');

INSERT INTO orders (tenant_id, orderkey, custkey, price) VALUES
    ('ACME', 101, 1, 250.00), ('ACME', 102, 2, 175.50),
    ('GLOBEX', 201, 1, 320.00), ('GLOBEX', 202, 2, 89.99);
  1. Create this MDL with a composite join condition (tenant_id + custkey):
{
  "catalog": "wren_poc",
  "schema": "public",
  "models": [
    {
      "name": "orders",
      "tableReference": { "catalog": "wren_poc", "schema": "public", "table": "orders" },
      "primaryKey": "orderkey",
      "columns": [
        { "name": "tenant_id", "type": "VARCHAR" },
        { "name": "orderkey", "type": "INTEGER" },
        { "name": "custkey", "type": "INTEGER" },
        { "name": "price", "type": "FLOAT" },
        { "name": "customers", "type": "customers", "relationship": "orders_customers" },
        {
          "name": "customer_name",
          "type": "VARCHAR",
          "isCalculated": true,
          "expression": "customers.name"
        }
      ]
    },
    {
      "name": "customers",
      "tableReference": { "catalog": "wren_poc", "schema": "public", "table": "customers" },
      "primaryKey": "custkey",
      "columns": [
        { "name": "tenant_id", "type": "VARCHAR" },
        { "name": "custkey", "type": "INTEGER" },
        { "name": "name", "type": "VARCHAR" }
      ]
    }
  ],
  "relationships": [
    {
      "name": "orders_customers",
      "models": ["orders", "customers"],
      "joinType": "MANY_TO_ONE",
      "condition": "orders.tenant_id = customers.tenant_id AND orders.custkey = customers.custkey"
    }
  ]
}
  1. Query the calculated field:
SELECT orderkey, customer_name, price FROM orders LIMIT 5

Actual behavior

The Wren Engine generates incorrect SQL where the ON clause references the same table on both sides:

-- Simplified from actual dry-plan output:
... RIGHT OUTER JOIN (...) AS orders
ON orders.tenant_id = orders.custkey   -- BUG: should be customers.tenant_id = orders.tenant_id
                                        --      AND customers.custkey = orders.custkey

The condition orders.tenant_id = orders.custkey compares two different columns on the same table (tenant_id vs custkey), which is semantically wrong and returns 0 rows.

Expected behavior

The expanded SQL should produce a correct cross-table join:

... RIGHT OUTER JOIN (...) AS orders
ON customers.tenant_id = orders.tenant_id
AND customers.custkey = orders.custkey

Workaround

Use a single-column join condition. If both tables have RLAC rules that filter by tenant_id, the tenant isolation is already enforced independently on each table — the join only needs the business key:

{
  "name": "orders_customers",
  "models": ["orders", "customers"],
  "joinType": "MANY_TO_ONE",
  "condition": "orders.custkey = customers.custkey"
}

This produces correct SQL:

ON customers.custkey = orders.custkey

Analysis

The bug appears to be in how the query planner resolves the composite condition string. With a single-column condition (orders.custkey = customers.custkey), the planner correctly maps each side to the appropriate table alias. With a composite condition containing AND, the planner appears to collapse or mismap the table references.

This was discovered in a multi-tenant healthcare analytics system where models use composite keys (client + personuuid) for tenant-isolated joins. The RLAC rule client = @client_id already enforces tenant isolation on each table independently, so the workaround (single-column join on personuuid only) is functionally correct. However, the composite join should work as documented.

Environment

  • Wren Engine fork based on ghcr.io/canner/wren-engine-ibis:latest
  • PostgreSQL 15 (Docker)
  • Linux (Oracle Linux 8, x86_64)
  • Tested via ibis-server v3 /dry-plan and /query endpoints

Relationship to previous bug report

This is a separate issue from the "relationship handle column name must match model name" bug documented in wren-engine-bug-report.md. That bug was about expression resolution; this bug is about composite join condition expansion in the query planner.

Documentation gap

The Advanced Tutorial only shows single-column join conditions (Orders.custkey = Customer.custkey). No examples of composite/multi-column join conditions exist in the documentation. It's unclear whether composite conditions are a supported feature or an undocumented edge case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions