Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.in operator does not support expression on left-hand-side #4497

Closed
2 tasks done
kgutwin opened this issue May 21, 2024 · 2 comments · Fixed by #4498
Closed
2 tasks done

std.in operator does not support expression on left-hand-side #4497

kgutwin opened this issue May 21, 2024 · 2 comments · Fixed by #4498
Labels
bug Invalid compiler output or panic

Comments

@kgutwin
Copy link
Collaborator

kgutwin commented May 21, 2024

What happened?

I'm trying to make a generic "boolean" conversion function, and this is where I am starting:

let coalesce_bool = func
  val <scalar || null>
  -> case [
    val == null => null,
    val == 0 => false,
    val == 1 => true,
    (val | in ["no", "n", "N", "false"]) => false,
    (val | in ["yes", "y", "Y", "true"]) => true,
  ]

This is working well, for example in the following:

from [{foo = 1}, {foo = null}]
derive { foo_b = coalesce_bool foo }

However, feeding a numeric value other than 0 or 1 into the function is causing DuckDB to raise a Conversion Error ("Could not convert string 'no' to INT32). This is because DuckDB gets to the corresponding line in the CASE statement

    WHEN foo IN ('no', 'n', 'N', 'false') THEN false

and tries to cast the strings in the array to numbers.

The solution for me seemed simple -- to cast val to VARCHAR to do the comparison by strings:

let coalesce_bool = func
  val <scalar || null>
  -> case [
    val == null => null,
    val == 0 => false,
    val == 1 => true,
    (val | as VARCHAR | in ["no", "n", "N", "false"]) => false,
    (val | as VARCHAR | in ["yes", "y", "Y", "true"]) => true,
  ]

but this causes a PRQL compiler panic:

The application panicked (crashed).
Message:  args to `std.array_in` must be a column ref and an array
Location: prqlc/prqlc/src/sql/gen_expr.rs:176

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 13 frames hidden ⋮                              
  14: prqlc::sql::gen_expr::process_array_in::h96ab7dc44c311876
      at /Users/kgutwin/github/prql/prqlc/prqlc/src/sql/gen_expr.rs:176
  15: prqlc::sql::gen_expr::translate_expr::h392231bd9a27822a
      at /Users/kgutwin/github/prql/prqlc/prqlc/src/sql/gen_expr.rs:106
[ ... snip ... ]

I looked at prqlc/prqlc/src/sql/gen_expr.rs and it's not obvious to me why there's a panic there, especially as the other process_*() functions seem to handle left-hand-side expressions easily. Would it be possible to remove this panic in order to support more complex expressions for the std.in function? Or is there an underlying reason why the IN operator needs to have a column reference only on the left-hand side?

PRQL input

from [{ a = 1 }]
derive { b = ( a+1 | in [2, 3] ) }

SQL output

n/a (compiler crashes)

Expected SQL output

WITH table_0 AS (
  SELECT
    1 AS a
)
SELECT
  a,
  a + 1 IN (2, 3) AS b
FROM
  table_0

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

No response

@kgutwin kgutwin added the bug Invalid compiler output or panic label May 21, 2024
@max-sixty
Copy link
Member

max-sixty commented May 21, 2024

Good example and it's encouraging to see you pushing functions like this!

There are definitely a few things up — it obv shouldn't panic there.

Also putting an s-string within the pipeline panics :(

let coalesce_bool = func
  val <scalar || null>
  -> case [
    (s"{val}" | in ["no", "n", "N", "false"]) => false,
  ]

from [{foo = 1}, {foo = null}]
derive { foo_b = coalesce_bool foo }

For completeness — if I put this in the playground, duckdb doesn't raise an error — does it on your end? Is the error message because of the type of some data that's not there?

let coalesce_bool = func
  val <scalar || null>
  -> case [
    val == null => null,
    val == 0 => false,
    val == 1 => true,
    (val | in ["no", "n", "N", "false"]) => false,
    (val | in ["yes", "y", "Y", "true"]) => true,
  ]

from [{foo = 1}, {foo = null}]
derive { foo_b = coalesce_bool foo }

For a workaround — does an s-string in another transform suffice for the moment?

from [{ a = 1 }]
derive a_ = (s"{a} as VARCHAR")
derive { b = ( a_ | in [2, 3] ) }

@kgutwin
Copy link
Collaborator Author

kgutwin commented May 21, 2024

I have to run now, but quickly -- I can provide some more details later on how I've managed to trigger the underlying situation, but unfortunately an s-string in another transform isn't a great solution because of the way this function is used in other parts of our PRQL.

I'm hoping that the PR that I threw together is a simple solution; it does fix the s-string panic as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Invalid compiler output or panic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants