Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Improve extensibility of null semantics processor #20854

Merged
merged 1 commit into from
May 7, 2020

Conversation

smitpatel
Copy link
Member

@smitpatel smitpatel commented May 6, 2020

Resolves #20204

For table sources, there is no concept of nullability

  • Visit(TableExpressionBase) which works as dispatcher.
  • Visit(SelectExpression) as a special case to avoid casting and allow easy overriding.
  • No additional methods for individual table sources.

For SQL expressions,

  • Visit(SqlExpression, out bool nullable) which defaults allowOptimizedExpansion to false.
  • Visit(SqlExpression, allowOptimizedExpansion, out bool nullable) which works as dispatcher.
  • Individual Visit* methods to change bahavior of any particular SqlExpression.

Notes:

  • Each individual SQLExpression processes it's children with appropriately set allowOptimizedExpansion flag. It collects nullable flag for all children and return nullable for itself through out parameter.
  • NonNullableColumns are being used only in 2 places and both usages are different. Rather than crystal balling an API to expose it, we should just keep it private till a provider asks for it then we can discuss with provider writer what information is needed and how would they like it. Making it private does not cause any bug, may just miss an optimization in SQL.
  • Added AddNonNullableColumn API for providers to add aditional column to the list.
  • The processor does not derive from ExpressionVisitor anymore. It has ExpressionVisitor like dispatch system and API.
  • Since it is not deriving from SqlExpressionVisitor there is no longer abstract method to force implementing for new SqlExpression type. Hence unknown expression type throws exception. Making it pass through could cause incorrect result as we don't visit custom expression's components.
  • Added VisitCustomSqlExpression method to allow providers to add logic for custom functions without having to deal with non-nullable column reset.
  • Added DoNotCache method rather than exposing _canCache.
  • Changed API about returning tuple for caching to an out parameter.

@smitpatel smitpatel requested a review from maumar May 6, 2020 08:12
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty great to me!

Re nullable columns... If I'm getting the design right, the only meaningful read of _nonNullableColumns is by ProcessColumn, or in other words, a provider will never need to read that list directly (they will do that via ProcessColumn). So the only thing missing is if a provider wants to add to that list, i.e. because there's a provider-specific construct that results in knowing that a column isn't null. PostgreSQL has some operators like this:

  • IS TRUE / IS FALSE: these could be useful translations. If these return true on a column, it's non-nullable. These could be nice as a translation for ternary.
  • IS DISTINCT FROM: like equality but works like C# (never returns null). Could be a replacement for equality checks in many cases, but there are some rumors on it being bad for perf.

I don't have any immediate plans to do these, but on the other hand, is there any downside to just exposing AddNonNullableColumn? I can't imagine needing to snapshot and reset the list like we do in ProcessSqlBinary though (for and/or).

Aside from this, I made some comments which are unrelated to this specific PR (relevant before), we can handle them later too.

@smitpatel
Copy link
Member Author

Please explain in detail, how do you plan to use it.

@roji
Copy link
Member

roji commented May 6, 2020

IS TRUE/FALSE could be a bit useful for checks on nullable bool columns (x == true -> X IS TRUE), and then carrying over non-nullability in case the column is checked again (a bit far-fetched).

IS DISTINCT FROM could be a good translation for C# non-equality, and IS NOT DISTINCT FROM for C# equality. If I actually end up doing that, both could pass up the information that the column is non-nullable. But I'd have to investigate perf aspects thoroughly before thinking about this.

Makes sense? Again, nothing urgent/blocking, but why not enable it.

@smitpatel
Copy link
Member Author

No, it did not answer my question. I am asking you how are you going to use it in code, when and what method is called. I am not going to crystal ball an API here.

@roji
Copy link
Member

roji commented May 6, 2020

Very simple: we'd add an AddNonNullableColumn protected method that accepts a column and adds it to _nonNullableColumns - basically the same as what we do for IS NULL or the relevant comparisons already do.

@smitpatel
Copy link
Member Author

I am not asking about what I need to add in API here. I am asking about your code where you want to use it.

@roji
Copy link
Member

roji commented May 6, 2020

Sorry, I'm not understanding your question. To support custom expressions, I'd presumably define my own Process method (e.g. ProcessDistinctFrom). That method may call AddNonNullableColumn if it decides to do it, just like ProcessSqlBinary does in your PR. If that's still not what you're asking, please clarify...

@smitpatel
Copy link
Member Author

Just file a new issue on what is blocking you in the provider code.

Resolves #20204

For table sources, there is no concept of nullability
- `Visit(TableExpressionBase)` which works as dispatcher.
- `Visit(SelectExpression)` as a special case to avoid casting and allow easy overriding.
- No additional methods for individual table sources.

For SQL expressions,
- `Visit(SqlExpression, out bool nullable)` which defaults allowOptimizedExpansion to false.
- `Visit(SqlExpression, allowOptimizedExpansion, out bool nullable)` which works as dispatcher.
- Individual Visit* methods to change bahavior of any particular SqlExpression.

Notes:
- Each individual SQLExpression processes it's children with appropriately set allowOptimizedExpansion flag. It collects nullable flag for all children and return nullable for itself through out parameter.
- NonNullableColumns are being used only in 2 places and both usages are different. Rather than crystal balling an API to expose it, we should just keep it private till a provider asks for it then we can discuss with provider writer what information is needed and how would they like it. Making it private does not cause any bug, may just miss an optimization in SQL.
- Added AddNonNullableColumn API for providers to add aditional column to the list.
- The processor does not derive from ExpressionVisitor anymore. It has ExpressionVisitor like dispatch system and API.
- Since it is not deriving from SqlExpressionVisitor there is no longer abstract method to force implementing for new SqlExpression type. Hence unknown expression type throws exception. Making it pass through could cause incorrect result as we don't visit custom expression's components.
- Added `VisitCustomSqlExpression` method to allow providers to add logic for custom functions without having to deal with non-nullable column reset.
- Added DoNotCache method rather than exposing _canCache.
- Changed API about returning tuple for caching to an out parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query: revisit extensibility of null semantics visitor
3 participants