Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional refactoring of Null Semantics #19168

Merged
merged 1 commit into from
Jan 8, 2020
Merged

Conversation

maumar
Copy link
Contributor

@maumar maumar commented Dec 5, 2019

  • moving NullSemantics visitor after 2nd level cache - we need to know the parameter values to properly handle IN expressions wrt null semantics,
  • NullSemantics visitor needs to go before SqlExpressionOptimizer and SearchCondition, so those two are also moved after 2nd level cache,
  • moving optimizations that depend on knowing the nullability to NullSemantics visitor - optimizer now only contains optimizations that also work in 3-value logic, or when we know nulls can't happen,
  • merging InExpressionValuesExpandingExpressionVisitor int NullSemantics visitor, so that we don't apply the rewrite for UseRelationalNulls,
  • preventing NulSemantics from performing double visitation when computing non-nullable columns.

Resolves #11464
Resolves #15722
Resolves #18338
Resolves #18597
Resolves #18689
Resolves #19019

@maumar maumar requested review from roji and smitpatel December 5, 2019 03:38
@maumar
Copy link
Contributor Author

maumar commented Dec 5, 2019

@roji

@maumar maumar force-pushed the null_semantics_woes_pr branch 7 times, most recently from 408b400 to 64be016 Compare December 5, 2019 20:54
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first review, looking only at NullSemanticsRewritingExpressionVisitor (unfortunately I have other stuff I have to do this weekend 😢 ). I'll come back to this in a later review though.

Some notes on non-nullable column tracking:

  1. When I first looked at it, I thought we can calculate non-nullable columns as part of normal visitation, and bubble that up, instead of recursively calling FindNonNullableColumns (which means double visitation). But then I thought of another thing.
  2. The current code in VisitSqlBinary gets nullable columns out of the left side, and then visits the right side (so we carry over nullability info from left to right). However, we can also carry info from right to left just as well. In other words, just as we can avoid null compensation on the RHS of: col1 != null && col1 == col2, we can do the same for the LHS of: col1 == col2 && col1 != null. That means we can extract nullable columns from the left side before visiting the right side. It also means the visitation of each side depends on the prior visitation of the other side.
  3. To do this in an efficient way, we could:
    1. Visit left and get the non-nullable columns out of there.
    2. Visit right, optimizing with the non-nullable columns from left.
    3. If the right side yielded any new non-nullable columns, we can revisit the left to apply those.
  4. This would mean we do double visitation only when really necessary. We could even improve on this and have the left yield which columns were seen when visiting it, so that after the right yields its non-nullable columns we can visit only if there's an intersection. But that may be over-fancy.

Also, several cases of optimizations that can only be performend in 2-value logic, which we only do on non-nullable values (OptimizeNonNullableNotExpression, last two optimizations in OptimizeComparison). Should we also run these after null semantics has been applied? If so, that would mean we run them twice - once before null semantics (for non-nullables), once after - we should look into not doing that.

@smitpatel
Copy link
Member

we can also carry info from right to left just as well.

That may not be correct. The logic of carrying over non-nullability of left to right relies on short-circuiting behavior of && operator. logical operators does not short-circuit from right side.

@roji
Copy link
Member

roji commented Dec 8, 2019

we can also carry info from right to left just as well.

That may not be correct. The logic of carrying over non-nullability of left to right relies on short-circuiting behavior of && operator. logical operators does not short-circuit from right side.

From my understanding SQL doesn't guarantee any particular evaluation order: PG docs on this, some SQL Server links (I think we discussed this briefly in Redmond). Unless I'm mistaken the query planner is free to evaluate the right-hand side first if it thinks it's cheaper etc.

Also, I'm not 100% sure, but I'm not sure this matters for conjunctions. If one side requires that a column be non-null, then the conjunction can only be true if that is the case, so it shouldn't matter whether the other side contains null-compensating constructs or not... And so we should be able to omit them.

@smitpatel
Copy link
Member

While, I was not aware of SQL doing differently, what I wrote pertains to C# short-circuiting behavior. (&& operator and not AND). The likely way to have short-circuiting arising in SQL tree is going to be from C# behavior (and in most cases user written). So we are capturing most cases where it could be useful right now.

We could check right side also and try to apply left side though first thing would be we need to evaluate if it is going to be correct in all cases. Next, we need to evaluate if cost of doing such processing is worth the perf penalty when C# code would never generate it (hence falls down to translation pipeline generate such code) and translation pipeline can translate it in C# way of short-circuiting. Worst case, it won't be optimized and would be longer SQL.

Even if it is correct, given perf cost, we should not implement it unless we have a specific user reported case where it would be required and it is not arising before issue in pipeline somewhere.

Feel free to file an issue if you feel strongly about it.

@roji
Copy link
Member

roji commented Dec 9, 2019

@smitpatel as long as we're implementing this new mechanism (nonNullableColumns), I don't see why we shouldn't just do it right off the bat... we implement many things not explicitly requested by users because we know they're right.

I agree that user-written code is more likely to assume left-to-right short-circuiting behavior (as in C#). However, we also apply various transformations that could result in right-to-left optimization opportunities, making this worthwhile IMHO. For example, consider something like this:

customers.Where(c => c.Name != x && c.Name > 3)

This seems like pretty natural C# code. However, the right-hand side can only be true if c.Name is non-nullable; this could be carried over to the left side, obviating null compensation there.

Maybe before deferring this to another issue we should see what @maumar thinks?

@smitpatel
Copy link
Member

nonNullable columns is not new thing, it was already there before too.

Feel free to file an issue if you feel strongly about it.

@maumar
Copy link
Contributor Author

maumar commented Jan 7, 2020

PR feedback addressed @smitpatel

@maumar maumar force-pushed the null_semantics_woes_pr branch 3 times, most recently from 3cf8e24 to 479248a Compare January 8, 2020 02:48
@maumar
Copy link
Contributor Author

maumar commented Jan 8, 2020

@smitpatel another update

@maumar maumar force-pushed the null_semantics_woes_pr branch 2 times, most recently from 2b332df to 92b3e34 Compare January 8, 2020 21:37
- moving NullSemantics visitor after 2nd level cache - we need to know the parameter values to properly handle IN expressions wrt null semantics,
- NullSemantics visitor needs to go before SqlExpressionOptimizer and SearchCondition, so those two are also moved after 2nd level cache,
- combining NullSemantics with SqlExpressionOptimizer (kept SqlExpressionOptimizer name) - SearchCondition now applies some relevant optimization itself, so that we don't need to run full optimizer afterwards,
- merging InExpressionValuesExpandingExpressionVisitor into SqlExpressionOptimizer as well, so that we don't apply the rewrite for UseRelationalNulls,
- preventing NulSemantics from performing double visitation when computing non-nullable columns.

Resolves #11464
Resolves #15722
Resolves #18338
Resolves #18597
Resolves #18689
Resolves #19019
Copy link
Member

@smitpatel smitpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congratulations!

@maumar maumar merged commit a5f3e26 into master Jan 8, 2020
@smitpatel smitpatel deleted the null_semantics_woes_pr branch February 6, 2020 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment