Additional refactoring of Null Semantics #19168

maumar · 2019-12-05T03:38:04Z

- moving NullSemantics visitor after 2nd level cache - we need to know the parameter values to properly handle IN expressions wrt null semantics,
- NullSemantics visitor needs to go before SqlExpressionOptimizer and SearchCondition, so those two are also moved after 2nd level cache,
- moving optimizations that depend on knowing the nullability to NullSemantics visitor - optimizer now only contains optimizations that also work in 3-value logic, or when we know nulls can't happen,
- merging InExpressionValuesExpandingExpressionVisitor into NullSemantics visitor, so that we don't apply the rewrite for UseRelationalNulls,
- preventing NullSemantics from performing double visitation when computing non-nullable columns.

Resolves #11464
Resolves #15722
Resolves #18338
Resolves #18597
Resolves #18689
Resolves #19019
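
As an illustration of the first point, here is a minimal Python sketch (not the EF Core code) of why the parameter values matter for IN: if the list contains a null, plain SQL `IN` cannot match it, so the null has to be split out into an `IS NULL` check, and that can only be decided once the values are known. The helper name `expand_in`, its `column_nullable` flag, and the string-based SQL output are all hypothetical and exist only for this example.

```python
def expand_in(column, values, column_nullable=True):
    """Render `column IN (values)` so it behaves like C# `values.Contains(column)`.

    Hypothetical helper for illustration only: it builds SQL text naively
    and is not how EF Core represents or rewrites expressions.
    """
    non_null = [v for v in values if v is not None]
    has_null = len(non_null) != len(values)

    parts = []
    if non_null:
        rendered = ", ".join(repr(v) for v in non_null)
        parts.append(f"{column} IN ({rendered})")
    if has_null and column_nullable:
        # In SQL, `col IN (NULL)` is never TRUE, so the null is tested explicitly.
        parts.append(f"{column} IS NULL")

    if not parts:
        # Empty list, or only nulls compared against a non-nullable column.
        return "1 = 0"
    return " OR ".join(parts)


print(expand_in("c.Id", [1, 2]))
print(expand_in("c.Id", [1, None]))
```

With relational nulls the expansion would simply be skipped, which is the motivation given above for doing this rewrite inside the null semantics step.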

maumar · 2019-12-05T03:38:20Z

@roji

src/EFCore.Relational/Query/Internal/SqlExpressionOptimizingExpressionVisitor.cs

test/EFCore.SqlServer.FunctionalTests/Query/NorthwindMiscellaneousQuerySqlServerTest.cs

src/EFCore.Relational/Query/Internal/NullSemanticsRewritingExpressionVisitor.cs

src/EFCore.Relational/Query/RelationalParameterBasedQueryTranslationPostprocessor.cs

src/EFCore.SqlServer/Query/Internal/SqlServerParameterBasedQueryTranslationPostprocessor.cs

src/EFCore.SqlServer/Query/Internal/SqlServerQueryTranslationPostprocessor.cs

src/EFCore.Relational/Query/RelationalParameterBasedQueryTranslationPostprocessor.cs

roji

Here's a first review, looking only at NullSemanticsRewritingExpressionVisitor (unfortunately I have other stuff I have to do this weekend 😢 ). I'll come back to this in a later review though.

Some notes on non-nullable column tracking:

When I first looked at it, I thought we can calculate non-nullable columns as part of normal visitation, and bubble that up, instead of recursively calling FindNonNullableColumns (which means double visitation). But then I thought of another thing.
The current code in VisitSqlBinary gets non-nullable columns out of the left side, and then visits the right side (so we carry over nullability info from left to right). However, we can carry info from right to left just as well. In other words, just as we can avoid null compensation on the RHS of: col1 != null && col1 == col2, we can do the same for the LHS of: col1 == col2 && col1 != null. That means we could equally extract non-nullable columns from the right side before visiting the left side. It also means the visitation of each side depends on the prior visitation of the other side.
To do this in an efficient way, we could:
1. Visit left and get the non-nullable columns out of there.
2. Visit right, optimizing with the non-nullable columns from left.
3. If the right side yielded any new non-nullable columns, we can revisit the left to apply those.
This would mean we do double visitation only when really necessary. We could even improve on this and have the left yield which columns were seen when visiting it, so that after the right yields its non-nullable columns we can visit only if there's an intersection. But that may be over-fancy.
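
The three steps above can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the visitor from the PR: the node types (`IsNotNull`, `Equal`, `And`), the `visit` function, and the `stats` counter are all hypothetical, and equality is only null-compensated in a simplified predicate form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsNotNull:
    column: str


@dataclass(frozen=True)
class Equal:
    left: str
    right: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


def visit(expr, known=frozenset(), stats=None):
    """Return (sql, non_nullable_columns), given columns already known non-null."""
    if stats is not None:
        stats["visits"] = stats.get("visits", 0) + 1

    if isinstance(expr, IsNotNull):
        return f"{expr.column} IS NOT NULL", frozenset({expr.column})

    if isinstance(expr, Equal):
        # Simplification: equality itself reports no new non-nullable columns.
        if expr.left in known or expr.right in known:
            return f"{expr.left} = {expr.right}", frozenset()
        return (f"({expr.left} = {expr.right} OR "
                f"({expr.left} IS NULL AND {expr.right} IS NULL))"), frozenset()

    if isinstance(expr, And):
        # 1. Visit left and collect its non-nullable columns.
        left_sql, left_cols = visit(expr.left, known, stats)
        # 2. Visit right, optimizing with what the left side established.
        right_sql, right_cols = visit(expr.right, known | left_cols, stats)
        # 3. Revisit left only if the right side yielded something new.
        if right_cols - left_cols - known:
            left_sql, left_cols = visit(expr.left, known | right_cols, stats)
        return f"{left_sql} AND {right_sql}", left_cols | right_cols

    raise TypeError(f"unsupported node: {expr!r}")


stats = {}
sql, cols = visit(And(Equal("col1", "col2"), IsNotNull("col1")), stats=stats)
print(sql, stats)
```

In this model the left side is visited a second time only for `col1 == col2 && col1 != null`; for `col1 != null && col1 == col2` a single pass is enough.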

Also, there are several optimizations that can only be performed in 2-value logic, which we only apply to non-nullable values (OptimizeNonNullableNotExpression, last two optimizations in OptimizeComparison). Should we also run these after null semantics has been applied? If so, that would mean we run them twice - once before null semantics (for non-nullables), once after - we should look into not doing that.

src/EFCore.Relational/Query/Internal/NullSemanticsRewritingExpressionVisitor.cs

smitpatel · 2019-12-08T21:14:17Z

> we can also carry info from right to left just as well.

That may not be correct. The logic of carrying over non-nullability from left to right relies on the short-circuiting behavior of the && operator. Logical operators do not short-circuit from the right side.

roji · 2019-12-08T22:49:55Z

> > we can also carry info from right to left just as well.
>
> That may not be correct. The logic of carrying over non-nullability from left to right relies on the short-circuiting behavior of the && operator. Logical operators do not short-circuit from the right side.

From my understanding SQL doesn't guarantee any particular evaluation order: PG docs on this, some SQL Server links (I think we discussed this briefly in Redmond). Unless I'm mistaken the query planner is free to evaluate the right-hand side first if it thinks it's cheaper etc.

Also, I'm not 100% sure, but I don't think this matters for conjunctions. If one side requires that a column be non-null, then the conjunction can only be true if that is the case, so it shouldn't matter whether the other side contains null-compensating constructs or not... And so we should be able to omit them.
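
The conjunction argument can be checked by brute force in a small model of three-valued logic. The sketch below is Python with `None` standing in for SQL NULL; the helper names are hypothetical and the compensated form of equality is an assumed, simplified one. It compares `a = b` with and without null compensation when the other conjunct is `a IS NOT NULL`.

```python
from itertools import product


def sql_eq(a, b):
    """SQL `a = b`: NULL (None) if either operand is NULL."""
    return None if a is None or b is None else a == b


def sql_and(p, q):
    """Three-valued AND: FALSE dominates, otherwise NULL propagates."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def sql_or(p, q):
    """Three-valued OR: TRUE dominates, otherwise NULL propagates."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def compensated(a, b):
    # (a = b OR (a IS NULL AND b IS NULL)) AND a IS NOT NULL
    eq = sql_or(sql_eq(a, b), sql_and(a is None, b is None))
    return sql_and(eq, a is not None)


def plain(a, b):
    # a = b AND a IS NOT NULL
    return sql_and(sql_eq(a, b), a is not None)


def same_rows():
    values = [None, 1, 2]
    # A WHERE clause keeps a row only when the predicate is TRUE (not NULL).
    return all((compensated(a, b) is True) == (plain(a, b) is True)
               for a, b in product(values, repeat=2))


print(same_rows())
```

Since `sql_and` is symmetric in this model, the result does not depend on which side of the conjunction carries the `IS NOT NULL` check.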

smitpatel · 2019-12-09T00:25:20Z

While I was not aware of SQL behaving differently, what I wrote pertains to C# short-circuiting behavior (the && operator, not AND). The most likely way for short-circuiting to arise in the SQL tree is from C# behavior (and in most cases it is user-written). So we are capturing most cases where it could be useful right now.

We could also check the right side and try to apply it to the left side, though the first thing we would need to evaluate is whether that is correct in all cases. Next, we need to evaluate whether the cost of such processing is worth the perf penalty when C# code would never generate it (so it could only come from the translation pipeline generating such code), and the translation pipeline can translate it in the C# way of short-circuiting. Worst case, it won't be optimized and the SQL will be longer.

Even if it is correct, given the perf cost, we should not implement it unless we have a specific user-reported case where it is required and that does not arise from an issue earlier in the pipeline.

Feel free to file an issue if you feel strongly about it.

roji · 2019-12-09T02:00:30Z

@smitpatel as long as we're implementing this new mechanism (nonNullableColumns), I don't see why we shouldn't just do it right off the bat... we implement many things not explicitly requested by users because we know they're right.

I agree that user-written code is more likely to assume left-to-right short-circuiting behavior (as in C#). However, we also apply various transformations that could result in right-to-left optimization opportunities, making this worthwhile IMHO. For example, consider something like this:

customers.Where(c => c.Name != x && c.Name > 3)

This seems like pretty natural C# code. However, the right-hand side can only be true if c.Name is non-null; this could be carried over to the left side, obviating null compensation there.
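
To make this concrete, here is a small Python check of that example. It assumes `Name` and `x` are nullable integers (the example compares `Name` with 3), uses `None` for null, and the function names and the particular SQL shape in the docstring are hypothetical. The left side keeps its compensation for the parameter `x` being null but drops every check on `Name`.

```python
from itertools import product


def csharp_predicate(name, x):
    """C# semantics for `c.Name != x && c.Name > 3` with nullable operands."""
    not_equal = name != x                      # null != null is false, null != 4 is true
    greater = name is not None and name > 3    # lifted `>` is false when an operand is null
    return not_equal and greater


def sql_without_name_compensation(name, x):
    """Model of `(Name <> @x OR @x IS NULL) AND Name > 3`, kept only when TRUE.

    Comparisons against NULL yield NULL, modelled here as None.
    """
    ne = None if name is None or x is None else name != x
    gt = None if name is None else name > 3
    left = True if x is None else ne           # OR with the two-valued `@x IS NULL`
    return left is True and gt is True


def agree_everywhere():
    names = [None, 1, 4, 5]
    params = [None, 4, 5]
    return all(csharp_predicate(n, x) == sql_without_name_compensation(n, x)
               for n, x in product(names, params))


print(agree_everywhere())
```

Over these sample values the two predicates select the same rows, which matches the point that `Name > 3` already rules out a null `Name`.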

Maybe before deferring this to another issue we should see what @maumar thinks?

smitpatel · 2019-12-09T03:25:34Z

nonNullableColumns is not a new thing; it was already there before.

Feel free to file an issue if you feel strongly about it.

maumar · 2020-01-07T00:43:50Z

PR feedback addressed @smitpatel

src/EFCore.Relational/Query/RelationalParameterBasedQueryTranslationPostprocessor.cs

src/EFCore.Relational/Query/SqlExpressionOptimizingExpressionVisitor.cs

src/EFCore.SqlServer/Query/Internal/SearchConditionConvertingExpressionVisitor.cs

src/EFCore.Relational/Query/NullabilityBasedSqlProcessingExpressionVisitor.cs

maumar · 2020-01-08T20:36:41Z

@smitpatel another update

- moving NullSemantics visitor after 2nd level cache - we need to know the parameter values to properly handle IN expressions wrt null semantics,
- NullSemantics visitor needs to go before SqlExpressionOptimizer and SearchCondition, so those two are also moved after 2nd level cache,
- combining NullSemantics with SqlExpressionOptimizer (kept SqlExpressionOptimizer name) - SearchCondition now applies some relevant optimization itself, so that we don't need to run full optimizer afterwards,
- merging InExpressionValuesExpandingExpressionVisitor into SqlExpressionOptimizer as well, so that we don't apply the rewrite for UseRelationalNulls,
- preventing NullSemantics from performing double visitation when computing non-nullable columns.

Resolves #11464
Resolves #15722
Resolves #18338
Resolves #18597
Resolves #18689
Resolves #19019

smitpatel

Congratulations!

maumar requested review from roji and smitpatel December 5, 2019 03:38

maumar force-pushed the null_semantics_woes_pr branch from bd05643 to 42a8e1b Compare December 5, 2019 03:59

maumar force-pushed the null_semantics_woes_pr branch 7 times, most recently from 408b400 to 64be016 Compare December 5, 2019 20:54

maumar force-pushed the null_semantics_woes_pr branch 4 times, most recently from b26c794 to 40e4479 Compare December 7, 2019 18:33

roji mentioned this pull request Dec 8, 2019

General QueryContext-based mechanism for caching of generated RelationalCommand #17598

Open

roji reviewed Dec 8, 2019

maumar force-pushed the null_semantics_woes_pr branch from 5c11423 to 026959e Compare January 7, 2020 00:43