-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix comparisons for complex types using floats in RowContainer #5833
Conversation
✅ Deploy Preview for meta-velox canceled.
|
66e859a
to
8dc3e63
Compare
Test results prior to code change:
Test result after fix
|
8dc3e63
to
7b58e06
Compare
@aditi-pandit @mbasmanova if you get a chance please review. This is the fix for the NaN value in array group by issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@czentgr Thank you for fixing this.
Should we also check BaseVector::compare and equalValueAt to see if it has a similar issue?
velox/exec/tests/AggregationTest.cpp
Outdated
@@ -123,6 +123,22 @@ class AggregationTest : public OperatorTestBase { | |||
return vectors; | |||
} | |||
|
|||
template <typename T> | |||
void testFloatingPointArrayNaNGroupBy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is better tested in a unit test inside velox/exec/tests/ContainerRowSerdeTest.cpp
Also, please, document this behavior in a method comment in .h file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think https://github.com/facebookincubator/velox/blob/main/velox/exec/tests/RowContainerTest.cpp#L281 was intended to test float comparison. But it didn't go as far as testing when these cases are in Arrays, Complex types. Might be good to enhance the tests here in RowContainer to check for the special floating point values in the Arrays, Maps and Complex types.
velox/exec/ContainerRowSerde.cpp
Outdated
} | ||
|
||
template <> | ||
int compare<TypeKind::DOUBLE>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another version of compare that takes 2 ByteStreams. Should it be modified as well?
@mbasmanova Thank you for your review. I'm on vacation for the next two weeks and be back Aug 17. Then I will take care of your comments. |
@czentgr : Yes, I too feel that this code change would be needed but the problem has a bigger scope requiring a change at BaseVector::compare and equalValueAt level. So it would be worth spending some time looking at different SQL queries which need a NaN comparison in different places to come up with the most generic solution. |
Thank you for the heads up. |
I checked The difference to the vector functions and I believe the Vector cases are already handled properly. Perhaps there is something where two streams are compared that could have a similar situation? It doesn't appear that |
I see this method in velox/exec/ContainerRowSerde.h
|
Yes. I found this since as well. Which means this piece is also susceptible to the bug. There is lots of the same code with minor differences in how values are accessed from the input (byte stream or vector). |
@czentgr : In general RowContainer::compare (which is used in HashAgg, HashJoin, Window etc) would be the origin of these calls. |
We should standardize on this style I feel https://github.com/facebookincubator/velox/blob/main/velox/exec/RowContainer.h#L1023 |
Yes, this should be a better fix compared to making new templated functions. There is some addition with regards to the (possibly requested) ordering but lets see if this can be worked out. |
7b58e06
to
86bc197
Compare
velox/exec/ContainerRowSerde.cpp
Outdated
@@ -587,7 +600,20 @@ int32_t compare( | |||
using T = typename TypeTraits<Kind>::NativeType; | |||
T leftValue = left.read<T>(); | |||
T rightValue = right.read<T>(); | |||
auto result = leftValue == rightValue ? 0 : leftValue < rightValue ? -1 : 1; | |||
int32_t result{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks exactly the same as for the other function. Can we abstract this in a common function ?
@@ -144,5 +144,36 @@ TEST_F(ContainerRowSerdeTest, nested) { | |||
testRoundTrip(nestedArray); | |||
} | |||
|
|||
TEST_F(ContainerRowSerdeTest, compareDoubleEquals) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it seems like all the comparison function tests are done through RowContainerTests. That is the main caller and touch-point for Velox operators. Please can you add tests there also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a look at those tests.
auto leftPosition = serialize(data); | ||
HashStringAllocator::prepareRead(leftPosition.header, left); | ||
|
||
SelectivityVector allRows(data->size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this DecodedVector ? Can we not compare with a FlatVector ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The API requires a DecodedVector (there is a difference in the interface and implementation). When using FlatVector (which I originally tried) compilation fails with
/Users/czentgr/gitspace/velox/velox/exec/tests/ContainerRowSerdeTest.cpp:160:18: error: no matching function for call to 'compare'
EXPECT_EQ(0, ContainerRowSerde::compare(left, data, i, flags));
...
/Users/czentgr/gitspace/velox/./velox/exec/ContainerRowSerde.h:35:18: note: candidate function not viable: no known conversion from 'FlatVectorPtr<EvalType<double>>' (aka 'shared_ptr<FlatVector<double>>') to 'const facebook::velox::DecodedVector' for 2nd argument
static int32_t compare(
HashStringAllocator::prepareRead(rightPosition.header, right); | ||
auto type = data->type(); | ||
|
||
for (auto i = 0; i < data->size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try different combinations where Nan is compared with non-Nan values also as those code paths have changed now.
Also add tests for ascending/descending variation of flags.
1d7f0d1
to
0a985eb
Compare
@mbasmanova @aditi-pandit Please take another look when you get a chance. I've added comprehensive tests for the complex types based on the existing |
f5ba3d2
to
4c94c53
Compare
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mbasmanova merged this pull request in 714a7ab. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe Affected functions: array_distinct, array_duplicates, array_except, array_max, array_min, array_has_duplicates, array_sort, array_sort_desc, map_union, min, max, min_by, max_by, set_union The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe The lists may not be complete.
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe The lists may not be complete.
Summary: The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR #5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe The lists may not be complete. Pull Request resolved: #7780 Reviewed By: Yuhta Differential Revision: D54141907 Pulled By: kagamiori fbshipit-source-id: 0306cfaffd4d486a0b72f6e6b659b40b2d66688f
…kincubator#7780) Summary: The row container implementation for equalsNoNulls and equalsWithNulls contained a bug: 1. Incorrect equals check for floating point types when NaN values are used. 2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function. 3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage. This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types. Affected operators: FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe The lists may not be complete. Pull Request resolved: facebookincubator#7780 Reviewed By: Yuhta Differential Revision: D54141907 Pulled By: kagamiori fbshipit-source-id: 0306cfaffd4d486a0b72f6e6b659b40b2d66688f
The issue is that
leftValue == rightValue
does not work if both sides are NaN. In fact, none of the comparisons with a NaN work. Equals is false and other comparisons cause an exception.Therefore, for floating points additional logic is needed to handle the equals case.
The row container implementation contained two bugs when
using complex types.
types when NaN values are used.
lower level functions.
and ContainerRowSerde for a common comparison function.
In addition, the PR adds testing for floating point using the
RowContainer (and subsequent ContainerRowSerde).
When using floating point values in complex types (ROW, ARRAY, MAP (key))
the following operators are affected by the changes:
Operators:
FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange,
LocalMerge, HashProbe, NestedLoopJoinProbe
Functions:
array_distinct, array_duplicates, array_except, array_max,
array_min, array_has_duplicates, array_sort, array_sort_desc,
map_union, min, max, min_by, max_by, set_union
The lists may not be complete. Any operator/function that requires a
comparison of elements of complex types that use
floating point type is affected.
Fixes prestodb/presto#20283