Fix equality check for simple floating types in RowContainer #7780

czentgr · 2023-11-29T01:23:12Z

The RowContainer implementations of equalsNoNulls and equalsWithNulls contained a bug. This PR:

1. Fixes the incorrect equals check for floating-point types when NaN values are used.
2. Refactors RowContainer and ContainerRowSerde to use SimpleVector::comparePrimitiveAsc as a common comparison function.
3. Changes the static SimpleVector::comparePrimitiveAsc to static inline to reduce function-call overhead in this expanded usage.

This is a continuation of PR #5833, which addressed floating-point comparisons for complex types.

Affected operators:
FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe

This list may not be complete.
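To make the NaN semantics behind the fix concrete, here is a minimal sketch of a NaN-aware three-way comparison and the equality derived from it. It assumes the semantics discussed in this PR (NaN equals NaN and sorts after every other value, including infinity). The names comparePrimitiveAscSketch and equalsSketch are hypothetical; this is not the actual source of SimpleVector::comparePrimitiveAsc.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical NaN-aware three-way comparison.
// Returns <0 if left < right, 0 if equal, >0 if left > right.
// Assumption: NaN == NaN, and NaN is greater than every non-NaN value.
template <typename T>
inline int comparePrimitiveAscSketch(const T& left, const T& right) {
  if constexpr (std::is_floating_point_v<T>) {
    const bool leftNan = std::isnan(left);
    const bool rightNan = std::isnan(right);
    if (leftNan || rightNan) {
      // Both NaN -> 0; only the left is NaN -> 1; only the right -> -1.
      return static_cast<int>(leftNan) - static_cast<int>(rightNan);
    }
  }
  return left < right ? -1 : (left == right ? 0 : 1);
}

// Equality expressed through the comparison, so NaN compares equal to NaN
// even though the built-in operator== returns false for NaN == NaN.
template <typename T>
inline bool equalsSketch(const T& left, const T& right) {
  return comparePrimitiveAscSketch(left, right) == 0;
}
```

Routing equality through a single comparison function is the design point: ordering (TopN, OrderBy) and equality (HashProbe, joins) then cannot disagree about NaN.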

netlify · 2023-11-29T01:23:17Z

✅ Deploy Preview for meta-velox canceled.

🔨 Latest commit: `59e7e4f`
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/65f04a035438950008e5c857

aditi-pandit · 2024-01-10T06:55:50Z

velox/exec/tests/RowContainerTest.cpp

+    auto rowContainer = makeRowContainer({type}, {type}, false);
+    int numRows = values->size();
+    DecodedVector decodedWithNulls(*values);
+    auto rows = storeRows(decodedWithNulls, numRows, *rowContainer);


It looks like the test loops are very similar. Can you factor them into a common function?

czentgr · 2024-01-29T18:36:25Z

@aditi-pandit @mbasmanova Please review. I addressed @aditi-pandit's earlier comment. Thanks!

aditi-pandit · 2024-01-30T06:59:56Z

velox/exec/tests/RowContainerTest.cpp

@@ -466,8 +466,40 @@ class RowContainerTest : public exec::test::RowContainerTestBase {
    }
  }

+  template <typename T, bool mayHaveNulls>


Nit: the name "mayHaveNulls" sounds ambiguous. Could you just use hasNulls?

I used the same name that the equals function uses. The template parameter doesn't indicate whether the input vector has nulls; in fact, the input always contains nulls, because I didn't want to generate two vectors in the caller. Instead, the flag is passed to the equals function to tell it whether to expect nulls. Thus, if it is false, the null values must be removed from the input vector.
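A small sketch of the idea, using std::optional instead of Velox vectors: a compile-time bool tells the equality routine whether it must handle nulls. The function equalsOptional is hypothetical and is not the RowContainer API; the rule that null equals null is an assumption chosen to mirror grouping semantics.

```cpp
#include <cmath>
#include <optional>
#include <stdexcept>

// Hypothetical sketch: mayHaveNulls is a compile-time promise from the caller.
// When true, nulls are handled explicitly. When false, the caller promises
// that no nulls are present; this sketch verifies the promise and throws if
// it is broken, whereas a performance-oriented version could skip the check.
template <bool mayHaveNulls>
bool equalsOptional(const std::optional<double>& lhs,
                    const std::optional<double>& rhs) {
  if constexpr (mayHaveNulls) {
    if (!lhs.has_value() || !rhs.has_value()) {
      // Assumption: null equals null; null never equals a value.
      return lhs.has_value() == rhs.has_value();
    }
  } else {
    if (!lhs.has_value() || !rhs.has_value()) {
      throw std::invalid_argument("null passed with mayHaveNulls == false");
    }
  }
  const double l = *lhs;
  const double r = *rhs;
  // NaN-aware: two NaNs are equal; otherwise use ordinary ==.
  if (std::isnan(l) || std::isnan(r)) {
    return std::isnan(l) && std::isnan(r);
  }
  return l == r;
}
```

This is why the caller must strip nulls before exercising the false instantiation: that code path is allowed to assume they are absent.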

Perhaps it should be named equalsCanHandleNulls or something?

aditi-pandit · 2024-01-30T07:06:08Z

velox/exec/tests/RowContainerTest.cpp

@@ -466,8 +466,40 @@ class RowContainerTest : public exec::test::RowContainerTestBase {
    }
  }

+  template <typename T, bool mayHaveNulls>


I don't see template type 'T' used in the method. Is something missing?

No, you are right. It is not needed. Which also means I can remove it from the other test as well. It is used in callers to distinguish the double/float type but it is not needed at this level anymore.

aditi-pandit · 2024-02-05T01:18:42Z

velox/exec/tests/RowContainerTest.cpp

+
+    int32_t index{0};
+    for (auto row : rows) {
+      ASSERT_TRUE(rowContainer->equals<canHandleNulls>(


We should add some tests for rowContainer->equals returning false as well.

What kind of values do you have in mind for testing the false case?
This test uses the floating-point edge values. Should I compare them against each other, or use some random floating-point values?

Yeah, I was thinking of comparing one edge value against another, say NaN against max/min, and also of using some regular random floating-point values.

Ok, I added some more tests to test random and edge values against NaN values.
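The negative test described above can be sketched with plain standard-library containers instead of Velox vectors: each edge value is compared against NaN, and only NaN against NaN is expected to be equal. The helpers nanAwareEquals and compareEdgeValuesWithNaN are hypothetical, and the exact set of edge values is an assumption.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical NaN-aware equality: NaN == NaN, everything else uses ==.
inline bool nanAwareEquals(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::isnan(a) && std::isnan(b);
  }
  return a == b;
}

// Compares each edge value with NaN and records the equality result.
// Only the first entry (NaN itself) is expected to compare equal.
inline std::vector<bool> compareEdgeValuesWithNaN() {
  using L = std::numeric_limits<double>;
  const double nan = L::quiet_NaN();
  const std::vector<double> edges = {
      nan, L::max(), L::lowest(), L::min(), L::infinity(), -L::infinity(), 0.0};
  std::vector<bool> results;
  for (double v : edges) {
    results.push_back(nanAwareEquals(v, nan));
  }
  return results;
}
```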

kgpai · 2024-02-05T17:46:11Z

velox/exec/tests/RowContainerTest.cpp

+    }
+
+    auto numRows = values->size();
+    auto valuesSlice = values->slice(numNulls, numRows - numNulls);


I want to make sure I don't misunderstand this: say you have 10 rows and 1 null; then you take the slice from the 2nd row to the last. Are the nulls always going to be in the first numNulls positions?

Yes, the test uses a specific input vector of edge values where the initial row is a NULL. It then uses a subset to specifically test the rowContainer->equals<false> where no NULL values can be part of the input.

Practically, that means numNulls == 1.

An example input vector (velox/velox/exec/tests/RowContainerTest.cpp, line 505 in 7f9914b):

auto values = makeNullableFlatVector<T>(

There are tests for the various row types and they always have a specific input to test the edge cases.
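The slicing step can be sketched with std::vector in place of values->slice(numNulls, numRows - numNulls). It assumes, as in these tests, that every null sits at the front of the input. dropLeadingNulls is a hypothetical helper, not part of the test file.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper mirroring values->slice(numNulls, numRows - numNulls):
// returns the elements in [numNulls, size), which is null-free provided
// every null is at the front of the input.
inline std::vector<double> dropLeadingNulls(
    const std::vector<std::optional<double>>& values,
    std::size_t numNulls) {
  if (numNulls > values.size()) {
    throw std::invalid_argument("numNulls exceeds the number of rows");
  }
  std::vector<double> out;
  out.reserve(values.size() - numNulls);
  for (std::size_t i = numNulls; i < values.size(); ++i) {
    // The slice must be null-free for an equals<false>-style check.
    if (!values[i].has_value()) {
      throw std::invalid_argument("null found after the leading nulls");
    }
    out.push_back(*values[i]);
  }
  return out;
}
```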

aditi-pandit

Tests look good. Thanks @czentgr

aditi-pandit · 2024-02-16T03:09:37Z

velox/exec/tests/RowContainerTest.cpp

-  template <typename T, typename V>
-  void testOrderAndNullsFirstVariations(
+  template <bool canHandleNulls>
+  void testRowContainerEqualAPI(


Nit: spelling, "Equals".

aditi-pandit · 2024-02-16T04:37:09Z

@mbasmanova : PTAL.

czentgr

@bikramSingh91 Thank you for your detailed review and comments!
I addressed them. Please take a look.

czentgr · 2024-02-23T17:02:57Z

velox/exec/tests/RowContainerTest.cpp

+    for (auto row : rows) {
+      ASSERT_EQ(
+          expected->asFlatVector<bool>()->valueAt(index),
+          rowContainer->equals<canHandleNulls>(


That's a good point.

czentgr · 2024-02-23T17:03:53Z

velox/exec/tests/RowContainerTest.cpp

+    while (numRows) {
+      auto value = folly::Random::randDouble(min, max, gen);
+      rawData.push_back(value);
+      if (value == min || value == lowest || value == max) {


You are correct. We always compare against NaN, so comparing it with any generated value or edge value should return false.

facebook-github-bot · 2024-02-23T23:46:36Z

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kagamiori · 2024-02-27T02:11:51Z

Hi @czentgr, this change looks good to me. But I wonder why the PR summary says that this change affects functions such as array_distinct and array_max? I saw the change affects RowContainer, but array_distinct and array_max do not use RowContainer, right?

czentgr · 2024-02-29T00:48:33Z

@kagamiori You are correct. The functions don't operate on the row container, and the statement in the commit/PR summary is incorrect. I took a closer look at the functions, and it turns out there are more problems with NaN that this fix doesn't address.

For example, here are the results of array_sort with Prestissimo (including the fix in this PR):

presto> select array_sort(a) from (values ARRAY[nan(), 0.0e0, 1.0e0]) t(a);
      _col0
-----------------
 [0.0, NaN, 1.0]
(1 row)

With Presto Java:

presto>  select array_sort(a) from (values ARRAY[nan(), 0.0e0, 1.0e0]) t(a);
      _col0
-----------------
 [0.0, 1.0, NaN]
(1 row)

The same problem exists for array_sort_desc.

I also think there is a problem with set_union, but both Java and Prestissimo return the same result. I expect NaN to appear in the result only once (since we consider NaN == NaN):

presto> SELECT set_union(a) FROM ( VALUES ARRAY[1.0, nan(), 3.0], ARRAY[nan(), 3.0, 4.0]) AS t(a);
           _col0
---------------------------
 [1.0, NaN, 3.0, NaN, 4.0]
(1 row)
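The results czentgr expects (NaN sorted last by array_sort, and NaN deduplicated by set_union because NaN == NaN) can be modeled with a NaN-aware comparator. This is a sketch of the expected semantics only, not how Velox or Presto implement either function; preserving first-occurrence order in the union is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Strict weak ordering with NaN greater than every other value and
// equivalent to itself. Plain operator< is not a valid ordering with NaN.
inline bool nanLastLess(double a, double b) {
  if (std::isnan(a)) return false;  // NaN is never less than anything.
  if (std::isnan(b)) return true;   // Any non-NaN is less than NaN.
  return a < b;
}

inline bool nanEquals(double a, double b) {
  return std::isnan(a) ? std::isnan(b) : a == b;
}

// Expected array_sort result: ascending with NaN at the end.
inline std::vector<double> sortNanLast(std::vector<double> values) {
  std::sort(values.begin(), values.end(), nanLastLess);
  return values;
}

// Expected set_union result: duplicates (including repeated NaN) removed,
// keeping the order of first occurrence.
inline std::vector<double> unionDistinct(
    const std::vector<std::vector<double>>& arrays) {
  std::vector<double> out;
  for (const auto& array : arrays) {
    for (double v : array) {
      const bool seen = std::any_of(out.begin(), out.end(),
                                    [v](double o) { return nanEquals(o, v); });
      if (!seen) out.push_back(v);
    }
  }
  return out;
}
```

With these rules the first query would yield [0.0, 1.0, NaN] and the set_union query [1.0, NaN, 3.0, 4.0].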

kagamiori · 2024-02-29T21:28:27Z


Hi @czentgr, thank you for updating the PR summary. We have recently noticed the NaN comparison issue in Velox functions too (see #8738 and #8690). In fact, tracing back the behavior of NaN handling, we found that Presto's NaN behavior is inconsistent both within and across functions. There are two GitHub issues in the Presto repo related to this problem: prestodb/presto#21877 and prestodb/presto#21936. Our current plan is to have Presto clarify (or fix) and document the NaN behavior of functions, and then follow Presto in Velox. Please take a look at those GitHub issues if you're interested.

facebook-github-bot · 2024-03-01T19:03:05Z

@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

czentgr · 2024-03-04T19:17:26Z

@bikramSingh91 Could you please tell me about the linter warnings? I would like to fix them and rebase the PR again. Did I add new warnings in this PR?

czentgr · 2024-03-05T15:40:58Z

@kagamiori Thank you for finding these issues. I have a documentation PR for Velox that clarifies the NaN behavior there. The behavior is adopted from Spark; it makes the most sense and provides consistency. Presumably, Presto should follow it as well: #7237.

kagamiori · 2024-03-05T22:03:43Z


Hi @czentgr, @bikramSingh91 is out of office this week. The internal linter warnings arise because the newly added methods in RowContainerTest.cpp define a set of variables through TEST_FLOATING_TYPE_LIMIT_VARIABLES, but not all of the defined variables are used in these methods (i.e., unused-variable warnings).

kagamiori · 2024-03-05T22:07:23Z

velox/exec/tests/RowContainerTest.cpp

+    while (numRows) {
+      auto value = folly::Random::randDouble(min, max, gen);
+      // Intersperse nan values.
+      if (static_cast<int64_t>(std::fmod(value, 3.0)) == 0) {
+        rawData.push_back(nan);
+        rawExpected.push_back(true);
+      } else {
+        rawData.push_back(value);
+        rawExpected.push_back(false);
+      }
+      --numRows;
+    }


Looks like this code implicitly assumes that the vector compared against this method's output is an all-NaN vector. That assumption isn't stated in the method's name or in any comment. Could we add a comment explaining what this method does?
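One way to address this is a header comment that states the all-NaN assumption explicitly. The sketch below is hypothetical: it swaps folly::Random::randDouble for the standard <random> facilities, uses an arbitrary value range, and works on plain vectors, but it keeps the same interleaving rule (a truncated std::fmod(value, 3.0) equal to 0 emits NaN).

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>
#include <random>
#include <vector>

// Generates numRows values intended to be compared row by row against a
// vector that is entirely NaN. When the random draw satisfies the
// interleaving rule, a NaN is emitted and the expected equality is true;
// every other row holds an ordinary finite value, whose expected equality
// with NaN is false.
template <typename T>
void makeValuesForAllNaNComparison(
    int32_t numRows,
    std::mt19937& gen,
    std::vector<std::optional<T>>& rawData,
    std::vector<bool>& rawExpected) {
  const T nan = std::numeric_limits<T>::quiet_NaN();
  // Assumed range; the original test draws between type limits instead.
  std::uniform_real_distribution<double> dist(-1000.0, 1000.0);
  while (numRows > 0) {
    const double value = dist(gen);
    // Intersperse NaN values: truncated remainder mod 3 equal to zero.
    if (static_cast<int64_t>(std::fmod(value, 3.0)) == 0) {
      rawData.push_back(nan);
      rawExpected.push_back(true);
    } else {
      rawData.push_back(static_cast<T>(value));
      rawExpected.push_back(false);
    }
    --numRows;
  }
}
```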

kagamiori · 2024-03-05T22:07:57Z

velox/exec/tests/RowContainerTest.cpp

+      const VectorPtr& lhs,
+      const VectorPtr& rhs,
+      const std::vector<bool>& rawExpected) {
+    TEST_FLOATING_TYPE_LIMIT_VARIABLES;


Is this macro not needed in this method?

kgpai · 2024-03-05T22:28:36Z

velox/exec/tests/RowContainerTest.cpp

+      int32_t numRows,
+      std::vector<std::optional<T>>& rawData,
+      std::vector<bool>& rawExpected) {
+    TEST_FLOATING_TYPE_LIMIT_VARIABLES;


This macro results in a lot of unused-variable warnings. Can you replace it with something like auto [max, min, _, _, _, _] = FLOATING_TYPE_LIMIT_VARIABLES(); where the function returns a const std::tuple and you only give names to the entries you need?

I tried the tuple approach. The issue is that if you use more than one _, the compiler complains because the identifier repeats; it is not treated as a placeholder. Also, from posts I read, _ can still trigger unused-variable warnings, since it is simply a name for one of the returned entries.

Example:

/Users/czentgr/gitspace/velox/velox/exec/tests/RowContainerTest.cpp:630:25: error: redefinition of '_'
    const auto [nan, _, _, _, _] = getTestFloatingTypeLimitVariables<T>();
                        ^

I turned on -Wunused-variable to see what the compiler would do, and on macOS (clang) it doesn't trigger a warning when the entries are named but not used. However, the linter might still complain.

In most functions all the special values are used, except in a few where only NaN is used, so I will simply not use the macro there. I experimented with a templated struct instead of the macro, but that also required more changes.
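For reference, a tuple-based variant that avoids both the repeated-_ redefinition error (repeated _ bindings are only permitted starting with C++26) and unused names: std::tie with std::ignore discards the unwanted entries, and std::get picks a single one. floatingTypeLimits and its entry order are hypothetical and do not reproduce what TEST_FLOATING_TYPE_LIMIT_VARIABLES defines.

```cpp
#include <limits>
#include <tuple>

// Hypothetical helper returning (nan, max, lowest, min, infinity) for T.
template <typename T>
std::tuple<T, T, T, T, T> floatingTypeLimits() {
  using L = std::numeric_limits<T>;
  return {L::quiet_NaN(), L::max(), L::lowest(), L::min(), L::infinity()};
}

// Only NaN is needed: std::ignore absorbs the other entries, so there is no
// repeated identifier and no unused variable.
template <typename T>
T nanOnlyWithTie() {
  T nan{};
  std::tie(nan, std::ignore, std::ignore, std::ignore, std::ignore) =
      floatingTypeLimits<T>();
  return nan;
}

// Same result with a single std::get; unused entries are never named.
template <typename T>
T nanOnlyWithGet() {
  return std::get<0>(floatingTypeLimits<T>());
}
```

The trade-off versus a macro is that call sites must know each entry's position, so std::get<0> is only readable when the tuple layout is documented next to the helper.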

facebook-github-bot · 2024-03-08T03:55:21Z

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

kagamiori

LGTM. Thank you for helping address the linter warning.

kagamiori · 2024-03-12T06:04:24Z

Hi @czentgr, could you rebase this PR onto the latest main? It's required for committing the code. Thanks!


czentgr · 2024-03-12T12:27:40Z

@kagamiori done. Thanks!

facebook-github-bot · 2024-03-12T16:36:08Z

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-03-12T21:08:13Z

@kagamiori merged this pull request in 451db90.

…kincubator#7780)

Summary:
The row container implementation for equalsNoNulls and equalsWithNulls contained a bug:
1. Incorrect equals check for floating point types when NaN values are used.
2. Refactor to use SimpleVector::comparePrimitiveAsc in RowContainer and ContainerRowSerde for a common comparison function.
3. Change static SimpleVector::comparePrimitiveAsc to be static inline to reduce function call overhead in this expanded usage.

This is a continuation of PR facebookincubator#5833 which addressed floating point comparisons for complex types.

Affected operators:
FilterProject, TopN, TopNRowNumber, OrderBy, MergeExchange, LocalMerge, HashProbe, NestedLoopJoinProbe

The lists may not be complete.

Pull Request resolved: facebookincubator#7780

Reviewed By: Yuhta

Differential Revision: D54141907

Pulled By: kagamiori

fbshipit-source-id: 0306cfaffd4d486a0b72f6e6b659b40b2d66688f

facebook-github-bot added the CLA Signed label Nov 29, 2023

czentgr marked this pull request as ready for review December 4, 2023 15:20

czentgr force-pushed the cz_fix_equal_nan branch 2 times, most recently from bfb6fe7 to 99a2445 December 11, 2023 20:11

czentgr requested a review from aditi-pandit December 11, 2023 20:42

czentgr force-pushed the cz_fix_equal_nan branch from 99a2445 to 307dd1a December 15, 2023 21:20

czentgr force-pushed the cz_fix_equal_nan branch from 307dd1a to 2ef2b29 January 4, 2024 20:38

czentgr requested a review from mbasmanova January 4, 2024 20:39

czentgr force-pushed the cz_fix_equal_nan branch from 2ef2b29 to e8550cb January 9, 2024 17:43

aditi-pandit reviewed Jan 10, 2024

aditi-pandit mentioned this pull request Jan 10, 2024

Refactor common code in RowContainer::equalWithNulls and equalNoNulls #8323

Closed

czentgr force-pushed the cz_fix_equal_nan branch from e8550cb to 8259968 January 11, 2024 23:39

czentgr requested a review from aditi-pandit January 12, 2024 16:41

czentgr force-pushed the cz_fix_equal_nan branch from 8259968 to a4d1fba January 12, 2024 18:33

czentgr force-pushed the cz_fix_equal_nan branch 2 times, most recently from 2db2153 to 4fe7aff January 29, 2024 15:11

aditi-pandit reviewed Jan 30, 2024

czentgr force-pushed the cz_fix_equal_nan branch from 4fe7aff to d7e688f January 31, 2024 23:12

mbasmanova requested review from kgpai and kagamiori February 1, 2024 11:51

czentgr force-pushed the cz_fix_equal_nan branch from d7e688f to 79843e5 February 2, 2024 22:57

aditi-pandit reviewed Feb 5, 2024

kgpai reviewed Feb 5, 2024

czentgr force-pushed the cz_fix_equal_nan branch from 79843e5 to 1a1508d February 9, 2024 19:57

aditi-pandit reviewed Feb 16, 2024

mbasmanova requested a review from bikramSingh91 February 16, 2024 13:07

czentgr commented Feb 23, 2024

bikramSingh91 approved these changes Feb 23, 2024

czentgr force-pushed the cz_fix_equal_nan branch from 4ad4331 to 77a66c7 February 29, 2024 00:53

kagamiori reviewed Mar 5, 2024

kgpai reviewed Mar 5, 2024

czentgr force-pushed the cz_fix_equal_nan branch 2 times, most recently from 2441348 to cbe3d03 March 6, 2024 18:10

kagamiori approved these changes Mar 8, 2024

Yuhta approved these changes Mar 11, 2024

czentgr force-pushed the cz_fix_equal_nan branch from cbe3d03 to 59e7e4f March 12, 2024 12:26

facebook-github-bot closed this in 451db90 Mar 12, 2024

facebook-github-bot added the Merged label Mar 12, 2024

czentgr deleted the cz_fix_equal_nan branch March 19, 2024 13:10

ethanyzhang mentioned this pull request May 2, 2024

[native] Add floating point aggregate tests with NaN prestodb/presto#21447

Merged
