Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix imcompatible between presto and velox when hash double/float -0.0 #9511

Closed

Conversation

shenh062326
Copy link

@shenh062326 shenh062326 commented Apr 17, 2024

For double/float -0.0 and 0.0, folly::hasher will return the same value.
but in presto/mysql, the return hash values are not equal. We should firstly
cast double/float to longBits, then call folly::hasher to get hash value.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2024
Copy link

netlify bot commented Apr 17, 2024

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit 925fb3b
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/662081f49262eb00083e32c7
😎 Deploy Preview https://deploy-preview-9511--meta-velox.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@Yuhta
Copy link
Contributor

Yuhta commented Apr 17, 2024

@bikramSingh91 @rschlussel Similar to NaN, how does Presto treat 0.0 and -0.0? It seems 0.0 = -0.0 returns true, but during join/groupby they are treated differently? Should we add this to our plan of NaN unification?

@spershin
Copy link
Contributor

Agree that we need 1st to check what we want it to be in Presto.
Also, just curious, since bit representation of 0.0 and -0.0 look different, does folly::hasher() performa an extra check to return the same hash for 0.0 and -0.0?

@bikramSingh91
Copy link
Contributor

does folly::hasher() performa an extra check to return the same hash for 0.0 and -0.0?

@spershin thats probably true based on this comment in folly

https://github.com/facebook/folly/blob/427a1ef2c1641e0cfaec5c9300487d084175b00e/folly/Range.h#L1664-L1665

@rschlussel
Copy link

@bikramSingh91 @rschlussel Similar to NaN, how does Presto treat 0.0 and -0.0? It seems 0.0 = -0.0 returns true, but during join/groupby they are treated differently? Should we add this to our plan of NaN unification?

do you have an example query to reproduce this? Earlier I had the same suspicion about group by and tried to test it out, but didn't see the behavior you are describing.

@Yuhta
Copy link
Contributor

Yuhta commented Apr 17, 2024

@rschlussel

> select 0.0 = -0.0;
 _col0
-------
 true

> WITH t as (SELECT * FROM unnest(array[0.0, -0.0], array[1, 2]) as _t(k, v)) select * from t join t as t2 on t.k = t2.k;
  k   | v |  k   | v
------+---+------+---
  0.0 | 1 |  0.0 | 1
 -0.0 | 2 | -0.0 | 2

@rui-mo
Copy link
Collaborator

rui-mo commented Apr 18, 2024

With this change, do we need to normalize -0.0 and 0.0 in #9272 for Spark? @zhli1142015

@zhli1142015
Copy link
Contributor

zhli1142015 commented Apr 18, 2024

With this change, do we need to normalize -0.0 and 0.0 in #9272 for Spark? @zhli1142015

Yes, we need to also normalize -0.0 and 0.0. If this is correct behavior for Presto, then Presto and spark expect very different behaviors when handling Nans and -0.0 / 0.0. Should we continue the #9272, @mbasmanova? thanks.

@shenh062326
Copy link
Author

shenh062326 commented Apr 18, 2024

This is the behavior of mysql, like presto, it also won't aggregate -0 and 0.

mysql> create table double_basic_test_2(id int,col1 double,col2 double) ;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into double_basic_test_2 values(6,'0','-0');
Query OK, 1 row affected (0.04 sec)

mysql> insert into double_basic_test_2 values(5,0,0);
Query OK, 1 row affected (0.04 sec)

mysql> select sum(id),col2 from double_basic_test_2 group by col2;
+---------+------+
| sum(id) | col2 |
+---------+------+
|       6 |   -0 |
|       5 |    0 |
+---------+------+
2 rows in set (0.03 sec)

@aditi-pandit
Copy link
Collaborator

@czentgr, @karteekmurthys

Copy link

stale bot commented Jul 18, 2024

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label Jul 18, 2024
@stale stale bot closed this Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants