-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GraphBolt] Make unique_and_compact deterministic #7217
Conversation
To trigger regression tests:
|
LGTM, as per our discussion with @peizhou001. |
How much slower is this new implementation compared to the old one? |
@mfbalin The time complexity should be the same. I also tested on my local machine, there's no significant slow down in any of |
@RamonZhou Could you also test the advanced pyg example with |
@mfbalin I tested it and the average epoch time is 7.94s before and 7.99s after |
Thank you. Let's test it again before we merge it. The changes during the review process may impact performance. |
we may also add an option to decide which path to go depends on whether user need deterministic or not. |
Maybe not? because it keeps the same speed while making the result sable, there is no reason why user want the result non-deterministic w/o performance benefits. |
@RamonZhou how is the performance now on the advanced PyG example? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. If the performance is same as before or even better (we insert all in a single parallel for now, should be a little faster), we can go ahead and merge it.
I see around 5% performance loss with this PR on my machine with the advanced PyG example |
@RamonZhou it might be best to measure the performance by making the feature dimension smaller in case the feature fetch is the bottleneck on your system. Like |
@mfbalin I tested again with the latest master branch merged. It's 7.911s (before) vs. 7.919s (after) |
I did another test and the results are (features are sliced with
So it seems that deleting the |
The fact that all the numbers are so close is suspicious to me. Could we test with no feature fetch and no forward pass? |
Without feature fetching and forward pass: 4.319s (before) and 4.707s (after). About 9% slower |
Do you want to look into implementing atomic load and see if it helps before the cas loop? Or do you want to merge this first, then continue to optimize? |
@mfbalin I tried to implement it but I think the atomic load can just be a |
https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html |
I see. I tried |
Let's merge this first and optimize in the future. |
Description
Doc: https://docs.google.com/document/d/1fWX6GhQySniFCSmXnKw8mxZQ26WngD5jEi8vt2H41Ss
Checklist
Please feel free to remove inapplicable items for your PR.
Changes