[GraphBolt][CUDA] GPUCachedFeature update bug when feature dimensions differ #7377

Closed
mfbalin opened this issue May 7, 2024 · 2 comments · Fixed by #7384
Labels: bug:confirmed, Release Blocker, Work Item


mfbalin commented May 7, 2024

🔨Work Item

IMPORTANT:

  • This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
  • DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

features.update("node", None, "feat", y)

GPUCachedFeature does not currently support updating a feature with a dimension different from the one it was initialized with. Our inference loop calls the update function shown above, and since layers have differing hidden dimensions, we get the error below.
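To make the failure mode concrete, here is a minimal pure-Python sketch. It is not GraphBolt code: `FixedWidthCache` and `layerwise_widths_demo` are hypothetical stand-ins, and the layer widths are made up. The only things taken from the report are the idea that the cache's row width is fixed at construction and the error message text.

```python
# Toy stand-ins only: none of this is GraphBolt's real implementation.
class FixedWidthCache:
    """Hypothetical cache whose row width is fixed at construction."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = {}

    def replace(self, keys, values):
        # Reject rows whose width differs from the width chosen at construction.
        if any(len(row) != self.dim for row in values):
            raise RuntimeError("Values should have the correct dimensions.")
        for key, row in zip(keys, values):
            self.rows[key] = list(row)


def layerwise_widths_demo(layer_widths, num_nodes=4):
    """Write each layer's fake output into one cache; collect the errors hit."""
    cache = FixedWidthCache(dim=layer_widths[0])
    errors = []
    for width in layer_widths:
        y = [[0.0] * width for _ in range(num_nodes)]  # fake layer output
        try:
            cache.replace(range(num_nodes), y)
        except RuntimeError as exc:
            errors.append((width, str(exc)))
    return errors


# Assumed widths: 8 for the input features, 16 for a hidden layer.
errors = layerwise_widths_demo([8, 16, 8])
```

Only the layer whose output width differs from the initial width is rejected, which matches the situation described: the first write fits, the hidden-layer write does not.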

We need to update GPUCachedFeature so that it reconstructs the GPUCache when the feature dimension changes.
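One way to picture the proposed behavior is the sketch below. It is an illustration under assumptions, not the actual fix: `ToyGPUCache` and `ToyCachedFeature` are hypothetical classes, features are plain lists of rows, and only the full-update case (no ids) is modeled.

```python
# Hypothetical toy classes; they only mirror the idea of rebuilding the cache.
class ToyGPUCache:
    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.rows = {}

    def replace(self, keys, values):
        if any(len(row) != self.dim for row in values):
            raise RuntimeError("Values should have the correct dimensions.")
        for key, row in zip(keys, values):
            self.rows[key] = list(row)


class ToyCachedFeature:
    def __init__(self, rows, capacity):
        self._capacity = capacity
        self._rows = [list(r) for r in rows]
        self._cache = ToyGPUCache(capacity, len(rows[0]))
        self.rebuilds = 0  # counts how often the cache was reconstructed

    @property
    def dim(self):
        return self._cache.dim

    def update(self, rows):
        """Replace the whole feature; rebuild the cache if the width changed."""
        new_dim = len(rows[0])
        if new_dim != self._cache.dim:
            # Old entries have the wrong width, so start from an empty cache.
            self._cache = ToyGPUCache(self._capacity, new_dim)
            self.rebuilds += 1
        self._rows = [list(r) for r in rows]
        cached = self._rows[: self._capacity]
        self._cache.replace(range(len(cached)), cached)


feature = ToyCachedFeature([[0.0] * 8 for _ in range(4)], capacity=2)
feature.update([[1.0] * 16 for _ in range(4)])  # width changes: cache is rebuilt
feature.update([[2.0] * 16 for _ in range(4)])  # same width: cache is reused
```

Rebuilding discards the cached entries, which seems acceptable here because a full update invalidates the old values anyway.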

This bug prevents us from using GPUCachedFeature in our single-GPU examples that include the inference loop.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
node_classification_advanced.py 458 <module>
main()

node_classification_advanced.py 444 main
test_acc = layerwise_infer(

_contextlib.py 115 decorate_context
return func(*args, **kwargs)

node_classification_advanced.py 271 layerwise_infer
pred = model.inference(graph, features, dataloader, args.feature_device)

node_classification_advanced.py 149 inference
features.update("node", None, "feat", y)

basic_feature_store.py 134 update
self._features[(domain, type_name, feature_name)].update(value, ids)

gpu_cached_feature.py 109 update
self._feature.replace(

gpu_cache.py 48 replace
self._cache.replace(keys, values)

RuntimeError:
Values should have the correct dimensions.

mfbalin added the Work Item label May 7, 2024
mfbalin self-assigned this May 7, 2024
mfbalin added the bug:confirmed label May 7, 2024

mfbalin commented May 7, 2024

@Rhett-Ying @frozenbugs


mfbalin commented May 8, 2024

This could be a release blocker. Now that we are aware of the issue, it would be really nice if the next release fixed it. I moved to a new place today, so I will work on the fix tomorrow.

mfbalin added the Release Blocker label May 9, 2024