dgl.lap_pe cannot scale to large graphs due to materialization of dense adjacency matrix #5854

Closed
BarclayII opened this issue Jun 12, 2023 · 2 comments · Fixed by #5855

@BarclayII
Collaborator

🐛 Bug

dgl.lap_pe cannot scale to large graphs due to materialization of dense adjacency matrix

To Reproduce

@vijaydwivedi75 told me that he would like to scale dgl.lap_pe to larger graphs such as OGB products, but the current implementation throws a memory error:

import dgl 
from ogb.nodeproppred import DglNodePropPredDataset
dataset = DglNodePropPredDataset(name='ogbn-products')
graph = dataset[0][0]

dgl.lap_pe(graph, 5)
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module>
  File "/home/vdwivedi/miniconda3/envs/transformers/lib/python3.10/site-packages/dgl/transforms/functional.py", line 3674, in lap_pe
    EigVal, EigVec = np.linalg.eig(L.toarray())
  File "/home/vdwivedi/miniconda3/envs/transformers/lib/python3.10/site-packages/scipy/sparse/_compressed.py", line 1051, in toarray
    out = self._process_toarray_args(order, out)
  File "/home/vdwivedi/miniconda3/envs/transformers/lib/python3.10/site-packages/scipy/sparse/_base.py", line 1298, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 43.6 TiB for an array with shape (2449029, 2449029) and data type float64
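
The 43.6 TiB figure in the error follows directly from the matrix shape: a dense float64 array stores 8 bytes for each of its n × n entries. A minimal sketch of that arithmetic (the helper name `dense_bytes` is made up for illustration and is not part of DGL or NumPy):

```python
def dense_bytes(n: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense n-by-n array (itemsize=8 for float64)."""
    return n * n * itemsize


n = 2_449_029  # number of nodes in ogbn-products, taken from the traceback
tib = dense_bytes(n) / 2**40  # 1 TiB = 2**40 bytes
print(f"{tib:.1f} TiB")  # prints "43.6 TiB"
```

The cost grows quadratically with the number of nodes, so any approach that calls toarray() on the full matrix will fail at this scale regardless of the machine.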

Expected behavior

dgl.lap_pe should work on graphs as large as OGB products (see the suggested fix below).

Environment

  • DGL Version (e.g., 1.0): master
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3):
  • OS (e.g., Linux):
  • How you installed DGL (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version (if applicable):
  • GPU models and configuration (e.g. V100):
  • Any other relevant information:

Additional context

@vijaydwivedi75 also suggested a fix. Instead of densifying the sparse matrix L in

EigVal, EigVec = np.linalg.eig(L.toarray())

we could use SciPy's sparse eigensolver, which works directly on sparse matrices:

EigVal, EigVec = sp.linalg.eigs(L, k=pos_enc_dim+1, which='SR', tol=1e-2) # works fine for ogbn-products
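
For reference, here is a self-contained sketch of how the suggested call could be used end to end without DGL. It assumes L is the symmetric normalized Laplacian I - D^(-1/2) A D^(-1/2); the helper name `lap_pe_sparse` and the small path graph are hypothetical and only for illustration, and this is not the code merged in the fix.

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import eigs


def lap_pe_sparse(adj, pos_enc_dim):
    """Hypothetical helper: Laplacian positional encoding without densifying.

    Assumes `adj` is a symmetric SciPy sparse adjacency matrix.
    """
    n = adj.shape[0]
    # Node degrees, clipped to 1 so isolated nodes do not divide by zero.
    deg = np.asarray(adj.sum(axis=1)).ravel().astype(float).clip(1)
    d_inv_sqrt = sparse.diags(deg ** -0.5)
    # Assumed definition of L: symmetric normalized Laplacian (stays sparse).
    lap = sparse.eye(n) - d_inv_sqrt @ adj @ d_inv_sqrt
    # Same solver call as suggested in the issue: k smallest-real eigenpairs.
    eigval, eigvec = eigs(lap, k=pos_enc_dim + 1, which="SR", tol=1e-2)
    # eigs returns complex arrays; sort by real part and keep the real part.
    order = np.argsort(eigval.real)
    eigvec = eigvec[:, order].real
    # Drop the first eigenvector (eigenvalue closest to 0).
    return eigvec[:, 1 : pos_enc_dim + 1]


# Toy example: an undirected path graph on 30 nodes.
n = 30
rows = np.arange(n - 1)
adj = sparse.coo_matrix((np.ones(n - 1), (rows, rows + 1)), shape=(n, n))
adj = (adj + adj.T).tocsr()

pe = lap_pe_sparse(adj, pos_enc_dim=3)
print(pe.shape)  # (30, 3)
```

Because this L is symmetric, scipy.sparse.linalg.eigsh would be another option and returns real values directly; the sketch keeps eigs to match the suggestion above.
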
@github-actions

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

@rudongyu self-assigned this Jul 20, 2023
