[GraphBolt] Temporal link prediction example crash #7505

Open
mfbalin opened this issue Jul 4, 2024 · 7 comments

@mfbalin
Collaborator

mfbalin commented Jul 4, 2024

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. python temporal_link_prediction.py
Training in cpu-cuda mode.
Loading data
Downloading datasets/diginetica-r2ne.zip from https://dgl-data.s3-accelerate.amazonaws.com/dataset/diginetica-r2ne.zip...
datasets/diginetica-r2ne.zip: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 404M/404M [00:09<00:00, 40.9MB/s]
Extracting file to datasets
Start to preprocess the on-disk dataset.
/localscratch/dgl-3/python/dgl/graphbolt/impl/ondisk_dataset.py:460: DGLWarning: Edge feature is stored, but edge IDs are not saved.
  dgl_warning("Edge feature is stored, but edge IDs are not saved.")
Finish preprocessing the on-disk dataset.
Training...
0it [00:00, ?it/s]/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'node_pairs' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'YEAR(timestamp)' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'MONTH(timestamp)' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'DAY(timestamp)' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'DAYOFWEEK(timestamp)' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'TIMESTAMP(timestamp)' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
/localscratch/dgl-3/python/dgl/graphbolt/item_sampler.py:94: DGLWarning: Unknown item name 'timestamp' is detected and added into `MiniBatch`. You probably need to provide a customized `MiniBatcher`.
  dgl_warning(
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/localscratch/dgl-3/examples/sampling/graphbolt/pyg/labor/../../temporal_link_prediction.py", line 322, in <module>
    main(args)
  File "/localscratch/dgl-3/examples/sampling/graphbolt/pyg/labor/../../temporal_link_prediction.py", line 317, in main
    train(args, model, graph, features, train_set, encoders)
  File "/localscratch/dgl-3/examples/sampling/graphbolt/pyg/labor/../../temporal_link_prediction.py", line 169, in train
    for step, data in tqdm.tqdm(enumerate(dataloader)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 629, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 672, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
    data = next(self.dataset_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 150, in __next__
    return self._get_next()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 138, in _get_next
    result = next(self.iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 222, in wrap_next
    result = next_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/datapipe.py", line 383, in __next__
    return next(self._datapipe_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/localscratch/dgl-3/python/dgl/graphbolt/base.py", line 287, in __iter__
    yield from self.datapipe
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/localscratch/dgl-3/python/dgl/graphbolt/base.py", line 274, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/prefetcher.py", line 103, in __iter__
    raise data
  File "/usr/local/lib/python3.10/dist-packages/torchdata/datapipes/iter/util/prefetcher.py", line 73, in thread_worker
    item = next(itr)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/localscratch/dgl-3/python/dgl/graphbolt/base.py", line 346, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/localscratch/dgl-3/python/dgl/graphbolt/base.py", line 313, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/localscratch/dgl-3/python/dgl/graphbolt/dataloader.py", line 95, in __iter__
    yield from self.dataloader
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 629, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 672, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch
    data = next(self.dataset_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 150, in __next__
    return self._get_next()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 138, in _get_next
    result = next(self.iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 222, in wrap_next
    result = next_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/datapipe.py", line 383, in __next__
    return next(self._datapipe_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 124, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/_hook_iterator.py", line 179, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 125, in __iter__
    yield self._apply_fn(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 90, in _apply_fn
    return self.fn(data)
  File "/localscratch/dgl-3/python/dgl/graphbolt/minibatch_transformer.py", line 38, in _transformer
    minibatch = self.transformer(minibatch)
  File "/localscratch/dgl-3/python/dgl/graphbolt/subgraph_sampler.py", line 78, in _sample
    ) = self.sample_subgraphs(
  File "/localscratch/dgl-3/python/dgl/graphbolt/impl/temporal_neighbor_sampler.py", line 76, in sample_subgraphs
    subgraph = self.sampler(
  File "/localscratch/dgl-3/python/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 1106, in temporal_sample_neighbors
    self._check_sampler_arguments(nodes, fanouts, probs_or_mask)
  File "/localscratch/dgl-3/python/dgl/graphbolt/impl/fused_csc_sampling_graph.py", line 779, in _check_sampler_arguments
    assert nodes.dtype == self.indices.dtype, (
AssertionError: Data type of nodes must be consistent with indices.dtype(torch.int32), but got torch.int64.
This exception is thrown by __iter__ of MiniBatchTransformer(datapipe=MiniBatchTransformer, transformer=<bound method SubgraphSampler._sample of TemporalNeighborSampler>)

Expected behavior

The example should run without crashing.


@mfbalin
Collaborator Author

mfbalin commented Jul 4, 2024

I think the dataset dtypes are inconsistent: the seed nodes are int64, but the graph's indices are int32, so the two do not match. @frozenbugs @Rhett-Ying
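
For illustration, the mismatch reported by the assertion can be reproduced outside GraphBolt with two plain tensors. This is only a minimal sketch, not DGL code: `check_seed_dtype` is a hypothetical stand-in for the sampler's internal dtype check, and the tensor values are made up.

```python
import torch


def check_seed_dtype(seeds: torch.Tensor, indices: torch.Tensor) -> None:
    """Hypothetical stand-in for the sampler's check: seeds must share indices' dtype."""
    if seeds.dtype != indices.dtype:
        raise TypeError(
            f"Data type of nodes must be consistent with indices.dtype"
            f"({indices.dtype}), but got {seeds.dtype}."
        )


# Graph indices stored as int32, as in the traceback.
indices = torch.tensor([1, 2, 0], dtype=torch.int32)
# Seeds built from Python ints default to int64, which triggers the mismatch.
seeds = torch.tensor([0, 2])

try:
    check_seed_dtype(seeds, indices)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

# Casting the seeds to the indices' dtype makes the check pass.
seeds_fixed = seeds.to(indices.dtype)
check_seed_dtype(seeds_fixed, indices)
```

The cast is cheap for a minibatch of seeds, but fixing the dtype once in the stored dataset would avoid repeating it at load time.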

@mfbalin
Collaborator Author

mfbalin commented Jul 5, 2024

Regression tests do not show any result for the temporal link prediction example either.

@mfbalin
Collaborator Author

mfbalin commented Jul 5, 2024

Temporarily fixed in #7503, see the TODO:

# TODO: Fix the dataset so that this modification is not needed. node_pairs
# needs to be cast into graph.indices.dtype, which is int32.
train_set._itemsets["Query:Click:Product"]._items = tuple(
    item.to(graph.indices.dtype if i == 0 else None)
    for i, item in enumerate(
        train_set._itemsets["Query:Click:Product"]._items
    )
)
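
The idea behind the workaround, casting only the first item (the node pairs) and leaving the remaining items untouched, can be sketched on plain tensors as follows. This is an assumption-laden illustration rather than the code from #7503: `cast_first_item` is a hypothetical helper, the tensors are made-up data, and no GraphBolt internals are used.

```python
import torch


def cast_first_item(items, dtype):
    """Return a new tuple where only the first tensor is cast to `dtype`.

    Hypothetical helper: the other tensors (e.g. timestamps) are passed
    through unchanged so their original dtype is preserved.
    """
    return tuple(
        item.to(dtype) if i == 0 else item for i, item in enumerate(items)
    )


# Made-up stand-ins for the item tuple: node pairs first, then timestamps.
node_pairs = torch.tensor([[0, 5], [1, 7]])  # int64 by default
timestamps = torch.tensor([1_700_000_000, 1_700_000_060])  # stays int64
items = (node_pairs, timestamps)

# int32 matches the graph.indices.dtype mentioned in the TODO.
fixed = cast_first_item(items, torch.int32)
```

Keeping the timestamps in int64 matters here because casting them to int32 is unnecessary for the sampler check and could overflow for larger values.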


github-actions bot commented Aug 5, 2024

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you

@mfbalin mfbalin added Work Item Work items tracked in project tracker and removed stale-issue labels Aug 5, 2024
@mfbalin mfbalin assigned frozenbugs and unassigned peizhou001 Aug 25, 2024
@mfbalin
Collaborator Author

mfbalin commented Aug 25, 2024

@frozenbugs feel free to assign someone else on the issue.

@frozenbugs
Collaborator

This should have been fixed, confirmed on the latest regression result.

@mfbalin
Collaborator Author

mfbalin commented Aug 27, 2024

@frozenbugs it is only temporarily fixed; see my comment above.

@mfbalin mfbalin reopened this Aug 27, 2024
@frozenbugs frozenbugs assigned BowenYao18 and unassigned frozenbugs Aug 27, 2024