Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Fixes issues in feature pickling and transmission for subgraphs and blocks #2139

Merged
merged 13 commits into from
Sep 3, 2020

Conversation

BarclayII
Copy link
Collaborator

Description

Fixes #2135 and #2137 by allowing the storage of Column to be None.

For subprocesses in PyTorch node and edge DataLoaders, we first check if the feature comes from the original graph by checking whether the column storage of the subgraph/block is identical to that of the original graph. If so, we set the storage to be None.

After the subprocesses transmit the subgraphs/blocks, we then check in the main process if the column storage is None. If so, we restore the storage to be that with the same name in the original graph.

def collate(self, items):
input_nodes, output_nodes, blocks = super().collate(items)
_pop_blocks_storage(blocks, self.g)
return input_nodes, output_nodes, blocks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not directly having _pop_blocks_storage in NodeCollator?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you write your own iterator that gets the blocks with NodeCollator (like in Tensorflow)? The features of the blocks will be removed in this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can put them in the node collator and have an argument pop_storage which defaults to False.

_pop_subgraph_storage(pair_graph, self.g)
_pop_subgraph_storage(neg_pair_graph, self.g)
_pop_blocks_storage(blocks, self.g_sampling)
return input_nodes, pair_graph, neg_pair_graph, blocks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. I feel the logic can go to EdgeCollator directly.

for key, col in subframe._columns.items():
if col.storage is None:
assert key in frame._columns
col.storage = frame._columns[key].storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is only valid when a column's storage can never be None. Maybe add one sentence in Column's docstring to emphasize that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the assertion at L63 is redundant, gives no extra information and incurs an extra key lookup. I'd suggest remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one paragraph in the docstring of Column. Does that suffice?

@BarclayII BarclayII merged commit ac5c79c into dmlc:master Sep 3, 2020
@BarclayII BarclayII deleted the fix-2135-2137 branch September 3, 2020 04:49
kingmbc pushed a commit to kingmbc/dgl that referenced this pull request Sep 4, 2020
… and blocks (dmlc#2139)

* bug fixes

* remove __deepcopy__; not sure what the behavior should be

* lint

* skip gpu test

* fix

* fix dist dataloader

* add comment

* remove assert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants