Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Handle a Corner Case of Batching after Removing Nodes/Edges #2465

Merged
merged 14 commits into from
Jan 5, 2021
Merged

[Bugfix] Handle a Corner Case of Batching after Removing Nodes/Edges #2465

merged 14 commits into from
Jan 5, 2021

Conversation

mufeili
Copy link
Member

@mufeili mufeili commented Dec 29, 2020

Description

Fix #2409

@zrqiao With this PR, remove_nodes/remove_edges no longer store the original node/edge IDs. See if this is good to you.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the my best knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred in this PR

@mufeili
Copy link
Member Author

mufeili commented Dec 29, 2020

@BarclayII Any idea for the error reported in CI regarding neighbor sampling?

store_raw_ids : bool, optional
If True, it will store the raw IDs of the extracted nodes and edges in the ``ndata``
and ``edata`` of the resulting graph under name ``dgl.NID`` and ``dgl.EID``,
respectively.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store_raw_ids or store_ids?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BarclayII
Copy link
Collaborator

BarclayII commented Jan 2, 2021

The code that threw error is:

# Removing edges from the frontier for link prediction training falls
# into the category of frontier postprocessing
if exclude_eids is not None:
    parent_eids = frontier.edata[EID]
    parent_eids_np = _tensor_or_dict_to_numpy(parent_eids)
    located_eids = _locate_eids_to_exclude(parent_eids_np, exclude_eids)
    if not isinstance(located_eids, Mapping):
        # (BarclayII) If frontier already has a EID field and located_eids is empty,
        # the returned graph will keep EID intact.  Otherwise, EID will change
        # to the mapping from the new graph to the old frontier.
        # So we need to test if located_eids is empty, and do the remapping ourselves.
        if len(located_eids) > 0:
            frontier = transform.remove_edges(frontier, located_eids)
            frontier.edata[EID] = F.gather_row(parent_eids, frontier.edata[EID])

What the code does is to remove the edges in frontier that has its EID in the given exclude_eids.

If remove_edges no longer returns the edge ID of the parent graph, how should I know the edge correspondence between the original graph and the new graph?

Copy link

@zrqiao zrqiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The changes look good to me, and I also suggest add docs to dgl.batch() to clarify the behaviors when perform batching on subgraphs.

  • Consider adding docs to dgl.batch()

@zrqiao
Copy link

zrqiao commented Jan 2, 2021

@mufeili Regarding that check error - I am not familiar with your CI workflow, but do we need the original NID/EIDs to be accessible from subgraphs in other functions? Perhaps the store_raw_ids should be True by default.

@mufeili
Copy link
Member Author

mufeili commented Jan 4, 2021

@BarclayII I've made the update based on our discussion. Can you take a look?

@mufeili
Copy link
Member Author

mufeili commented Jan 4, 2021

@zrqiao I've chatted with @BarclayII regarding the issue and we decided to add a flag to remove_nodes/remove_edges for storing raw ids and set it to False by default. Internally it's easy to set the flag to True when necessary.

Regarding the doc, I think it might be better to add doc for remove_nodes/remove_edges. With the flag default to False, do you still prefer a note in the doc?

@zrqiao
Copy link

zrqiao commented Jan 4, 2021

@mufeili Thanks! I would also prefer the flag default to False, as long as that does not break other utilities. Approved.

@BarclayII
Copy link
Collaborator

I'm good.

@mufeili mufeili merged commit 0c2a2ea into dmlc:master Jan 5, 2021
@mufeili mufeili deleted the remove_batch branch January 5, 2021 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Batching fails when remove_edges is called on part of the graphs
4 participants