[Bugfix] Handle a Corner Case of Batching after Removing Nodes/Edges #2465
@BarclayII Any idea about the error reported in CI regarding neighbor sampling?
store_raw_ids : bool, optional
    If True, it will store the raw IDs of the extracted nodes and edges in the ``ndata``
    and ``edata`` of the resulting graph under name ``dgl.NID`` and ``dgl.EID``,
    respectively.
store_raw_ids or store_ids?
Done.
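For readers following the thread, here is a toy sketch of the flag semantics the docstring describes, written in plain Python rather than DGL. Everything in it is a hypothetical stand-in: the dict-based graph, the `remove_nodes` helper, the `"NID"` key, and the `store_ids` spelling (assuming the shorter name suggested in review). It is not DGL's implementation.

```python
def remove_nodes(graph, nids, store_ids=False):
    """Toy stand-in for node removal (hypothetical, not DGL's code).

    graph is {"num_nodes": int, "edges": [(u, v), ...], "ndata": {name: list}}.
    """
    removed = set(nids)
    # Original IDs of the nodes that survive, in their original order.
    kept = [n for n in range(graph["num_nodes"]) if n not in removed]
    # Surviving nodes are relabelled consecutively from 0.
    relabel = {old: new for new, old in enumerate(kept)}
    # Drop every edge that touches a removed node, relabel the rest.
    edges = [(relabel[u], relabel[v]) for u, v in graph["edges"]
             if u in relabel and v in relabel]
    # Slice each node feature down to the surviving rows.
    ndata = {name: [vals[n] for n in kept] for name, vals in graph["ndata"].items()}
    if store_ids:
        # Only when asked: record which original node each new node came from.
        ndata["NID"] = kept
    return {"num_nodes": len(kept), "edges": edges, "ndata": ndata}


g = {"num_nodes": 4, "edges": [(0, 1), (1, 2), (2, 3)], "ndata": {"h": [10, 11, 12, 13]}}
plain = remove_nodes(g, [1])                   # no ID field is added
tagged = remove_nodes(g, [1], store_ids=True)  # original IDs kept under "NID"
```

With the flag off, the result carries only the features the input already had, so graphs produced this way all expose the same set of feature names.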
The code that threw the error is:

    # Removing edges from the frontier for link prediction training falls
    # into the category of frontier postprocessing
    if exclude_eids is not None:
        parent_eids = frontier.edata[EID]
        parent_eids_np = _tensor_or_dict_to_numpy(parent_eids)
        located_eids = _locate_eids_to_exclude(parent_eids_np, exclude_eids)
        if not isinstance(located_eids, Mapping):
            # (BarclayII) If frontier already has a EID field and located_eids is empty,
            # the returned graph will keep EID intact. Otherwise, EID will change
            # to the mapping from the new graph to the old frontier.
            # So we need to test if located_eids is empty, and do the remapping ourselves.
            if len(located_eids) > 0:
                frontier = transform.remove_edges(frontier, located_eids)
                frontier.edata[EID] = F.gather_row(parent_eids, frontier.edata[EID])

What the code does is remove the edges in [...] If [...]
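The remapping step in that snippet can be illustrated with plain Python lists. This is only a sketch of the index composition: `gather_row`, `remove_edges`, and `exclude` below are hypothetical list-based stand-ins, not the DGL functions of the same name.

```python
def gather_row(data, index):
    """List analogue of a row gather: pick data[i] for each i in index."""
    return [data[i] for i in index]


def remove_edges(eids, located):
    """Toy stand-in: for each surviving edge, return its position in the input."""
    drop = set(located)
    return [pos for pos in range(len(eids)) if pos not in drop]


def exclude(parent_eids, located):
    """Return parent-graph edge IDs for the edges that survive the exclusion."""
    if len(located) == 0:
        # Nothing was removed, so the existing mapping is already correct.
        return list(parent_eids)
    # Mapping from the new graph to the old frontier ...
    new_to_frontier = remove_edges(parent_eids, located)
    # ... composed with the frontier-to-parent mapping.
    return gather_row(parent_eids, new_to_frontier)


# Frontier edge i corresponds to parent edge parent_eids[i].
parent_eids = [7, 3, 9, 4]
after_exclusion = exclude(parent_eids, [1, 2])  # drop frontier edges 1 and 2
untouched = exclude(parent_eids, [])            # empty exclusion list
```

The empty-list branch mirrors the `len(located_eids) > 0` check in the snippet: composing the two mappings only makes sense when a removal actually produced a new-to-frontier mapping.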
Thanks! The changes look good to me. I also suggest adding docs to dgl.batch() to clarify its behavior when batching subgraphs.
- Consider adding docs to dgl.batch()
@mufeili Regarding that check error: I am not familiar with your CI workflow, but do we need the original NID/EIDs to be accessible from subgraphs in other functions? Perhaps the [...]
@BarclayII I've made the update based on our discussion. Can you take a look?
@zrqiao I've chatted with @BarclayII regarding the issue and we decided to add a flag to [...] Regarding the doc, I think it might be better to add docs for [...]
@mufeili Thanks! I would also prefer that the flag default to False, as long as that does not break other utilities. Approved.
I'm good.
Description
Fix #2409
@zrqiao With this PR, remove_nodes/remove_edges no longer store the original node/edge IDs. See if this is good to you.
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change