[Utils] Edge and LINKX homophily measure #5382

Merged: 29 commits, Mar 3, 2023

Member

@mufeili mufeili commented Feb 24, 2023

Description

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Collaborator

dgl-bot commented Feb 24, 2023

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Collaborator

dgl-bot commented Feb 24, 2023

Commit ID: 35e4d87

Build ID: 1

Status: ❌ CI test failed in Stage [Tensorflow GPU Unit test].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 24, 2023

Commit ID: a4a2f46

Build ID: 2

Status: ❌ CI test failed in Stage [Torch GPU Unit test].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 24, 2023

Commit ID: 1cdf545

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mufeili mufeili changed the title [Utils] Edge and LINKX homophily measure [DoNotMerge] [Utils] Edge and LINKX homophily measure Feb 27, 2023
@dgl-bot
Collaborator

dgl-bot commented Feb 27, 2023

Commit ID: afd11e76b4d8064ba514a49496b359ac0c302654

Build ID: 4

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Feb 28, 2023

Commit ID: 899344ce39bf545bc451d6fa39c1a712267642af

Build ID: 7

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

)
return graph.ndata["node_value"].mean().item()
return F.as_scalar(F.mean(graph.ndata["same_class_deg"], dim=0))
Member

I want to point out that the implementation is awkward due to the constraints of our current APIs: (1) we need to use the framework-agnostic backend, (2) we don't support integer-type aggregation, etc.

Ideally, it should be as simple as:

u, v = graph.edges()
graph.edata['same_class'] = (y[u.long()] == y[v.long()]).float()
graph.update_all(...)
return graph.ndata["same_class_deg"].mean()
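For readers who want to check the quantity independently of DGL, here is a minimal plain-Python sketch of what the snippet above appears to compute. It assumes edges are given as parallel `src`/`dst` lists, that aggregation is a per-destination mean (as in the `fn.mean` version shown later in this thread), and that a node with no incoming edges contributes 0. The name `node_homophily_ref` is hypothetical and not part of DGL.

```python
def node_homophily_ref(src, dst, y):
    """Illustrative reference: mean over nodes of the fraction of
    incoming edges whose source has the same label as the node."""
    num_nodes = len(y)
    in_deg = [0] * num_nodes      # number of incoming edges per node
    same_deg = [0] * num_nodes    # incoming edges from same-class sources
    for u, v in zip(src, dst):
        in_deg[v] += 1
        same_deg[v] += int(y[u] == y[v])
    # Assumption: nodes without incoming edges contribute 0 to the average.
    ratios = [
        same_deg[v] / in_deg[v] if in_deg[v] else 0.0
        for v in range(num_nodes)
    ]
    return sum(ratios) / num_nodes


# Five nodes, four directed edges; node 4 is the only node of class 1.
value = node_homophily_ref([1, 2, 0, 4], [0, 1, 2, 3], [0, 0, 0, 0, 1])
```

In this example three of the five nodes have only same-class in-neighbors, one has only a different-class in-neighbor, and one has no in-neighbors, so the average is 3/5.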

@dgl-bot
Collaborator

dgl-bot commented Mar 2, 2023

Commit ID: 2058918

Build ID: 8

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 2, 2023

Commit ID: 2058918

Build ID: 9

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 2, 2023

Commit ID: 83b6677

Build ID: 10

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

__all__ = ["node_homophily", "edge_homophily", "linkx_homophily"]


def get_long_edges(graph):
Collaborator

Sounds good.

nit: Maybe rename to get_edges_long, more natural.

----------
graph : DGLGraph
The graph.
y : Tensor
Collaborator

torch.Tensor

and others.

Member Author

done


for k in range(num_classes):
    # Get the nodes that belong to class k.
    class_mask = y == k
Collaborator

nit: class_mask = (y == k)

Member Author

I initially did what you suggested, and then the lint check failed.

dgl.backend.backend_name != "pytorch", reason="Only support PyTorch for now"
)
@parametrize_idtype
def test_linkx_homophily(idtype):
Collaborator

Are there any corner cases you need to handle?
E.g., there was a max(0, xxxx).
Should we check the 0 cases?

Member

+1

Member Author

I think the current cases are sufficient.
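To make the corner case raised in this thread concrete, here is a plain-Python sketch of a LINKX-style homophily measure with the `max(0, ...)` clamp. It is a reference under stated assumptions, not the code in this PR: labels are assumed to be integers `0..C-1` with at least two classes, degrees are in-degrees, and a class whose nodes have no incoming edges is assumed to contribute 0. The name `linkx_homophily_ref` is hypothetical.

```python
def linkx_homophily_ref(src, dst, y):
    """Illustrative reference: (1 / (C - 1)) * sum_k max(0, h_k - |C_k| / n),
    where h_k is the same-class fraction of edges arriving at class-k nodes."""
    num_nodes = len(y)
    num_classes = max(y) + 1  # assumption: labels are 0..C-1, C >= 2
    in_deg = [0] * num_nodes
    same_deg = [0] * num_nodes
    for u, v in zip(src, dst):
        in_deg[v] += 1
        same_deg[v] += int(y[u] == y[v])

    total = 0.0
    for k in range(num_classes):
        nodes_k = [v for v in range(num_nodes) if y[v] == k]
        deg_k = sum(in_deg[v] for v in nodes_k)
        if deg_k == 0:
            # Assumption: a class with no incoming edges contributes 0.
            continue
        same_k = sum(same_deg[v] for v in nodes_k)
        # The clamp is the "0 case": a class that is less homophilous than
        # its share of the nodes adds nothing instead of a negative value.
        total += max(0.0, same_k / deg_k - len(nodes_k) / num_nodes)
    return total / (num_classes - 1)
```

A test for the clamp could use a two-node graph with edges in both directions and different labels: both per-class terms are negative before clamping, so the result should be exactly 0.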

def get_long_edges(graph):
    """Internal function for getting the edges of a graph as long tensors."""
    src, dst = graph.edges()
    return src.long(), dst.long()
Member

If there are only two lines, consider just inlining them.

Member Author

@mufeili mufeili Mar 3, 2023

I'm fine either way. Maybe you two can start a fight. :) @frozenbugs

graph.edata["same_class"] = (y[src] == y[dst]).float()
graph.update_all(
    fn.copy_e("same_class", "m"), fn.sum("m", "same_class_deg")
)
Member

OK, now I'm pushing this further. Would using the sparse API make the code more readable?

Member Author

How so? You convert the graph to a sparse matrix and call AX. I don't think there are significant differences.

Member

with graph.local_scope():
    # Handle the case where graph is of dtype int32.
    src, dst = get_long_edges(graph)
    # Compute y_v = y_u for all edges.
    graph.edata["same_class"] = (y[src] == y[dst]).float()
    graph.update_all(
        fn.copy_e("same_class", "m"), fn.mean("m", "same_class_deg")
    )
    return graph.ndata["same_class_deg"].mean(dim=0).item()

vs.

A = graph.adj
same_class = (y[A.row] == y[A.col]).float()
same_class_avg = dglsp.val_like(A, same_class).smean(dim=1)
return same_class_avg.mean(dim=0).item()

Member

vs. the new message-passing API style:

src, dst = get_long_edges(graph)
same_class = (y[src] == y[dst]).float()
same_class_avg = dgl.mpops.copy_e_mean(graph, same_class)
return same_class_avg.mean(dim=0).item()

Member Author

@mufeili mufeili Mar 3, 2023

Still, it's quite subtle. I'm fine either way. The question is more about when we should encourage the use of message passing APIs versus sparse APIs.

Member

My opinion is to go with the math formulation: if the model is described in terms of node-wise/edge-wise computation, then message passing is the way to go; otherwise, use sparse. In this case, the definition is in terms of nodes/edges, so message passing is more suitable. You can see that although the sparse version is shorter, it doesn't align well with the definition; e.g., the use of val_like and smean is not straightforward.
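As a small illustration of how closely code can follow an edge-wise definition, here is a plain-Python sketch of edge homophily, the other measure named in the PR title. It assumes the usual definition (the fraction of edges whose two endpoints share a label) and returns 0 for a graph with no edges. The name `edge_homophily_ref` is hypothetical and this is not the DGL implementation.

```python
def edge_homophily_ref(src, dst, y):
    """Illustrative reference: fraction of edges (u, v) with y[u] == y[v]."""
    num_edges = len(src)
    if num_edges == 0:
        # Assumption: define the measure as 0 when there are no edges.
        return 0.0
    # One comparison per edge, exactly as the definition reads.
    same = sum(1 for u, v in zip(src, dst) if y[u] == y[v])
    return same / num_edges


# Four directed edges; only the edge 4 -> 3 connects different classes.
value = edge_homophily_ref([1, 2, 0, 4], [0, 1, 2, 3], [0, 0, 0, 0, 1])
```

Here three of the four edges connect same-class nodes, so the value is 3/4.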

dgl.backend.backend_name != "pytorch", reason="Only support PyTorch for now"
)
@parametrize_idtype
def test_linkx_homophily(idtype):
Member

+1

@dgl-bot
Collaborator

dgl-bot commented Mar 3, 2023

Commit ID: 45116b9f96057f4dacfc7a612497bbe78d4969e8

Build ID: 11

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 3, 2023

Commit ID: a2ecacdea5e48576edf01bb459160ffa851869da

Build ID: 12

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

Member

@jermainewang jermainewang left a comment

LGTM. Do you want to remove the [DoNotMerge] tag?

@dgl-bot
Collaborator

dgl-bot commented Mar 3, 2023

Commit ID: 989fd86810e476dd7e01e26fa9923716972d4ac8

Build ID: 13

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

@mufeili mufeili changed the title [DoNotMerge] [Utils] Edge and LINKX homophily measure [Utils] Edge and LINKX homophily measure Mar 3, 2023
@dgl-bot
Collaborator

dgl-bot commented Mar 3, 2023

Commit ID: 31fabd94d5b10d7f07a4d1714a385ca5125b60a4

Build ID: 14

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Mar 3, 2023

Commit ID: 8714d4a

Build ID: 15

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mufeili mufeili merged commit f00cd6e into dmlc:master Mar 3, 2023
@mufeili mufeili deleted the homophily branch March 3, 2023 08:58
DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024
* Update

* lint

* lint

* r prefix

* CI

* lint

* skip TF

* Update

* edge homophily

* linkx homophily

* format

* skip TF

* fix test

* update

* lint

* lint

* review

* lint

* update

* lint

* update

* CI

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-36-188.ap-northeast-1.compute.internal>
This pull request was closed.
5 participants