The dgl.data.TUDataset class returns labels in {0,2} for some binary classes. Can we instead return {0, 1}? #2165

henrykenlay · 2020-09-09T11:02:21Z

🚀 Feature

The graph class labels returned by some binary TUDatasets are {0, 2} (such as dgl.data.TUDataset('BZR')), while others are {0, 1} (such as dgl.data.TUDataset('MCF-7')). It would be good if this behaviour were more consistent.

Motivation

With consistent behaviour, users wouldn't have to wrap the class in a preprocessing layer to make the labels match standard conventions (for example, a binary classifier with a logistic output activation and binary cross-entropy loss expects labels in {0, 1}).

I did notice that the notes section of the docs reads:

Graphs may have node labels, node attributes, edge labels, and edge attributes, varing from different dataset. This class does not perform additional process.

However, this isn't actually the case: the raw BZR graph labels are {-1, 1}, and they are preprocessed by subtracting the minimum label from all labels, which yields {0, 2}.
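
To illustrate why raw labels of {-1, 1} come out as {0, 2}, here is a minimal sketch of a shift that moves the smallest label to 0. The function name `shift_to_zero` is hypothetical and only stands in for whatever the loader does internally; I have not copied it from the DGL source.

```python
import numpy as np


def shift_to_zero(labels):
    """Hypothetical stand-in: shift labels so the smallest one becomes 0."""
    labels = np.asarray(labels)
    return labels - labels.min()


# Raw labels shaped like the BZR graph labels described above.
raw_bzr_like = np.array([-1, 1, 1, -1])
shifted = shift_to_zero(raw_bzr_like)

# The gap between the two classes is preserved, so the result is {0, 2}.
print(np.unique(shifted))
```

A shift like this only gives {0, ..., n-1} when the raw labels are already consecutive integers, which is why datasets with raw labels such as {0, 1} or {1, 2} look fine while {-1, 1} does not.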

Alternatives

Do not modify the graph labels at all, as per the docs.

Pitch

Preprocessing the labels so that the labels are {0, ..., n-1} where n is the number of classes would be the easiest for the user. An additional argument could be added which allows the user to access the raw labels if needed.
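
A rough sketch of what I have in mind, using `np.unique` with `return_inverse=True` to map arbitrary integer labels onto {0, ..., n-1}. The function name `relabel_consecutive` is hypothetical, and this is not the code that ended up in DGL; it only shows the idea, including how the raw labels can still be recovered.

```python
import numpy as np


def relabel_consecutive(raw_labels):
    """Map arbitrary integer labels onto 0..n-1 (hypothetical helper).

    Returns the relabelled array and the sorted raw class values, so that
    classes[relabelled] gives back the raw labels.
    """
    raw_labels = np.asarray(raw_labels)
    classes, relabelled = np.unique(raw_labels, return_inverse=True)
    # reshape keeps the input shape regardless of NumPy version
    return relabelled.reshape(raw_labels.shape), classes


raw = np.array([-1, 1, 1, -1])
labels, classes = relabel_consecutive(raw)

print(labels)           # consecutive class ids
print(classes[labels])  # raw labels recovered from the mapping
```

Keeping `classes` around would play the role of the extra argument mentioned above: users who need the original label values can look them up without the loader having to return two label tensors.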

Additional context

I can put in a pull request if this change seems reasonable.

classicsong · 2020-09-10T06:29:49Z

Hi HenryKenlay,
Could you help provide this feature? Currently, we just shift the label IDs so that they start from 0:

dgl/python/dgl/data/tu.py

Lines 320 to 330 in b10b541

    
    for filename, field_name in self.attr_dict.items():
        try:
            data = loadtxt(self._file_path(filename),
                           delimiter=',').astype(int)
            if 'label' in filename:
                data = F.tensor(self._idx_from_zero(data))
            else:
                data = F.tensor(data)
            getattr(g, field_name[0])[field_name[1]] = data
        except IOError:
            pass

Your help is really appreciated.

* [Bugfix] fix TUDataset labelling issue (#2165) * [Bugfix] fix TUDataset labelling issue (#2165) * update docstring according to discussion Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com> Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>

BarclayII · 2020-09-11T11:43:54Z

Fixed in #2173

henrykenlay added a commit to henrykenlay/dgl that referenced this issue Sep 10, 2020

[Bugfix] fix TUDataset labelling issue (dmlc#2165)

0db85f0

henrykenlay added a commit to henrykenlay/dgl that referenced this issue Sep 10, 2020

[Bugfix] fix TUDataset labelling issue (dmlc#2165)

69b9240

henrykenlay mentioned this issue Sep 10, 2020

[Bugfix] fix TUDataset labelling issue (#2165) #2173

Merged

6 tasks

BarclayII mentioned this issue Sep 11, 2020

[Patch Release] 0.5.2 #2178

Closed

BarclayII closed this as completed Sep 11, 2020
