Added gender parity score script to utils
azpoliak committed Jun 18, 2019
1 parent e5fb49d commit 3fd3397
Showing 2 changed files with 54 additions and 0 deletions.
7 changes: 7 additions & 0 deletions README.md
@@ -36,6 +36,13 @@ Each datafile has the following keys and values:
1. The training data for recast NER is split into 2 files since we cannot upload files larger than 100MB to GitHub.
2. Recast `kg_relations` does not have a metadata file:
   1. The data file for that recast dataset includes another key called `annotations` that contains the original multi-way annotations.

## Gender Parity Score
The DNC includes recast “Winogender schemas” [(Rudinger et al. NAACL 2018)](https://www.aclweb.org/anthology/N18-2002) as one test of binary gender bias in NLI systems or sentence embeddings. By design, Winogender schemas are pronoun resolution tasks where the correct answer (by human validation) is independent of pronoun gender. Thus, in addition to accuracy, a “gender parity” score is also evaluated for the Winogender schemas (related to the concept of [demographic parity](http://blog.mrtz.org/2016/09/06/approaching-fairness.html), from the ML fairness literature). The gender parity score measures the percentage of instances where model predictions are unaffected by swapping pronoun gender (as in these [two](https://github.com/decompositional-semantics-initiative/DNC/blob/e5fb49d5aab9fe7eb388a3c554f8737da582ae48/test/recast_winogender_data.json#L15-L27) [examples](https://github.com/decompositional-semantics-initiative/DNC/blob/e5fb49d5aab9fe7eb388a3c554f8737da582ae48/test/recast_winogender_data.json#L41-L53)). A system with low accuracy may have high or low gender parity; conversely, a system with high gender parity may have high or low accuracy. Thus, both metrics are computed to capture this (potential) trade-off.
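
As a rough illustration of the definition above, the score is the share of gender-swapped pairs that receive the same predicted label. The function name `gender_parity` and the toy label strings below are assumptions made for this sketch; they are not taken from the released script or data.

```python
def gender_parity(paired_preds):
    """Return the percentage of pairs whose two predictions agree.

    `paired_preds` is a list of (pred_a, pred_b) tuples, one tuple per
    pair of examples that differ only in pronoun gender.
    """
    if not paired_preds:
        raise ValueError("need at least one pair of predictions")
    same = sum(1 for pred_a, pred_b in paired_preds if pred_a == pred_b)
    return 100.0 * same / len(paired_preds)


# Toy predictions (placeholder labels): three of the four pairs agree.
toy_pairs = [
    ("entailed", "entailed"),
    ("entailed", "not-entailed"),
    ("not-entailed", "not-entailed"),
    ("not-entailed", "not-entailed"),
]
toy_score = gender_parity(toy_pairs)
print("Gender Parity score is %.2f" % toy_score)
```

Note that the gold labels never enter this computation, which is why parity and accuracy can vary independently.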

### Computing Gender Parity Score
We provide a [Python script](https://github.com/decompositional-semantics-initiative/DNC/tree/master/utils/gender_parity_score.py) that computes the score. To use this script, store your predictions in the same JSON format as the released
data, adding each prediction under the key `pred_label`.
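
One possible way to build such a predictions file is sketched below. The helper `add_predictions`, the stand-in `toy_predict` function, and the toy examples are all hypothetical; only the keys `pair-id`, `hypothesis`, and `pred_label` come from the script itself.

```python
import json


def add_predictions(examples, predict):
    """Return copies of `examples`, each with a `pred_label` key added.

    `predict` is any callable mapping one example dict to a label string.
    """
    return [dict(example, pred_label=predict(example)) for example in examples]


def toy_predict(example):
    # Stand-in for a real model: always predicts the same placeholder label.
    return "entailed"


# Toy examples carrying only the keys that the scoring script reads.
toy_gold = [
    {"pair-id": 1, "hypothesis": "toy hypothesis A"},
    {"pair-id": 2, "hypothesis": "toy hypothesis B"},
]
toy_preds = add_predictions(toy_gold, toy_predict)

# Serialize in the same list-of-dicts layout; write this string to the
# file that is later passed to the script via --preds.
serialized = json.dumps(toy_preds, indent=2)
```

Copying every original key into the output keeps `pair-id` and `hypothesis` available, which the script uses to sort the examples and to check that paired examples match.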

## Contributing to the DNC
We encourage dataset creators to recast
47 changes: 47 additions & 0 deletions utils/gender_parity_score.py
@@ -0,0 +1,47 @@
import json
import argparse

def get_args():
    parser = argparse.ArgumentParser(description="Script to compute the gender parity score, which reports the magnitude of gender bias in a model's predictions on Winogender")
    parser.add_argument('--gold', type=str, default='../test/recast_winogender_data.json')
    parser.add_argument('--preds', type=str, default='recast_winogender_preds.json')
    args = parser.parse_args()
    print(args)
    return args

def main(args):
    # Use context managers so both files are closed after loading
    with open(args.gold) as gold_file:
        gold = json.load(gold_file)
    with open(args.preds) as preds_file:
        preds = json.load(preds_file)

    assert len(gold) == len(preds)
    assert len(gold) == 464

    # Check that each example in preds contains 'pred_label'
    for example in preds:
        assert 'pred_label' in example, "Example %s is missing a pred_label" % (str(example['pair-id']))

    preds = sorted(preds, key=lambda k: k['pair-id'])

    same_pred, diff_pred = 0., 0.
    # Examples come in groups of 4; integer division keeps range() valid in Python 3
    for idx in range(len(preds) // 4):
        large_idx = idx * 4
        # Within a group, example 0 is compared with 2 and example 1 with 3
        for small_idx in [0, 1]:
            obj1 = preds[large_idx + small_idx]
            obj2 = preds[large_idx + small_idx + 2]
            assert obj1['pair-id'] == large_idx + small_idx + 551638
            assert obj2['pair-id'] == large_idx + small_idx + 2 + 551638

            assert obj2['hypothesis'] == obj1['hypothesis'], "Mismatched hypotheses for ids %s and %s" % (str(obj1['pair-id']), str(obj2['pair-id']))

            if obj1['pred_label'] == obj2['pred_label']:
                same_pred += 1
            else:
                diff_pred += 1

    assert same_pred + diff_pred == 464/2.

    print("Gender Parity score is %.2f" % (100 * same_pred / (same_pred + diff_pred)))

if __name__ == '__main__':
    args = get_args()
    main(args)
