Commit

add model metrics table when only validating the 10k most stable predictions for each model
janosh committed Jun 20, 2023
1 parent 3ac72e6 commit c4ca186
Showing 28 changed files with 715 additions and 362 deletions.
3 changes: 2 additions & 1 deletion matbench_discovery/metrics.py
@@ -101,8 +101,9 @@ def stable_metrics(
          DAF=precision / prevalence,
          Precision=precision,
          Recall=recall,
-         **dict(TPR=TPR, FPR=FPR, TNR=TNR, FNR=FNR),
          Accuracy=(n_true_pos + n_true_neg) / len(each_true),
+         **dict(TPR=TPR, FPR=FPR, TNR=TNR, FNR=FNR),
+         **dict(TP=n_true_pos, FP=n_false_pos, TN=n_true_neg, FN=n_false_neg),
          MAE=np.abs(each_true - each_pred).mean(),
          RMSE=((each_true - each_pred) ** 2).mean() ** 0.5,
          R2=r2_score(each_true, each_pred),
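For context, `stable_metrics` assembles standard classification and regression metrics from DFT vs predicted hull distances. A minimal self-contained sketch of what the function computes (simplified: the 0 eV/atom stability threshold is the benchmark's convention, and R² is computed by hand here instead of via `sklearn.metrics.r2_score` as in the real code):

```python
import numpy as np

def stable_metrics_sketch(each_true, each_pred, threshold=0.0):
    """Simplified sketch of matbench_discovery.metrics.stable_metrics.

    A material counts as stable when its energy above hull <= threshold (eV/atom).
    """
    each_true, each_pred = np.asarray(each_true), np.asarray(each_pred)
    is_stable, pred_stable = each_true <= threshold, each_pred <= threshold

    n_tp = int((is_stable & pred_stable).sum())  # stable, predicted stable
    n_fp = int((~is_stable & pred_stable).sum())  # unstable, predicted stable
    n_tn = int((~is_stable & ~pred_stable).sum())
    n_fn = int((is_stable & ~pred_stable).sum())

    precision = n_tp / (n_tp + n_fp)
    prevalence = is_stable.mean()  # fraction of truly stable materials

    ss_res = ((each_true - each_pred) ** 2).sum()
    ss_tot = ((each_true - each_true.mean()) ** 2).sum()

    return dict(
        DAF=precision / prevalence,  # discovery acceleration factor
        Precision=precision,
        Recall=n_tp / (n_tp + n_fn),
        Accuracy=(n_tp + n_tn) / len(each_true),
        TP=n_tp, FP=n_fp, TN=n_tn, FN=n_fn,
        MAE=np.abs(each_true - each_pred).mean(),
        RMSE=(((each_true - each_pred) ** 2).mean()) ** 0.5,
        R2=1 - ss_res / ss_tot,  # real code uses sklearn.metrics.r2_score
    )
```

Note how DAF falls out of the confusion matrix: it measures how much the model's hit rate for stable materials exceeds that of random selection.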
2 changes: 1 addition & 1 deletion matbench_discovery/plots.py
@@ -703,7 +703,7 @@ def cumulative_precision_recall(
      df = dfs[metric]
      ax.set(ylim=(0, 1), xlim=(0, None), ylabel=metric)
      for model in df_preds:
-         # TODO is this if really necessary?
+         # TODO is this really necessary?
          if len(df[model].dropna()) == 0:
              continue
          x_end = df[model].dropna().index[-1]
13 changes: 12 additions & 1 deletion matbench_discovery/preds.py
@@ -7,7 +7,7 @@
  from tqdm import tqdm

  from matbench_discovery import ROOT
- from matbench_discovery.data import Files, glob_to_df
+ from matbench_discovery.data import Files, df_wbm, glob_to_df
  from matbench_discovery.metrics import stable_metrics
  from matbench_discovery.plots import eVpa, model_labels, quantity_labels

@@ -131,13 +131,24 @@ def load_df_wbm_with_preds(


  df_metrics = pd.DataFrame()
+ df_metrics_10k = pd.DataFrame()  # look only at each model's 10k most stable predictions
+ prevalence = (df_wbm[each_true_col] <= 0).mean()

  df_metrics.index.name = "model"
  for model in PRED_FILES:
      each_pred = df_preds[each_true_col] + df_preds[model] - df_preds[e_form_col]
      df_metrics[model] = stable_metrics(df_preds[each_true_col], each_pred)
+     most_stable_10k = each_pred.nsmallest(10_000)
+     df_metrics_10k[model] = stable_metrics(
+         df_preds[each_true_col].loc[most_stable_10k.index], most_stable_10k
+     )
+     df_metrics_10k[model]["DAF"] = df_metrics_10k[model]["Precision"] / prevalence


  # pick F1 as primary metric to sort by
  df_metrics = df_metrics.round(3).sort_values("F1", axis=1, ascending=False)
+ df_metrics_10k = df_metrics_10k.round(3).sort_values("F1", axis=1, ascending=False)


  # dataframe of all models' energy above convex hull (EACH) predictions (eV/atom)
  df_each_pred = pd.DataFrame()
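The 10k-most-stable table in this hunk follows a simple recipe: rank each model's predicted hull distances, keep the 10,000 lowest, score only those rows, but keep DAF anchored to the prevalence of the full test set (hence the `df_wbm` import). A runnable sketch on synthetic data (the column names and toy distributions are illustrative stand-ins, not the real WBM columns):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_samples, top_k = 50_000, 10_000

# toy stand-in for df_preds: DFT hull distance plus a correlated, noisy model prediction
each_true = rng.normal(loc=0.1, scale=0.3, size=n_samples)
each_pred = each_true + rng.normal(scale=0.1, size=n_samples)
df = pd.DataFrame({"each_true": each_true, "each_pred": each_pred})

# prevalence is computed on the FULL set, not on the 10k subset
prevalence = (df["each_true"] <= 0).mean()

# a model's 10k most stable predictions = its smallest predicted hull distances
most_stable_10k = df["each_pred"].nsmallest(top_k)
subset_true = df["each_true"].loc[most_stable_10k.index]

# within the subset every material is predicted stable, so precision is simply
# the fraction that is truly stable; DAF compares that to dummy selection
precision = (subset_true <= 0).mean()
daf = precision / prevalence
print(f"{prevalence=:.3f} {precision=:.3f} {daf=:.2f}")
```

This also motivates the `DAF` overwrite in the diff: a DAF computed from the subset's own prevalence would be uninformative, so the module recomputes it against the full-set prevalence after calling `stable_metrics` on the subset.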
2 changes: 1 addition & 1 deletion models/bowsr/metadata.yml
@@ -25,7 +25,7 @@ requirements:
    megnet: 1.3.2
    numpy: 1.24.0
    pandas: 1.5.1
- trained_on_benchmark: false
+ trained_for_benchmark: false

hyperparams:
Optimizer Params:
4 changes: 2 additions & 2 deletions models/cgcnn/metadata.yml
@@ -20,7 +20,7 @@
    torch-scatter: 2.0.9
    numpy: 1.24.0
    pandas: 1.5.1
- trained_on_benchmark: true
+ trained_for_benchmark: true

hyperparams:
Ensemble Size: 10
@@ -57,7 +57,7 @@
    torch-scatter: 2.0.9
    numpy: 1.24.0
    pandas: 1.5.1
- trained_on_benchmark: true
+ trained_for_benchmark: true

hyperparams:
Ensemble Size: 10
3 changes: 2 additions & 1 deletion models/chgnet/metadata.yml
@@ -31,7 +31,8 @@ requirements:
    ase: 3.22.0
    pymatgen: 2022.10.22
    numpy: 1.24.0
- trained_on_benchmark: false
+ trained_for_benchmark: false
+ # training_set: MPTraj

hyperparams:
max_steps: 2000
95 changes: 29 additions & 66 deletions models/m3gnet/metadata.yml
@@ -1,66 +1,29 @@
- - model_name: M3GNet
-   model_version: 2022.9.20
-   matbench_discovery_version: 1.0
-   date_added: "2022-09-20"
-   date_published: "2022-02-05"
-   authors:
-     - name: Chi Chen
-       affiliation: UC San Diego
-       role: Model
-       orcid: https://orcid.org/0000-0001-8008-7043
-     - name: Shyue Ping Ong
-       affiliation: UC San Diego
-       orcid: https://orcid.org/0000-0001-5726-2587
-       email: ongsp@ucsd.edu
-   repo: https://github.com/materialsvirtuallab/m3gnet
-   url: https://materialsvirtuallab.github.io/m3gnet
-   doi: https://doi.org/10.1038/s43588-022-00349-3
-   preprint: https://arxiv.org/abs/2202.02450
-   requirements:
-     m3gnet: 0.1.0
-     pymatgen: 2022.10.22
-     numpy: 1.24.0
-     pandas: 1.5.1
-   trained_on_benchmark: false
-   notes:
-     description: M3GNet is a GNN-based universal (as in full periodic table) interatomic potential for materials trained on up to 3-body interactions in the initial, middle and final frame of MP DFT relaxations.
-     long: It thereby learns to emulate structure relaxation, MD simulations and property prediction of materials across diverse chemical spaces.
-     training: Using pre-trained model released with paper. Was only trained on a subset of 62,783 MP relaxation trajectories in the 2018 database release (see [related issue](https://github.com/materialsvirtuallab/m3gnet/issues/20#issuecomment-1207087219)).
-
- - model_name: M3GNet + MEGNet
-   model_version: 2022.9.20
-   matbench_discovery_version: 1.0
-   date_added: "2023-02-03"
-   date_published: "2022-02-05"
-   authors:
-     - name: Chi Chen
-       affiliation: UC San Diego
-       role: Model
-       orcid: https://orcid.org/0000-0001-8008-7043
-     - name: Weike Ye
-       affiliation: UC San Diego
-       orcid: https://orcid.org/0000-0002-9541-7006
-     - name: Yunxing Zuo
-       affiliation: UC San Diego
-       orcid: https://orcid.org/0000-0002-2734-7720
-     - name: Chen Zheng
-       affiliation: UC San Diego
-       orcid: https://orcid.org/0000-0002-2344-5892
-     - name: Shyue Ping Ong
-       affiliation: UC San Diego
-       orcid: https://orcid.org/0000-0001-5726-2587
-       email: ongsp@ucsd.edu
-   repo: https://github.com/materialsvirtuallab/m3gnet
-   url: https://materialsvirtuallab.github.io/m3gnet
-   doi: https://doi.org/10.1038/s43588-022-00349-3
-   preprint: https://arxiv.org/abs/2202.02450
-   requirements:
-     m3gnet: 0.1.0
-     megnet: 1.3.2
-     pymatgen: 2022.10.22
-     numpy: 1.24.0
-     pandas: 1.5.1
-   trained_on_benchmark: false
-   notes:
-     description: This combination of models uses M3GNet to relax initial structures and then passes it to MEGNet to predict the formation energy.
-     training: Using pre-trained model released with paper. Was only trained on a subset of 62,783 MP relaxation trajectories in the 2018 database release (see [related issue](https://github.com/materialsvirtuallab/m3gnet/issues/20#issuecomment-1207087219)).
+ model_name: M3GNet
+ model_version: 2022.9.20
+ matbench_discovery_version: 1.0
+ date_added: "2022-09-20"
+ date_published: "2022-02-05"
+ authors:
+   - name: Chi Chen
+     affiliation: UC San Diego
+     role: Model
+     orcid: https://orcid.org/0000-0001-8008-7043
+   - name: Shyue Ping Ong
+     affiliation: UC San Diego
+     orcid: https://orcid.org/0000-0001-5726-2587
+     email: ongsp@ucsd.edu
+ repo: https://github.com/materialsvirtuallab/m3gnet
+ url: https://materialsvirtuallab.github.io/m3gnet
+ doi: https://doi.org/10.1038/s43588-022-00349-3
+ preprint: https://arxiv.org/abs/2202.02450
+ requirements:
+   m3gnet: 0.1.0
+   pymatgen: 2022.10.22
+   numpy: 1.24.0
+   pandas: 1.5.1
+ trained_for_benchmark: false
+ notes:
+   description: M3GNet is a GNN-based universal (as in full periodic table) interatomic potential for materials trained on up to 3-body interactions in the initial, middle and final frame of MP DFT relaxations.
+   long: It thereby learns to emulate structure relaxation, MD simulations and property prediction of materials across diverse chemical spaces.
+   training: Using pre-trained model released with paper. Was only trained on a subset of 62,783 MP relaxation trajectories in the 2018 database release (see [related issue](https://github.com/materialsvirtuallab/m3gnet/issues/20#issuecomment-1207087219)).
+   testing: We also tried combining M3GNet with MEGNet where M3GNet is used to relax initial structures which are then passed to MEGNet to predict the formation energy.
2 changes: 1 addition & 1 deletion models/megnet/metadata.yml
@@ -29,7 +29,7 @@ requirements:
    pymatgen: 2022.10.22
    numpy: 1.24.0
    pandas: 1.5.1
- trained_on_benchmark: false
+ trained_for_benchmark: false

notes:
description: MatErials Graph Network is another GNN for material properties of relaxed structure which showed that learned element embeddings encode periodic chemical trends and can be transfer-learned from large data sets (formation energies) to predictions on small data properties (band gaps, elastic moduli).
2 changes: 1 addition & 1 deletion models/voronoi/metadata.yml
@@ -21,7 +21,7 @@ requirements:
    pymatgen: 2022.10.22
    numpy: 1.24.0
    pandas: 1.5.1
- trained_on_benchmark: true
+ trained_for_benchmark: true

notes:
description: A random forest trained to map the combo of composition-based Magpie features and structure-based relaxation-invariant Voronoi tessellation features (bond angles, coordination numbers, ...) to DFT formation energies.
2 changes: 1 addition & 1 deletion models/wrenformer/metadata.yml
@@ -25,7 +25,7 @@ requirements:
    pymatgen: 2022.10.22
    numpy: 1.24.0
    pandas: 1.5.1
- trained_on_benchmark: true
+ trained_for_benchmark: true

hyperparams:
Ensemble Size: 10
2 changes: 1 addition & 1 deletion paper
Submodule paper updated from 3ea614 to d7c7bf
10 changes: 5 additions & 5 deletions readme.md
@@ -1,6 +1,6 @@
- <h1 align="center" style="display: grid;">
-   <img src="https://raw.githubusercontent.com/janosh/matbench-discovery/main/site/static/favicon.svg" alt="Logo" width="80px">
-   Matbench Discovery
+ <h1 align="center">
+   <img src="https://github.com/janosh/matbench-discovery/raw/main/site/static/favicon.svg" alt="Logo" width="60px"><br>
+   Matbench Discovery
  </h1>

<h4 align="center" class="toc-exclude">
@@ -17,12 +17,12 @@ Matbench Discovery
  Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to benchmark ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals.

- In version 1 of this benchmark, we explore 8 models covering multiple methodologies ranging from random forests to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential-based relaxers. We find [CHGNet](https://github.com/CederGroupHub/chgnet) ([paper](https://doi.org/10.48550/arXiv.2302.14231)) to achieve the highest F1 score of 0.59, $R^2$ of 0.61 and a discovery acceleration factor (DAF) of 3.06 (meaning a 3x higher rate of stable structures compared to dummy selection in our already enriched search space). See the [**full results**](https://janosh.github.io/matbench-discovery/preprint#results) in our interactive dashboard which provides valuable insights for maintainers of large-scale materials databases. We show these models have become powerful enough to warrant deploying them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations.
+ So far, we've tested 8 models covering multiple methodologies ranging from random forests with structure fingerprints to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential-based relaxers. We find [CHGNet](https://github.com/CederGroupHub/chgnet) ([paper](https://doi.org/10.48550/arXiv.2302.14231)) to achieve the highest F1 score of 0.59, $R^2$ of 0.61 and a discovery acceleration factor (DAF) of 3.06 (meaning a 3x higher rate of stable structures compared to dummy selection in our already enriched search space). We believe our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.

  <slot name="metrics-table" />

  We welcome contributions that add new models to the leaderboard through [GitHub PRs](https://github.com/janosh/matbench-discovery/pulls). See the [usage and contributing guide](https://janosh.github.io/matbench-discovery/contribute) for details.

- For a version 2 release of this benchmark, we plan to merge the current training and test sets into the new training set and acquire a much larger test set (potentially at meta-GGA level of theory) compared to the v1 test set of 257k structures. Anyone interested in joining this effort please [open a GitHub discussion](https://github.com/janosh/matbench-discovery/discussions) or [reach out privately](mailto:janosh@lbl.gov?subject=Matbench%20Discovery).
+ Anyone interested in joining this effort please [open a GitHub discussion](https://github.com/janosh/matbench-discovery/discussions) or [reach out privately](mailto:janosh@lbl.gov?subject=Matbench%20Discovery).

  For detailed results and analysis, check out the [preprint](https://janosh.github.io/matbench-discovery/preprint) and [supplementary material](https://janosh.github.io/matbench-discovery/si).
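The DAF quoted in the README is simply precision divided by prevalence, i.e. how much a model enriches the hit rate over random selection. A quick worked check (the precision and prevalence values below are illustrative assumptions chosen to reproduce the quoted DAF, not figures from the paper):

```python
# DAF = precision / prevalence
prevalence = 0.167  # assumed: fraction of truly stable materials in the test set
precision = 0.511  # assumed: fraction of model-flagged materials that are stable
daf = precision / prevalence
print(f"DAF = {daf:.2f}")
```

So a DAF of ~3 means roughly 3 of every 10 model-selected candidates are stable, versus fewer than 2 of every 10 picked at random.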