[BUG] EnvMatStat fails when two descriptors have the same hash #4151

iProzd · 2024-09-20T11:18:07Z

Bug summary

When computing the data statistics, if two descriptors have the same hash (see get_hash below; e.g. repformer and repinit_tebd), the second one does not recompute the stats and instead tries to load the stats already computed for the first.

    def get_hash(self) -> str:
        """Get the hash of the environment matrix.

        Returns
        -------
        str
            The hash of the environment matrix.
        """
        dscpt_type = "se_a" if self.last_dim == 4 else "se_r"
        return get_hash(
            {
                "type": dscpt_type,
                "ntypes": self.descriptor.get_ntypes(),
                "rcut": round(self.descriptor.get_rcut(), 2),
                "rcut_smth": round(self.descriptor.rcut_smth, 2),
                "nsel": self.descriptor.get_nsel(),
                "sel": self.descriptor.get_sel(),
                "mixed_types": self.descriptor.mixed_types(),
            }
        )
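To illustrate why the collision happens, here is a minimal sketch that builds the same dictionary shape as get_hash above and hashes it. The `toy_hash` helper (SHA-1 over sorted JSON) is an assumption standing in for the library's `get_hash` utility, `nsel` is assumed to be the sum of `sel`, and the numeric settings are hypothetical; the point is only that none of the hashed fields identifies which descriptor block the stats belong to.

```python
import hashlib
import json


def toy_hash(obj) -> str:
    """Stand-in for the library's get_hash helper (assumed, not the real code)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def env_mat_key(last_dim, ntypes, rcut, rcut_smth, sel, mixed_types):
    """Build a dictionary with the same fields as EnvMatStatSe.get_hash."""
    return {
        "type": "se_a" if last_dim == 4 else "se_r",
        "ntypes": ntypes,
        "rcut": round(rcut, 2),
        "rcut_smth": round(rcut_smth, 2),
        "nsel": sum(sel),  # assumption: nsel is the total number of selected neighbors
        "sel": sel,
        "mixed_types": mixed_types,
    }


# Hypothetical settings: two different descriptor blocks that happen to share
# the same cutoffs and neighbor selection.
block_a = env_mat_key(4, 2, 4.0, 0.5, [40], True)
block_b = env_mat_key(4, 2, 4.0, 0.5, [40], True)

hash_a = toy_hash(block_a)
hash_b = toy_hash(block_b)
print(hash_a == hash_b)  # both blocks map to the same stats location
```

Because the two blocks resolve to the same key, the second block finds an existing entry and takes the load path rather than the compute path.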

However, the computed stats do not appear to be flushed to the file (even when self.root.flush() is called in DPH5Path), so the second descriptor loads an empty set of stats and raises an error:
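The linked fix is titled "fix DPH5Path.glob for new keys", which suggests the data was written but not visible when listing keys in the same process. The issue does not show that code, so the following is only a toy model of how a cached key listing can go stale; `ToyStatsStore` and all of its methods are hypothetical and are not the DPH5Path implementation.

```python
class ToyStatsStore:
    """Hypothetical in-memory stand-in for an HDF5-backed stats path."""

    def __init__(self):
        self._data = {}         # key -> value, plays the role of the file
        self._key_cache = None  # key listing, cached on first use
        self._new_keys = []     # keys written after the store was opened

    def save(self, key, value):
        self._data[key] = value
        self._new_keys.append(key)

    def _cached_keys(self):
        # The listing is computed once and then reused, so later writes
        # are not reflected in it.
        if self._key_cache is None:
            self._key_cache = sorted(self._data)
        return self._key_cache

    def glob_stale(self, prefix):
        """List keys from the cached listing only."""
        return [k for k in self._cached_keys() if k.startswith(prefix)]

    def glob_fixed(self, prefix):
        """List keys from the cached listing plus keys written since."""
        keys = set(self._cached_keys()) | set(self._new_keys)
        return sorted(k for k in keys if k.startswith(prefix))


store = ToyStatsStore()
store.glob_stale("stats/")       # builds the cache while the store is empty
store.save("stats/r_0", 1.0)     # first descriptor saves its stats
store.save("stats/a_0", 0.5)

stale_after = store.glob_stale("stats/")  # second descriptor sees nothing
fixed_after = store.glob_fixed("stats/")  # new keys are included
print(stale_after, fixed_after)
```

In this model a fresh process rebuilds the listing from the stored data, which would match the observation that a later training run loads the stats without error.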

pt/utils/env_mat_stat.py:213, in EnvMatStatSe.__call__(self)
    211 for type_i in range(self.descriptor.get_ntypes()):
    212     if self.last_dim == 4:
--> 213         davgunit = [[avgs[f"r_{type_i}"], 0, 0, 0]]
    214         dstdunit = [
    215             [
    216                 stds[f"r_{type_i}"],
   (...)
    220             ]
    221         ]
    222     elif self.last_dim == 1:

KeyError: 'r_0'

Once the stats have been computed, the next training run loads them from the HDF5 file successfully.

DeePMD-kit Version

devel

Backend and its version

PyTorch v2.1.2

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

cd examples/water/dpa2
dp --pt train input_torch_small.json

Steps to Reproduce

see above

Further Information, Files, and Links

No response


Fix #4151.

Summary by CodeRabbit

- New Features: Enhanced path filtering logic to include a broader range of keys when generating subpaths.
- Bug Fixes: Improved the accuracy of path results returned by the `glob` method.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

iProzd added the bug label Sep 20, 2024

njzjz added the reproduced label (This bug has been reproduced by developers) Sep 20, 2024

njzjz added a commit to njzjz/deepmd-kit that referenced this issue Sep 20, 2024

fix: fix DPH5Path.glob for new keys

8dd3b4c

Fix deepmodeling#4151. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz mentioned this issue Sep 20, 2024

fix: fix DPH5Path.glob for new keys #4152

Merged

njzjz linked a pull request Sep 20, 2024 that will close this issue

fix: fix DPH5Path.glob for new keys #4152

Merged

njzjz closed this as completed Sep 22, 2024
