diff --git a/README.md b/README.md index 8462ccaf2..a7aa26bac 100644 --- a/README.md +++ b/README.md @@ -25,7 +25,7 @@ [![conda install](https://img.shields.io/conda/dn/conda-forge/dpgen?label=conda%20install)](https://anaconda.org/conda-forge/dpgen) [![pip install](https://img.shields.io/pypi/dm/dpgen?label=pip%20install)](https://pypi.org/project/dpgen) -DP-GEN (Deep Generator) is a software written in Python, delicately designed to generate a deep learning based model of interatomic potential energy and force field. DP-GEN is depedent on DeepMD-kit (https://github.com/deepmodeling/deepmd-kit/blob/master/README.md). With highly scalable interface with common softwares for molecular simulation, DP-GEN is capable to automatically prepare scripts and maintain job queues on HPC machines (High Performance Cluster) and analyze results. +DP-GEN (Deep Generator) is software written in Python, designed to generate a deep learning based model of the interatomic potential energy and force field. DP-GEN depends on [DeepMD-kit](https://github.com/deepmodeling/deepmd-kit/blob/master/README.md). With a highly scalable interface to common molecular simulation software, DP-GEN can automatically prepare scripts, maintain job queues on HPC (High Performance Computing) machines and analyze the results. If you use this software in any publication, please cite: @@ -34,7 +34,7 @@ Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and ### Highlighted features + **Accurate and efficient**: DP-GEN is capable to sample more than tens of million structures and select only a few for first principles calculation. DP-GEN will finally obtain a uniformly accurate model. + **User-friendly and automatic**: Users may install and run DP-GEN easily. Once succusefully running, DP-GEN can dispatch and handle all jobs on HPCs, and thus there's no need for any personal effort. -+ **Highly scalable**: With modularized code structures, users and developers can easily extend DP-GEN for their most relevant needs. DP-GEN currently supports for HPC systems (Slurm, PBS, LSF and cloud machines ), Deep Potential interface with DeePMD-kit, MD interface with LAMMPS and *ab-initio* calculation interface with VASP, PWSCF,SIESTA and Gaussian. We're sincerely welcome and embraced to users' contributions, with more possibilities and cases to use DP-GEN. ++ **Highly scalable**: With modularized code structures, users and developers can easily extend DP-GEN for their most relevant needs. DP-GEN currently supports HPC systems (Slurm, PBS, LSF and cloud machines), the Deep Potential interface with DeePMD-kit, MD interfaces with [LAMMPS](https://www.lammps.org/) and [Gromacs](http://www.gromacs.org/), and *ab-initio* calculation interfaces with VASP, PWSCF, CP2K, SIESTA, Gaussian, ABACUS, PWMAT, etc. We sincerely welcome users' contributions, which bring more possibilities and use cases to DP-GEN. ### Code structure and interface + dpgen: * generator: source codes for main process of deep generator. * auto_test : source code for undertaking materials property analysis. - * remote : source code for automatically submiting scripts,maintaining job queues and collecting results. + * remote and dispatcher : source code for automatically submitting scripts, maintaining job queues and collecting results.
+ Notice that this part has been integrated into [dpdispatcher](https://github.com/deepmodeling/dpdispatcher) * database : source code for collecting data generated by DP-GEN and interface with database. + examples : providing example JSON files. @@ -63,6 +64,15 @@ Options for TASK: * `test`: Auto-test for Deep Potential. * `db`: Collecting data from DP-GEN. + +[Here](examples) are examples you can refer to. You should make sure that you provide a correct [JSON](https://docs.python.org/3/library/json.html) file. +You can use the following command to check your JSON file. +```python +import json +#Specify machine parameters in machine.json +json.load(open("machine.json")) +``` + ## Download and Install One can download the source code of dpgen by ```bash @@ -1322,7 +1332,9 @@ mem_limit | Interger | 16 | Maximal memory permitted to apply for the job. | # End of resources | command | String | "lmp_serial" | Executable path of software, such as `lmp_serial`, `lmp_mpi` and `vasp_gpu`, `vasp_std`, etc. | group_size | Integer | 5 | DP-GEN will put these jobs together in one submitting script. - +| user_forward_files | List of str | ["/path_to/vdw_kernel.bindat"] | These files will be uploaded to each calculation task. You should make sure the provided paths exist. +| user_backward_files | List of str | ["HILLS"] | Besides DP-GEN's normal output, these files will be downloaded after each calculation. You should make sure these files can be generated. + ## Troubleshooting 1. The most common problem is whether two settings correspond with each other, including: - The order of elements in `type_map` and `mass_map` and **`fp_pp_files`**. diff --git a/doc/CONTRIBUTING.md b/doc/CONTRIBUTING.md new file mode 100644 index 000000000..31a8996a1 --- /dev/null +++ b/doc/CONTRIBUTING.md @@ -0,0 +1,10 @@ +# DP-GEN Contributing Guide +Welcome to [DP-GEN](https://github.com/deepmodeling/dpgen/tree/master/dpgen)! + + +## How to contribute +DP-GEN adopts the same conventions as other software in the DeepModeling community. +You can first refer to DeePMD-kit's +[Contributing guide](https://github.com/deepmodeling/deepmd-kit/edit/devel/CONTRIBUTING.md) +and [Developer guide](https://github.com/deepmodeling/deepmd-kit/edit/devel/doc/development/index.md).
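A slightly more verbose variant of the README's one-line JSON check above may help locate mistakes: it reports where a file is malformed instead of only raising. This is a minimal sketch; the `machine.json` / `param.json` names are the conventional DP-GEN inputs and should be adjusted to your own files.

```python
import json

# Check DP-GEN input files before submitting; the file names below are
# the conventional machine.json / param.json and are only an example.
for fname in ("machine.json", "param.json"):
    try:
        with open(fname) as fp:
            json.load(fp)
        print(f"{fname}: valid JSON")
    except FileNotFoundError:
        print(f"{fname}: file not found")
    except json.JSONDecodeError as err:
        # err.lineno and err.colno point at the offending character
        print(f"{fname}: invalid JSON at line {err.lineno}, column {err.colno} ({err.msg})")
```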
+ diff --git a/dpgen/auto_test/common_equi.py b/dpgen/auto_test/common_equi.py index 9dcb83e03..103e16dcc 100644 --- a/dpgen/auto_test/common_equi.py +++ b/dpgen/auto_test/common_equi.py @@ -9,10 +9,9 @@ from dpgen.auto_test.calculator import make_calculator from dpgen.auto_test.mpdb import get_structure from dpgen.dispatcher.Dispatcher import make_dispatcher -from dpgen.remote.decide_machine import decide_fp_machine, decide_model_devi_machine from distutils.version import LooseVersion from dpgen.dispatcher.Dispatcher import make_submission - +from dpgen.remote.decide_machine import convert_mdata lammps_task_type = ['deepmd', 'meam', 'eam_fs', 'eam_alloy'] @@ -133,9 +132,9 @@ def run_equi(confs, inter_type = inter_param['type'] # vasp if inter_type == "vasp": - mdata = decide_fp_machine(mdata) + mdata = convert_mdata(mdata, ["fp"]) elif inter_type in lammps_task_type: - mdata = decide_model_devi_machine(mdata) + mdata = convert_mdata(mdata, ["model_devi"]) else: raise RuntimeError("unknown task %s, something wrong" % inter_type) diff --git a/dpgen/auto_test/common_prop.py b/dpgen/auto_test/common_prop.py index bbd7203e2..00f439d37 100644 --- a/dpgen/auto_test/common_prop.py +++ b/dpgen/auto_test/common_prop.py @@ -13,9 +13,8 @@ from dpgen.auto_test.Vacancy import Vacancy from dpgen.auto_test.calculator import make_calculator from dpgen.dispatcher.Dispatcher import make_dispatcher -from dpgen.remote.decide_machine import decide_fp_machine, decide_model_devi_machine from dpgen.dispatcher.Dispatcher import make_submission - +from dpgen.remote.decide_machine import convert_mdata lammps_task_type = ['deepmd', 'meam', 'eam_fs', 'eam_alloy'] @@ -150,9 +149,9 @@ def run_property(confs, inter_type = inter_param_prop['type'] # vasp if inter_type == "vasp": - mdata = decide_fp_machine(mdata) + mdata = convert_mdata(mdata, ["fp"]) elif inter_type in lammps_task_type: - mdata = decide_model_devi_machine(mdata) + mdata = convert_mdata(mdata, ["model_devi"]) else: raise RuntimeError("unknown task %s, something wrong" % inter_type) diff --git a/dpgen/auto_test/lib/util.py b/dpgen/auto_test/lib/util.py index 0a86287fd..32709da28 100644 --- a/dpgen/auto_test/lib/util.py +++ b/dpgen/auto_test/lib/util.py @@ -77,11 +77,11 @@ def get_machine_info(mdata,task_type): command = vasp_exec command = cmd_append_log(command, "log") elif task_type in lammps_task_type: - lmp_exec = mdata['lmp_command'] + model_devi_exec = mdata['model_devi_command'] group_size = mdata['model_devi_group_size'] resources = mdata['model_devi_resources'] machine=mdata['model_devi_machine'] - command = lmp_exec + " -i in.lammps" + command = model_devi_exec + " -i in.lammps" command = cmd_append_log(command, "model_devi.log") return machine, resources, command, group_size diff --git a/dpgen/data/gen.py b/dpgen/data/gen.py index 10d220d61..25c610c61 100644 --- a/dpgen/data/gen.py +++ b/dpgen/data/gen.py @@ -22,14 +22,16 @@ import dpgen.data.tools.sc as sc from distutils.version import LooseVersion from dpgen.generator.lib.vasp import incar_upper +from dpgen.generator.lib.utils import symlink_user_forward_files from pymatgen.core import Structure from pymatgen.io.vasp import Incar -from dpgen.remote.decide_machine import decide_fp_machine +from dpgen.remote.decide_machine import convert_mdata from dpgen import ROOT_PATH from dpgen.dispatcher.Dispatcher import Dispatcher, make_dispatcher, make_submission + def create_path (path,back=False) : if path[-1] != "/": path += '/' @@ -311,12 +313,7 @@ def make_vasp_relax (jdata, mdata) : 
os.remove(os.path.join(work_dir, 'POTCAR')) shutil.copy2( jdata['relax_incar'], os.path.join(work_dir, 'INCAR')) - is_cvasp = False - if 'cvasp' in mdata['fp_resources'].keys(): - is_cvasp = mdata['fp_resources']['cvasp'] - if is_cvasp: - cvasp_file=os.path.join(ROOT_PATH,'generator/lib/cvasp.py') - shutil.copyfile(cvasp_file, os.path.join(work_dir, 'cvasp.py')) + out_potcar = os.path.join(work_dir, 'POTCAR') with open(out_potcar, 'w') as outfile: for fname in potcars: @@ -338,8 +335,17 @@ def make_vasp_relax (jdata, mdata) : os.symlink(ln_src, 'POTCAR') except FileExistsError: pass + is_cvasp = False + if 'cvasp' in mdata['fp_resources'].keys(): + is_cvasp = mdata['fp_resources']['cvasp'] + if is_cvasp: + cvasp_file = os.path.join(ROOT_PATH, 'generator/lib/cvasp.py') + shutil.copyfile(cvasp_file, 'cvasp.py') os.chdir(work_dir) os.chdir(cwd) + symlink_user_forward_files(mdata=mdata, task_type="fp", + work_path=os.path.join(os.path.basename(out_dir),global_dirname_02), + task_format= {"fp" : "sys-*"}) def make_scale(jdata): out_dir = jdata['out_dir'] @@ -373,6 +379,7 @@ def make_scale(jdata): os.chdir(scale_path) poscar_scale(pos_src, 'POSCAR', jj) os.chdir(cwd) + def pert_scaled(jdata) : out_dir = jdata['out_dir'] @@ -425,7 +432,7 @@ def pert_scaled(jdata) : shutil.copy2(pos_in, pos_out) os.chdir(cwd) -def make_vasp_md(jdata) : +def make_vasp_md(jdata, mdata) : out_dir = jdata['out_dir'] potcars = jdata['potcars'] scale = jdata['scale'] @@ -451,7 +458,9 @@ def make_vasp_md(jdata) : with open(fname) as infile: outfile.write(infile.read()) os.chdir(path_md) - os.chdir(cwd) + os.chdir(cwd) + + for ii in sys_ps : for jj in scale : @@ -478,8 +487,20 @@ def make_vasp_md(jdata) : os.symlink(os.path.relpath(file_potcar), 'POTCAR') except FileExistsError: pass + + is_cvasp = False + if 'cvasp' in mdata['fp_resources'].keys(): + is_cvasp = mdata['fp_resources']['cvasp'] + if is_cvasp: + cvasp_file = os.path.join(ROOT_PATH, 'generator/lib/cvasp.py') + shutil.copyfile(cvasp_file, 'cvasp.py') - os.chdir(cwd) + os.chdir(cwd) + + symlink_user_forward_files(mdata=mdata, task_type="fp", + work_path=os.path.join(os.path.basename(out_dir),global_dirname_04), + task_format= {"fp" :"sys-*/scale*/00*"}) + def coll_vasp_md(jdata) : out_dir = jdata['out_dir'] @@ -565,11 +586,14 @@ def run_vasp_relax(jdata, mdata): work_dir = os.path.join(jdata['out_dir'], global_dirname_02) forward_files = ["POSCAR", "INCAR", "POTCAR"] + user_forward_files = mdata.get("fp" + "_user_forward_files", []) + forward_files += [os.path.basename(file) for file in user_forward_files] backward_files = ["OUTCAR","CONTCAR"] + backward_files += mdata.get("fp" + "_user_backward_files", []) forward_common_files = [] if 'cvasp' in mdata['fp_resources']: if mdata['fp_resources']['cvasp']: - forward_common_files=['cvasp.py'] + forward_files +=['cvasp.py'] relax_tasks = glob.glob(os.path.join(work_dir, "sys-*")) relax_tasks.sort() #dlog.info("work_dir",work_dir) @@ -624,11 +648,14 @@ def run_vasp_md(jdata, mdata): md_nstep = jdata['md_nstep'] forward_files = ["POSCAR", "INCAR", "POTCAR"] + user_forward_files = mdata.get("fp" + "_user_forward_files", []) + forward_files += [os.path.basename(file) for file in user_forward_files] backward_files = ["OUTCAR"] + backward_files += mdata.get("fp" + "_user_backward_files", []) forward_common_files = [] if 'cvasp' in mdata['fp_resources']: if mdata['fp_resources']['cvasp']: - forward_common_files=['cvasp.py'] + forward_files +=['cvasp.py'] path_md = work_dir path_md = os.path.abspath(path_md) @@ -694,7 
+721,7 @@ def gen_init_bulk(args) : if args.MACHINE is not None: # Selecting a proper machine - mdata = decide_fp_machine(mdata) + mdata = convert_mdata(mdata, ["fp"]) #disp = make_dispatcher(mdata["fp_machine"]) # Decide work path @@ -757,9 +784,12 @@ def gen_init_bulk(args) : pert_scaled(jdata) elif stage == 3 : dlog.info("Current stage is 3, run a short md") - make_vasp_md(jdata) if args.MACHINE is not None: + make_vasp_md(jdata, mdata) run_vasp_md(jdata, mdata) + else: + make_vasp_md(jdata, {"fp_resources":{}}) + elif stage == 4 : dlog.info("Current stage is 4, collect data") coll_vasp_md(jdata) diff --git a/dpgen/data/surf.py b/dpgen/data/surf.py index 322d26ad4..13420e118 100644 --- a/dpgen/data/surf.py +++ b/dpgen/data/surf.py @@ -11,7 +11,7 @@ import dpgen.data.tools.bcc as bcc from dpgen import dlog from dpgen import ROOT_PATH -from dpgen.remote.decide_machine import decide_fp_machine +from dpgen.remote.decide_machine import convert_mdata from dpgen.dispatcher.Dispatcher import Dispatcher, make_dispatcher #-----PMG--------- from pymatgen.io.vasp import Poscar @@ -596,7 +596,7 @@ def gen_init_surf(args): if args.MACHINE is not None: # Decide a proper machine - mdata = decide_fp_machine(mdata) + mdata = convert_mdata(mdata, ["fp"]) # disp = make_dispatcher(mdata["fp_machine"]) #stage = args.STAGE diff --git a/dpgen/generator/ch4/machine.json b/dpgen/generator/ch4/machine.json index bff646bcd..653f613d6 100644 --- a/dpgen/generator/ch4/machine.json +++ b/dpgen/generator/ch4/machine.json @@ -21,7 +21,7 @@ "_comment": "that's all" }, - "lmp_command": "/sharedext4/softwares/lammps/bin/lmp_serial", + "model_devi_command": "/sharedext4/softwares/lammps/bin/lmp_serial", "model_devi_group_size": 1, "_comment": "model_devi on localhost", "model_devi_machine": { diff --git a/dpgen/generator/lib/utils.py b/dpgen/generator/lib/utils.py index af7a71bf6..772d379ce 100644 --- a/dpgen/generator/lib/utils.py +++ b/dpgen/generator/lib/utils.py @@ -1,6 +1,7 @@ #!/usr/bin/env python3 import os, re, shutil, logging +import glob iter_format = "%06d" task_format = "%02d" @@ -60,4 +61,37 @@ def log_task (message) : def record_iter (record, ii, jj) : with open (record, "a") as frec : - frec.write ("%d %d\n" % (ii, jj)) + frec.write ("%d %d\n" % (ii, jj)) + +def symlink_user_forward_files(mdata, task_type, work_path, task_format = None): + ''' + Symlink user-defined forward_common_files + Current path should be work_path, such as 00.train + + Parameters + --------- + mdata : dict + machine parameters + task_type: str + task_type, such as "train" + work_path : str + work_path, such as "iter.000001/00.train" + Returns + ------- + None + ''' + user_forward_files = mdata.get(task_type + "_" + "user_forward_files", []) + #Angus: In the future, we may unify the task format. + if task_format is None: + task_format = {"train" : "0*", "model_devi" : "task.*", "fp": "task.*"} + #"init_relax" : "sys-*", "init_md" : "sys-*/scale*/00*" + for file in user_forward_files: + assert os.path.isfile(file) ,\ + "user_forward_file %s of %s stage doesn't exist. 
" % (file, task_type) + tasks = glob.glob(os.path.join(work_path, task_format[task_type])) + for task in tasks: + if os.path.isfile(os.path.join(task, os.path.basename(file))): + os.remove(os.path.join(task, os.path.basename(file))) + os.symlink(file, os.path.join(task, os.path.basename(file))) + return + \ No newline at end of file diff --git a/dpgen/generator/run.py b/dpgen/generator/run.py index ceaaed9b2..c40e31356 100644 --- a/dpgen/generator/run.py +++ b/dpgen/generator/run.py @@ -38,6 +38,7 @@ from dpgen.generator.lib.utils import log_iter from dpgen.generator.lib.utils import record_iter from dpgen.generator.lib.utils import log_task +from dpgen.generator.lib.utils import symlink_user_forward_files from dpgen.generator.lib.lammps import make_lammps_input from dpgen.generator.lib.vasp import write_incar_dict from dpgen.generator.lib.vasp import make_vasp_incar_user_dict @@ -53,11 +54,7 @@ from dpgen.generator.lib.gaussian import make_gaussian_input, take_cluster from dpgen.generator.lib.cp2k import make_cp2k_input, make_cp2k_input_from_external, make_cp2k_xyz from dpgen.generator.lib.ele_temp import NBandsEsti -from dpgen.remote.RemoteJob import SSHSession, JobStatus, SlurmJob, PBSJob, LSFJob, CloudMachineJob, awsMachineJob -from dpgen.remote.group_jobs import ucloud_submit_jobs, aws_submit_jobs -from dpgen.remote.group_jobs import group_slurm_jobs -from dpgen.remote.group_jobs import group_local_jobs -from dpgen.remote.decide_machine import decide_train_machine, decide_fp_machine, decide_model_devi_machine +from dpgen.remote.decide_machine import convert_mdata from dpgen.dispatcher.Dispatcher import Dispatcher, _split_tasks, make_dispatcher, make_submission from dpgen.util import sepline from dpgen import ROOT_PATH @@ -345,7 +342,7 @@ def make_train (iter_index, else: raise RuntimeError('invalid setting for use_ele_temp ' + str(use_ele_temp)) else: - raise RuntimeError("DP-GEN currently only supports for DeePMD-kit 1.x version!" ) + raise RuntimeError("DP-GEN currently only supports for DeePMD-kit 1.x or 2.x version!" ) # set training reuse model if training_reuse_iter is not None and iter_index >= training_reuse_iter: if LooseVersion('1') <= LooseVersion(mdata["deepmd_version"]) < LooseVersion('2'): @@ -384,7 +381,7 @@ def make_train (iter_index, jinput['model']['fitting_net']['seed'] = random.randrange(sys.maxsize) % (2**32) jinput['training']['seed'] = random.randrange(sys.maxsize) % (2**32) else: - raise RuntimeError("DP-GEN currently only supports for DeePMD-kit 1.x version!" ) + raise RuntimeError("DP-GEN currently only supports for DeePMD-kit 1.x or 2.x version!" ) # set model activation function if model_devi_activation_func is not None: if LooseVersion(mdata["deepmd_version"]) < LooseVersion('1'): @@ -422,6 +419,9 @@ def make_train (iter_index, for ii in range(len(iter0_models)): old_model_files = glob.glob(os.path.join(iter0_models[ii], 'model.ckpt*')) _link_old_models(work_path, old_model_files, ii) + # Copy user defined forward files + symlink_user_forward_files(mdata=mdata, task_type="train", work_path=work_path) + def _link_old_models(work_path, old_model_files, ii): @@ -502,7 +502,7 @@ def run_train (iter_index, command = '%s freeze' % train_command commands.append(command) else: - raise RuntimeError("DP-GEN currently only supports for DeePMD-kit 1.x version!" ) + raise RuntimeError("DP-GEN currently only supports for DeePMD-kit 1.x or 2.x version!" 
) #_tasks = [os.path.basename(ii) for ii in all_task] # run_tasks = [] @@ -559,8 +559,10 @@ def run_train (iter_index, train_group_size = 1 api_version = mdata.get('api_version', '0.9') - # print('debug:commands', commands) - + + user_forward_files = mdata.get("train" + "_user_forward_files", []) + forward_files += [os.path.basename(file) for file in user_forward_files] + backward_files += mdata.get("train" + "_user_backward_files", []) if LooseVersion(api_version) < LooseVersion('1.0'): warnings.warn(f"the dpdispatcher will be updated to new version." f"And the interface may be changed. Please check the documents for more details") @@ -836,7 +838,8 @@ def make_model_devi (iter_index, _make_model_devi_revmat(iter_index, jdata, mdata, conf_systems) else: raise RuntimeError('unknown model_devi input mode', input_mode) - + #Copy user defined forward_files + symlink_user_forward_files(mdata=mdata, task_type="model_devi", work_path=work_path) return True @@ -1159,10 +1162,7 @@ def run_model_devi (iter_index, jdata, mdata) : #rmdlog.info("This module has been run !") - lmp_exec = mdata['lmp_command'] - # Angus: lmp_exec name should be changed to model_devi_exec. - # We should also change make_dispatcher - # For now, I will use this name for gromacs command + model_devi_exec = mdata['model_devi_command'] model_devi_group_size = mdata['model_devi_group_size'] model_devi_resources = mdata['model_devi_resources'] @@ -1196,7 +1196,7 @@ def run_model_devi (iter_index, model_devi_engine = jdata.get("model_devi_engine", "lammps") if model_devi_engine == "lammps": - command = "{ if [ ! -f dpgen.restart.10000 ]; then %s -i input.lammps -v restart 0; else %s -i input.lammps -v restart 1; fi }" % (lmp_exec, lmp_exec) + command = "{ if [ ! -f dpgen.restart.10000 ]; then %s -i input.lammps -v restart 0; else %s -i input.lammps -v restart 1; fi }" % (model_devi_exec, model_devi_exec) command = "/bin/sh -c '%s'" % command commands = [command] forward_files = ['conf.lmp', 'input.lammps', 'traj'] @@ -1217,8 +1217,8 @@ def run_model_devi (iter_index, maxwarn = gromacs_settings.get("maxwarn", 1) nsteps = cur_job["nsteps"] - command = "%s grompp -f %s -p %s -c %s -o %s -maxwarn %d" % (lmp_exec, mdp_filename, topol_filename, conf_filename, deffnm, maxwarn) - command += "&& %s mdrun -deffnm %s -nsteps %d" %(lmp_exec, deffnm, nsteps) + command = "%s grompp -f %s -p %s -c %s -o %s -maxwarn %d" % (model_devi_exec, mdp_filename, topol_filename, conf_filename, deffnm, maxwarn) + command += "&& %s mdrun -deffnm %s -nsteps %d" %(model_devi_exec, deffnm, nsteps) commands = [command] forward_files = [mdp_filename, topol_filename, conf_filename, index_filename, "input.json" ] @@ -1227,6 +1227,9 @@ def run_model_devi (iter_index, cwd = os.getcwd() + user_forward_files = mdata.get("model_devi" + "_user_forward_files", []) + forward_files += [os.path.basename(file) for file in user_forward_files] + backward_files += mdata.get("model_devi" + "_user_backward_files", []) api_version = mdata.get('api_version', '0.9') if LooseVersion(api_version) < LooseVersion('1.0'): warnings.warn(f"the dpdispatcher will be updated to new version." 
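The three-line pattern added to `run_train` and `run_model_devi` here (and to `run_fp_inner` below) is always the same: read the optional `*_user_forward_files` / `*_user_backward_files` keys from `mdata` and extend the per-task file lists. A minimal sketch of that pattern follows; the `mdata` dict and its path are made up for illustration, while the real values come from `machine.json` after `convert_mdata`.

```python
import os

# Illustrative flattened mdata, as convert_mdata would produce it from
# machine.json; the path and file names here are made up for this sketch.
mdata = {
    "model_devi_user_forward_files": ["/data/plumed/plumed.dat"],
    "model_devi_user_backward_files": ["HILLS"],
}

forward_files = ["conf.lmp", "input.lammps", "traj"]   # files DP-GEN always uploads
backward_files = ["model_devi.out", "model_devi.log"]  # files DP-GEN collects (illustrative)

# Only the basename is appended: symlink_user_forward_files has already linked
# the full path into each task directory, so the dispatcher finds a local file.
user_forward_files = mdata.get("model_devi_user_forward_files", [])
forward_files += [os.path.basename(f) for f in user_forward_files]
backward_files += mdata.get("model_devi_user_backward_files", [])

print(forward_files)   # ['conf.lmp', 'input.lammps', 'traj', 'plumed.dat']
print(backward_files)  # ['model_devi.out', 'model_devi.log', 'HILLS']
```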
@@ -2015,6 +2018,10 @@ def make_fp (iter_index, make_fp_pwmat(iter_index, jdata) else : raise RuntimeError ("unsupported fp style") + # Copy user defined forward_files + iter_name = make_iter_name(iter_index) + work_path = os.path.join(iter_name, fp_name) + symlink_user_forward_files(mdata=mdata, task_type="fp", work_path=work_path) def _vasp_check_fin (ii) : if os.path.isfile(os.path.join(ii, 'OUTCAR')) : @@ -2120,6 +2127,10 @@ def run_fp_inner (iter_index, # fp_run_tasks.append(ii) run_tasks = [os.path.basename(ii) for ii in fp_run_tasks] + user_forward_files = mdata.get("fp" + "_user_forward_files", []) + forward_files += [os.path.basename(file) for file in user_forward_files] + backward_files += mdata.get("fp" + "_user_backward_files", []) + api_version = mdata.get('api_version', '0.9') if LooseVersion(api_version) < LooseVersion('1.0'): warnings.warn(f"the dpdispatcher will be updated to new version." @@ -2158,10 +2169,9 @@ def run_fp (iter_index, mdata) : fp_style = jdata['fp_style'] fp_pp_files = jdata['fp_pp_files'] - if fp_style == "vasp" : forward_files = ['POSCAR', 'INCAR', 'POTCAR','KPOINTS'] - backward_files = ['OUTCAR','vasprun.xml'] + backward_files = ['fp.log','OUTCAR','vasprun.xml'] # Move cvasp interface to jdata if ('cvasp' in jdata) and (jdata['cvasp'] == True): mdata['fp_resources']['cvasp'] = True @@ -2646,7 +2656,8 @@ def run_iter (param_file, machine_file) : listener = logging.handlers.QueueListener(que, smtp_handler) dlog.addHandler(queue_handler) listener.start() - + # Convert mdata + mdata = convert_mdata(mdata) max_tasks = 10000 numb_task = 9 record = "record.dpgen" @@ -2673,7 +2684,6 @@ def run_iter (param_file, machine_file) : make_train (ii, jdata, mdata) elif jj == 1 : log_iter ("run_train", ii, jj) - mdata = decide_train_machine(mdata) run_train (ii, jdata, mdata) elif jj == 2 : log_iter ("post_train", ii, jj) @@ -2685,7 +2695,6 @@ def run_iter (param_file, machine_file) : break elif jj == 4 : log_iter ("run_model_devi", ii, jj) - mdata = decide_model_devi_machine(mdata) run_model_devi (ii, jdata, mdata) elif jj == 5 : @@ -2696,7 +2705,6 @@ def run_iter (param_file, machine_file) : make_fp (ii, jdata, mdata) elif jj == 7 : log_iter ("run_fp", ii, jj) - mdata = decide_fp_machine(mdata) run_fp (ii, jdata, mdata) elif jj == 8 : log_iter ("post_fp", ii, jj) diff --git a/dpgen/remote/decide_machine.py b/dpgen/remote/decide_machine.py index cda17853e..5996b45b2 100644 --- a/dpgen/remote/decide_machine.py +++ b/dpgen/remote/decide_machine.py @@ -11,278 +11,312 @@ import numpy as np from distutils.version import LooseVersion -def decide_train_machine(mdata): - if LooseVersion(mdata.get('api_version', '0.9')) >= LooseVersion('1.0'): - mdata['train_group_size'] = mdata['train'][0]['resources']['group_size'] - if 'train' in mdata: - continue_flag = False - if 'record.machine' in os.listdir(): - try: - with open('record.machine', 'r') as _infile: - profile = json.load(_infile) - if profile['purpose'] == 'train': - mdata['train_machine'] = profile['machine'] - mdata['train_resources'] = profile['resources'] - - if 'python_path' in profile: - mdata['python_path'] = profile['python_path'] - if "group_size" in profile: - mdata["train_group_size"] = profile["group_size"] - if 'deepmd_version' in profile: - mdata["deepmd_version"] = profile['deepmd_version'] - if 'command' in profile: - mdata['train_command'] = profile["command"] - continue_flag = True - except: - pass - if ("hostname" not in mdata["train"][0]["machine"]) or (len(mdata["train"]) == 1): - 
mdata["train_machine"] = mdata["train"][0]["machine"] - mdata["train_resources"] = mdata["train"][0]["resources"] - - if 'python_path' in mdata["train"][0]: - mdata["python_path"] = mdata["train"][0]["python_path"] - if "group_size" in mdata["train"][0]: - mdata["train_group_size"] = mdata["train"][0]["group_size"] - if 'deepmd_version' in mdata["train"][0]: - mdata["deepmd_version"] = mdata["train"][0]["deepmd_version"] - if 'command' in mdata["train"][0]: - mdata["train_command"] = mdata["train"][0]["command"] - continue_flag = True - - pd_flag = False - pd_count_list =[] - # pd for pending job in slurm - # if we need to launch new machine_idxines - if not continue_flag: - #assert isinstance(mdata['train']['machine'], list) - #assert isinstance(mdata['train']['resources'], list) - #assert len(mdata['train']['machine']) == len(mdata['train']['resources']) - # mdata['train'] is a list - for machine_idx in range(len(mdata['train'])): - temp_machine = mdata['train'][machine_idx]['machine'] - temp_resources = mdata['train'][machine_idx]['resources'] - temp_ssh_sess = SSHSession(temp_machine) - cwd = os.getcwd() - temp_context = SSHContext(cwd, temp_ssh_sess) - if temp_machine['machine_type'] == 'lsf': - temp_batch = LSF(temp_context) - else: - temp_batch = Slurm(temp_context) - # For other type of machines, please add them using 'elif'. - # Here slurm is selected as the final choice in convinience. - command = temp_batch._make_squeue(temp_machine, temp_resources) - ret, stdin, stdout, stderr = temp_batch.context.block_call(command) - pd_response = stdout.read().decode('utf-8').split("\n") - pd_count = len(pd_response) - temp_context.clean() - ## If there is no need to waiting for allocation - if pd_count ==1: - mdata['train_machine'] = temp_machine - mdata['train_resources'] = temp_resources +def convert_mdata(mdata, task_types=["train", "model_devi", "fp"]): + ''' + Convert mdata for DP-GEN main process. + New convension is like mdata["fp"]["machine"], + DP-GEN needs mdata["fp_machine"] - if 'python_path' in mdata['train'][machine_idx]: - mdata['python_path'] = mdata['train'][machine_idx]['python_path'] - if 'group_size' in mdata['train'][machine_idx]: - mdata['train_group_size'] = mdata['train'][machine_idx]['group_size'] - if 'deepmd_version' in mdata['train'][machine_idx]: - mdata['deepmd_version'] = mdata['train'][machine_idx]['deepmd_version'] - if 'command' in mdata['train'][machine_idx]: - mdata['train_command'] = mdata['train'][machine_idx]['command'] + Notice that we deprecate the function which can automatically select one most avalaible machine, + since this function was only used by Angus, and only supports for Slurm. + In the future this can be implemented. 
- ## No need to wait - pd_flag = True - break - else: - pd_count_list.append(pd_count) - if not pd_flag: - ## All machines need waiting, then compare waiting jobs - ## Select a machine which has fewest waiting jobs - min_machine_idx = np.argsort(pd_count_list)[0] - mdata['train_machine'] = mdata['train'][min_machine_idx]['machine'] - mdata['train_resources'] = mdata['train'][min_machine_idx]['resources'] - - if 'python_path' in mdata['train'][min_machine_idx]: - mdata['python_path'] = mdata['train'][min_machine_idx]['python_path'] - if "group_size" in mdata['train'][min_machine_idx]: - mdata["train_group_size"] = mdata['train'][min_machine_idx]["group_size"] - if 'deepmd_version' in mdata['train'][min_machine_idx]: - mdata['deepmd_version'] = mdata['train'][min_machine_idx]["deepmd_version"] - if 'command' in mdata['train'][min_machine_idx]: - mdata['train_command'] = mdata['train'][min_machine_idx]['command'] + Parameters + ---------- + mdata : dict + Machine parameters to be converted. + task_types : list of string + Type of tasks, default is ["train", "model_devi", "fp"] - ## Record which machine is selected - with open("record.machine","w") as _outfile: - profile = {} - profile['purpose'] = 'train' - profile['machine'] = mdata['train_machine'] - profile['resources'] = mdata['train_resources'] - - if 'python_path' in mdata: - profile['python_path'] = mdata['python_path'] - if "train_group_size" in mdata: - profile["group_size"] = mdata["train_group_size"] - if 'deepmd_version' in mdata: - profile['deepmd_version'] = mdata['deepmd_version'] - if 'train_command' in mdata: - profile['command'] = mdata['train_command'] + Returns + ------- + dict + mdata converted + ''' + for task_type in task_types: + if task_type in mdata: + for key, item in mdata[task_type][0].items(): + if "comments" not in key: + mdata[task_type + "_" + key] = item + group_size = mdata[task_type][0]["resources"].get("group_size", 1) + mdata[task_type + "_" + "group_size"] = group_size + return mdata - json.dump(profile, _outfile, indent = 4) - return mdata -def decide_model_devi_machine(mdata): - if LooseVersion(mdata.get('api_version', '0.9')) >= LooseVersion('1.0'): - mdata['model_devi_group_size'] = mdata['model_devi'][0]['resources']['group_size'] - if 'model_devi' in mdata: - continue_flag = False - if 'record.machine' in os.listdir(): - try: - with open('record.machine', 'r') as _infile: - profile = json.load(_infile) - if profile['purpose'] == 'model_devi': - mdata['model_devi_machine'] = profile['machine'] - mdata['model_devi_resources'] = profile['resources'] - mdata['lmp_command'] = profile['command'] - mdata['model_devi_group_size'] = profile['group_size'] - continue_flag = True - except: - pass - if ("hostname" not in mdata["model_devi"][0]["machine"]) or (len(mdata["model_devi"]) == 1): - mdata["model_devi_machine"] = mdata["model_devi"][0]["machine"] - mdata["model_devi_resources"] = mdata["model_devi"][0]["resources"] - mdata["lmp_command"] = mdata["model_devi"][0]["command"] - #if "group_size" in mdata["train"][0]: - mdata["model_devi_group_size"] = mdata["model_devi"][0].get("group_size", 1) - continue_flag = True - pd_count_list =[] - pd_flag = False - if not continue_flag: - - #assert isinstance(mdata['model_devi']['machine'], list) - #ssert isinstance(mdata['model_devi']['resources'], list) - #assert len(mdata['model_devi']['machine']) == len(mdata['model_devi']['resources']) - - for machine_idx in range(len(mdata['model_devi'])): - temp_machine = mdata['model_devi'][machine_idx]['machine'] - 
temp_resources = mdata['model_devi'][machine_idx]['resources'] - #assert isinstance(temp_machine, dict), "unsupported type of model_devi machine [%d]!" %machine_idx - #assert isinstance(temp_resources, dict), "unsupported type of model_devi resources [%d]!"%machine_idx - #assert temp_machine['machine_type'] == 'slurm', "Currently only support for Slurm!" - temp_ssh_sess = SSHSession(temp_machine) - cwd = os.getcwd() - temp_context = SSHContext(cwd, temp_ssh_sess) - if temp_machine['machine_type'] == 'lsf': - temp_batch = LSF(temp_context) - else: - temp_batch = Slurm(temp_context) - # For other type of machines, please add them using 'elif'. - # Here slurm is selected as the final choice in convinience. - command = temp_batch._make_squeue(temp_machine, temp_resources) - ret, stdin, stdout, stderr = temp_batch.context.block_call(command) - pd_response = stdout.read().decode('utf-8').split("\n") - pd_count = len(pd_response) - temp_context.clean() - if pd_count ==0: - mdata['model_devi_machine'] = temp_machine - mdata['model_devi_resources'] = temp_resources - mdata['lmp_command'] = mdata['model_devi'][machine_idx]['command'] - mdata['model_devi_group_size'] = mdata['model_devi'][machine_idx].get('group_size', 1) - pd_flag = True - break - else: - pd_count_list.append(pd_count) - if not pd_flag: - min_machine_idx = np.argsort(pd_count_list)[0] - mdata['model_devi_machine'] = mdata['model_devi'][min_machine_idx]['machine'] - mdata['model_devi_resources'] = mdata['model_devi'][min_machine_idx]['resources'] - mdata['lmp_command'] = mdata['model_devi'][min_machine_idx]['command'] - mdata['model_devi_group_size'] = mdata['model_devi'][min_machine_idx].get('group_size', 1) - with open("record.machine","w") as _outfile: - profile = {} - profile['purpose'] = 'model_devi' - profile['machine'] = mdata['model_devi_machine'] - profile['resources'] = mdata['model_devi_resources'] - profile['group_size'] = mdata['model_devi_group_size'] - profile['command'] = mdata['lmp_command'] - - json.dump(profile, _outfile, indent = 4) - return mdata -def decide_fp_machine(mdata): - if LooseVersion(mdata.get('api_version', '0.9')) >= LooseVersion('1.0'): - mdata['fp_group_size'] = mdata['fp'][0]['resources']['group_size'] - if 'fp' in mdata: - #ssert isinstance(mdata['fp']['machine'], list) - #assert isinstance(mdata['fp']['resources'], list) - #assert len(mdata['fp']['machine']) == len(mdata['fp']['resources']) - continue_flag = False - ## decide whether to use an existing machine - if 'record.machine' in os.listdir(): - try: - with open('record.machine', 'r') as _infile: - profile = json.load(_infile) - if profile['purpose'] == 'fp': - mdata['fp_machine'] = profile['machine'] - mdata['fp_resources'] = profile['resources'] - mdata['fp_command'] = profile['command'] - mdata['fp_group_size'] = profile['group_size'] - - continue_flag = True - except: - pass - if ("hostname" not in mdata["fp"][0]["machine"]) or (len(mdata["fp"]) == 1): - mdata["fp_machine"] = mdata["fp"][0]["machine"] - mdata["fp_resources"] = mdata["fp"][0]["resources"] - mdata["fp_command"] = mdata["fp"][0]["command"] - #if "group_size" in mdata["train"][0]: - mdata["fp_group_size"] = mdata["fp"][0].get("group_size", 1) - continue_flag = True - - - pd_count_list =[] - pd_flag = False - if not continue_flag: - for machine_idx in range(len(mdata['fp'])): - temp_machine = mdata['fp'][machine_idx]['machine'] - temp_resources = mdata['fp'][machine_idx]['resources'] - temp_ssh_sess = SSHSession(temp_machine) - cwd = os.getcwd() - temp_context = 
SSHContext(cwd, temp_ssh_sess) - if temp_machine['machine_type'] == 'lsf': - temp_batch = LSF(temp_context) - else: - temp_batch = Slurm(temp_context) - # For other type of machines, please add them using 'elif'. - # Here slurm is selected as the final choice in convinience. - command = temp_batch._make_squeue(temp_machine, temp_resources) - ret, stdin, stdout, stderr = temp_batch.context.block_call(command) - pd_response = stdout.read().decode('utf-8').split("\n") - pd_count = len(pd_response) - temp_context.clean() - #dlog.info(temp_machine["username"] + " " + temp_machine["hostname"] + " " + str(pd_count)) - if pd_count ==0: - mdata['fp_machine'] = temp_machine - mdata['fp_resources'] = temp_resources - mdata['fp_command'] = mdata['fp'][machine_idx]['command'] - mdata['fp_group_size'] = mdata['fp'][machine_idx].get('group_size', 1) - pd_flag = True - break - else: - pd_count_list.append(pd_count) - if not pd_flag: - min_machine_idx = np.argsort(pd_count_list)[0] - mdata['fp_machine'] = mdata['fp'][min_machine_idx]['machine'] - mdata['fp_resources'] = mdata['fp'][min_machine_idx]['resources'] - mdata['fp_command'] = mdata['fp'][min_machine_idx]['command'] - mdata['fp_group_size'] = mdata['fp'][min_machine_idx].get('group_size',1) - - with open("record.machine","w") as _outfile: - profile = {} - profile['purpose'] = 'fp' - profile['machine'] = mdata['fp_machine'] - profile['resources'] = mdata['fp_resources'] - profile['group_size'] = mdata['fp_group_size'] - profile['command'] = mdata['fp_command'] - json.dump(profile, _outfile, indent = 4) - return mdata +# def decide_train_machine(mdata): +# if LooseVersion(mdata.get('api_version', '0.9')) >= LooseVersion('1.0'): +# mdata['train_group_size'] = mdata['train'][0]['resources']['group_size'] +# if 'train' in mdata: +# continue_flag = False +# if 'record.machine' in os.listdir(): +# try: +# with open('record.machine', 'r') as _infile: +# profile = json.load(_infile) +# if profile['purpose'] == 'train': +# mdata['train_machine'] = profile['machine'] +# mdata['train_resources'] = profile['resources'] +# +# if 'python_path' in profile: +# mdata['python_path'] = profile['python_path'] +# if "group_size" in profile: +# mdata["train_group_size"] = profile["group_size"] +# if 'deepmd_version' in profile: +# mdata["deepmd_version"] = profile['deepmd_version'] +# if 'command' in profile: +# mdata['train_command'] = profile["command"] +# continue_flag = True +# except: +# pass +# if ("hostname" not in mdata["train"][0]["machine"]) or (len(mdata["train"]) == 1): +# mdata["train_machine"] = mdata["train"][0]["machine"] +# mdata["train_resources"] = mdata["train"][0]["resources"] +# +# if 'python_path' in mdata["train"][0]: +# mdata["python_path"] = mdata["train"][0]["python_path"] +# if "group_size" in mdata["train"][0]: +# mdata["train_group_size"] = mdata["train"][0]["group_size"] +# if 'deepmd_version' in mdata["train"][0]: +# mdata["deepmd_version"] = mdata["train"][0]["deepmd_version"] +# if 'command' in mdata["train"][0]: +# mdata["train_command"] = mdata["train"][0]["command"] +# continue_flag = True +# +# pd_flag = False +# pd_count_list =[] +# # pd for pending job in slurm +# # if we need to launch new machine_idxines +# if not continue_flag: +# +# #assert isinstance(mdata['train']['machine'], list) +# #assert isinstance(mdata['train']['resources'], list) +# #assert len(mdata['train']['machine']) == len(mdata['train']['resources']) +# # mdata['train'] is a list +# for machine_idx in range(len(mdata['train'])): +# temp_machine = 
mdata['train'][machine_idx]['machine'] +# temp_resources = mdata['train'][machine_idx]['resources'] +# temp_ssh_sess = SSHSession(temp_machine) +# cwd = os.getcwd() +# temp_context = SSHContext(cwd, temp_ssh_sess) +# if temp_machine['machine_type'] == 'lsf': +# temp_batch = LSF(temp_context) +# else: +# temp_batch = Slurm(temp_context) +# # For other type of machines, please add them using 'elif'. +# # Here slurm is selected as the final choice in convinience. +# command = temp_batch._make_squeue(temp_machine, temp_resources) +# ret, stdin, stdout, stderr = temp_batch.context.block_call(command) +# pd_response = stdout.read().decode('utf-8').split("\n") +# pd_count = len(pd_response) +# temp_context.clean() +# ## If there is no need to waiting for allocation +# if pd_count ==1: +# mdata['train_machine'] = temp_machine +# mdata['train_resources'] = temp_resources +# +# if 'python_path' in mdata['train'][machine_idx]: +# mdata['python_path'] = mdata['train'][machine_idx]['python_path'] +# if 'group_size' in mdata['train'][machine_idx]: +# mdata['train_group_size'] = mdata['train'][machine_idx]['group_size'] +# if 'deepmd_version' in mdata['train'][machine_idx]: +# mdata['deepmd_version'] = mdata['train'][machine_idx]['deepmd_version'] +# if 'command' in mdata['train'][machine_idx]: +# mdata['train_command'] = mdata['train'][machine_idx]['command'] +# +# ## No need to wait +# pd_flag = True +# break +# else: +# pd_count_list.append(pd_count) +# if not pd_flag: +# ## All machines need waiting, then compare waiting jobs +# ## Select a machine which has fewest waiting jobs +# min_machine_idx = np.argsort(pd_count_list)[0] +# mdata['train_machine'] = mdata['train'][min_machine_idx]['machine'] +# mdata['train_resources'] = mdata['train'][min_machine_idx]['resources'] +# +# if 'python_path' in mdata['train'][min_machine_idx]: +# mdata['python_path'] = mdata['train'][min_machine_idx]['python_path'] +# if "group_size" in mdata['train'][min_machine_idx]: +# mdata["train_group_size"] = mdata['train'][min_machine_idx]["group_size"] +# if 'deepmd_version' in mdata['train'][min_machine_idx]: +# mdata['deepmd_version'] = mdata['train'][min_machine_idx]["deepmd_version"] +# if 'command' in mdata['train'][min_machine_idx]: +# mdata['train_command'] = mdata['train'][min_machine_idx]['command'] +# +# ## Record which machine is selected +# with open("record.machine","w") as _outfile: +# profile = {} +# profile['purpose'] = 'train' +# profile['machine'] = mdata['train_machine'] +# profile['resources'] = mdata['train_resources'] +# +# if 'python_path' in mdata: +# profile['python_path'] = mdata['python_path'] +# if "train_group_size" in mdata: +# profile["group_size"] = mdata["train_group_size"] +# if 'deepmd_version' in mdata: +# profile['deepmd_version'] = mdata['deepmd_version'] +# if 'train_command' in mdata: +# profile['command'] = mdata['train_command'] +# +# json.dump(profile, _outfile, indent = 4) +# return mdata +# +# def decide_model_devi_machine(mdata): +# if LooseVersion(mdata.get('api_version', '0.9')) >= LooseVersion('1.0'): +# mdata['model_devi_group_size'] = mdata['model_devi'][0]['resources']['group_size'] +# if 'model_devi' in mdata: +# continue_flag = False +# if 'record.machine' in os.listdir(): +# try: +# with open('record.machine', 'r') as _infile: +# profile = json.load(_infile) +# if profile['purpose'] == 'model_devi': +# mdata['model_devi_machine'] = profile['machine'] +# mdata['model_devi_resources'] = profile['resources'] +# mdata['model_devi_command'] = profile['command'] +# 
mdata['model_devi_group_size'] = profile['group_size'] +# continue_flag = True +# except: +# pass +# if ("hostname" not in mdata["model_devi"][0]["machine"]) or (len(mdata["model_devi"]) == 1): +# mdata["model_devi_machine"] = mdata["model_devi"][0]["machine"] +# mdata["model_devi_resources"] = mdata["model_devi"][0]["resources"] +# mdata["model_devi_command"] = mdata["model_devi"][0]["command"] +# #if "group_size" in mdata["train"][0]: +# mdata["model_devi_group_size"] = mdata["model_devi"][0].get("group_size", 1) +# continue_flag = True +# +# pd_count_list =[] +# pd_flag = False +# if not continue_flag: +# +# #assert isinstance(mdata['model_devi']['machine'], list) +# #ssert isinstance(mdata['model_devi']['resources'], list) +# #assert len(mdata['model_devi']['machine']) == len(mdata['model_devi']['resources']) +# +# for machine_idx in range(len(mdata['model_devi'])): +# temp_machine = mdata['model_devi'][machine_idx]['machine'] +# temp_resources = mdata['model_devi'][machine_idx]['resources'] +# #assert isinstance(temp_machine, dict), "unsupported type of model_devi machine [%d]!" %machine_idx +# #assert isinstance(temp_resources, dict), "unsupported type of model_devi resources [%d]!"%machine_idx +# #assert temp_machine['machine_type'] == 'slurm', "Currently only support for Slurm!" +# temp_ssh_sess = SSHSession(temp_machine) +# cwd = os.getcwd() +# temp_context = SSHContext(cwd, temp_ssh_sess) +# if temp_machine['machine_type'] == 'lsf': +# temp_batch = LSF(temp_context) +# else: +# temp_batch = Slurm(temp_context) +# # For other type of machines, please add them using 'elif'. +# # Here slurm is selected as the final choice in convinience. +# command = temp_batch._make_squeue(temp_machine, temp_resources) +# ret, stdin, stdout, stderr = temp_batch.context.block_call(command) +# pd_response = stdout.read().decode('utf-8').split("\n") +# pd_count = len(pd_response) +# temp_context.clean() +# if pd_count ==0: +# mdata['model_devi_machine'] = temp_machine +# mdata['model_devi_resources'] = temp_resources +# mdata['model_devi_command'] = mdata['model_devi'][machine_idx]['command'] +# mdata['model_devi_group_size'] = mdata['model_devi'][machine_idx].get('group_size', 1) +# pd_flag = True +# break +# else: +# pd_count_list.append(pd_count) +# if not pd_flag: +# min_machine_idx = np.argsort(pd_count_list)[0] +# mdata['model_devi_machine'] = mdata['model_devi'][min_machine_idx]['machine'] +# mdata['model_devi_resources'] = mdata['model_devi'][min_machine_idx]['resources'] +# mdata['model_devi_command'] = mdata['model_devi'][min_machine_idx]['command'] +# mdata['model_devi_group_size'] = mdata['model_devi'][min_machine_idx].get('group_size', 1) +# with open("record.machine","w") as _outfile: +# profile = {} +# profile['purpose'] = 'model_devi' +# profile['machine'] = mdata['model_devi_machine'] +# profile['resources'] = mdata['model_devi_resources'] +# profile['group_size'] = mdata['model_devi_group_size'] +# profile['command'] = mdata['model_devi_command'] +# +# json.dump(profile, _outfile, indent = 4) +# return mdata +# def decide_fp_machine(mdata): +# if LooseVersion(mdata.get('api_version', '0.9')) >= LooseVersion('1.0'): +# mdata['fp_group_size'] = mdata['fp'][0]['resources']['group_size'] +# if 'fp' in mdata: +# #ssert isinstance(mdata['fp']['machine'], list) +# #assert isinstance(mdata['fp']['resources'], list) +# #assert len(mdata['fp']['machine']) == len(mdata['fp']['resources']) +# continue_flag = False +# ## decide whether to use an existing machine +# if 'record.machine' in 
os.listdir(): +# try: +# with open('record.machine', 'r') as _infile: +# profile = json.load(_infile) +# if profile['purpose'] == 'fp': +# mdata['fp_machine'] = profile['machine'] +# mdata['fp_resources'] = profile['resources'] +# mdata['fp_command'] = profile['command'] +# mdata['fp_group_size'] = profile['group_size'] +# +# continue_flag = True +# except: +# pass +# if ("hostname" not in mdata["fp"][0]["machine"]) or (len(mdata["fp"]) == 1): +# mdata["fp_machine"] = mdata["fp"][0]["machine"] +# mdata["fp_resources"] = mdata["fp"][0]["resources"] +# mdata["fp_command"] = mdata["fp"][0]["command"] +# #if "group_size" in mdata["train"][0]: +# mdata["fp_group_size"] = mdata["fp"][0].get("group_size", 1) +# continue_flag = True +# +# +# pd_count_list =[] +# pd_flag = False +# if not continue_flag: +# for machine_idx in range(len(mdata['fp'])): +# temp_machine = mdata['fp'][machine_idx]['machine'] +# temp_resources = mdata['fp'][machine_idx]['resources'] +# temp_ssh_sess = SSHSession(temp_machine) +# cwd = os.getcwd() +# temp_context = SSHContext(cwd, temp_ssh_sess) +# if temp_machine['machine_type'] == 'lsf': +# temp_batch = LSF(temp_context) +# else: +# temp_batch = Slurm(temp_context) +# # For other type of machines, please add them using 'elif'. +# # Here slurm is selected as the final choice in convinience. +# command = temp_batch._make_squeue(temp_machine, temp_resources) +# ret, stdin, stdout, stderr = temp_batch.context.block_call(command) +# pd_response = stdout.read().decode('utf-8').split("\n") +# pd_count = len(pd_response) +# temp_context.clean() +# #dlog.info(temp_machine["username"] + " " + temp_machine["hostname"] + " " + str(pd_count)) +# if pd_count ==0: +# mdata['fp_machine'] = temp_machine +# mdata['fp_resources'] = temp_resources +# mdata['fp_command'] = mdata['fp'][machine_idx]['command'] +# mdata['fp_group_size'] = mdata['fp'][machine_idx].get('group_size', 1) +# pd_flag = True +# break +# else: +# pd_count_list.append(pd_count) +# if not pd_flag: +# min_machine_idx = np.argsort(pd_count_list)[0] +# mdata['fp_machine'] = mdata['fp'][min_machine_idx]['machine'] +# mdata['fp_resources'] = mdata['fp'][min_machine_idx]['resources'] +# mdata['fp_command'] = mdata['fp'][min_machine_idx]['command'] +# mdata['fp_group_size'] = mdata['fp'][min_machine_idx].get('group_size',1) +# +# with open("record.machine","w") as _outfile: +# profile = {} +# profile['purpose'] = 'fp' +# profile['machine'] = mdata['fp_machine'] +# profile['resources'] = mdata['fp_resources'] +# profile['group_size'] = mdata['fp_group_size'] +# profile['command'] = mdata['fp_command'] +# json.dump(profile, _outfile, indent = 4) +# return mdata diff --git a/dpgen/simplify/simplify.py b/dpgen/simplify/simplify.py index 9856dc58a..768d64835 100644 --- a/dpgen/simplify/simplify.py +++ b/dpgen/simplify/simplify.py @@ -22,12 +22,10 @@ from dpgen import dlog from dpgen import SHORT_CMD from dpgen.util import sepline -from dpgen.remote.decide_machine import decide_train_machine from dpgen.dispatcher.Dispatcher import Dispatcher, make_dispatcher from dpgen.generator.run import make_train, run_train, post_train, run_fp, post_fp, fp_name, model_devi_name, train_name, train_task_fmt, sys_link_fp_vasp_pp, make_fp_vasp_incar, make_fp_vasp_kp, make_fp_vasp_cp_cvasp, data_system_fmt, model_devi_task_fmt, fp_task_fmt # TODO: maybe the following functions can be moved to dpgen.util from dpgen.generator.lib.utils import log_iter, make_iter_name, create_path, record_iter -from dpgen.remote.decide_machine import 
decide_train_machine, decide_fp_machine, decide_model_devi_machine from dpgen.generator.lib.gaussian import make_gaussian_input @@ -603,7 +601,8 @@ def run_iter(param_file, machine_file): listener = logging.handlers.QueueListener(que, smtp_handler) dlog.addHandler(queue_handler) listener.start() - + + mdata = convert_mdata(mdata) max_tasks = 10000 numb_task = 9 record = "record.dpgen" @@ -638,7 +637,6 @@ def run_iter(param_file, machine_file): make_train(ii, jdata, mdata) elif jj == 1: log_iter("run_train", ii, jj) - mdata = decide_train_machine(mdata) #disp = make_dispatcher(mdata['train_machine']) run_train(ii, jdata, mdata) elif jj == 2: @@ -651,7 +649,6 @@ def run_iter(param_file, machine_file): break elif jj == 4: log_iter("run_model_devi", ii, jj) - mdata = decide_model_devi_machine(mdata) #disp = make_dispatcher(mdata['model_devi_machine']) run_model_devi(ii, jdata, mdata) elif jj == 5: @@ -665,7 +662,6 @@ def run_iter(param_file, machine_file): if jdata.get("labeled", False): dlog.info("already have labeled data, skip run_fp") else: - mdata = decide_fp_machine(mdata) #disp = make_dispatcher(mdata['fp_machine']) run_fp(ii, jdata, mdata) elif jj == 8: diff --git a/examples/CH4-refact-dpdispatcher/machine-ali-ehpc.json b/examples/CH4-refact-dpdispatcher/machine-ali-ehpc.json index 442ddb201..a90b04f35 100644 --- a/examples/CH4-refact-dpdispatcher/machine-ali-ehpc.json +++ b/examples/CH4-refact-dpdispatcher/machine-ali-ehpc.json @@ -46,7 +46,11 @@ "queue_name": "T4_4_15", "group_size": 5, "source_list": ["/home/fengbo/deepmd.1.2.4.env"] - } + }, + "_comments" : "In user_forward_files, define input files to be uploaded.", + "user_forward_files" : [], + "_comments" : "In user_backward_files, define output files to be collected.", + "user_backward_files" : ["HILLS"] } ], "fp":[ @@ -69,7 +73,11 @@ "queue_name": "G_32_128", "group_size": 1, "source_list": ["~/vasp.env"] - } + }, + "_comments" : "In user_forward_files, define input files to be uploaded.", + "user_forward_files" : ["vdw_kernel.bindat"], + "_comments" : "In user_backward_files, define output files to be collected.", + "user_backward_files" : [] } ] } diff --git a/examples/init/INCAR_methane.md b/examples/init/INCAR_methane.md index a0e3ca29b..9831387aa 100644 --- a/examples/init/INCAR_methane.md +++ b/examples/init/INCAR_methane.md @@ -1,21 +1,33 @@ PREC=A -ENCUT=400 +ENCUT=400.000000 ISYM=0 -ALGO=Fast -EDIFF=1.000000e-06 -LREAL=False +ALGO=fast +EDIFF=1E-6 +LREAL=F NPAR=4 KPAR=1 -NELM=120 -NELMIN=4 + +NELM=200 +ISTART=0 +ICHARG=2 ISIF=2 ISMEAR=0 -SIGMA=0.20000 +SIGMA=0.200000 IBRION=0 -POTIM=0.5 +MAXMIX=50 +NBLOCK=1 +KBLOCK=100 + +SMASS=0 +POTIM=2g +TEBEG=50 +TEEND=50 + NSW=10 + LWAVE=F LCHARG=F PSTRESS=0 + KSPACING=0.500000 -KGAMMA=.FALSE. 
+KGAMMA=F diff --git a/examples/machine/DeePMD-kit-1.x/machine-local.json b/examples/machine/DeePMD-kit-1.x/machine-local.json index 5c356baef..a266f712b 100644 --- a/examples/machine/DeePMD-kit-1.x/machine-local.json +++ b/examples/machine/DeePMD-kit-1.x/machine-local.json @@ -13,7 +13,7 @@ "_comment": "model_devi on localhost ", - "lmp_command": "/home/wanghan/local/bin/lmp_mpi_010", + "model_devi_command": "/home/wanghan/local/bin/lmp_mpi_010", "model_devi_group_size": 5, "model_devi_machine": { "batch": "shell", diff --git a/examples/machine/DeePMD-kit-1.x/machine-pbs-gaussian.json b/examples/machine/DeePMD-kit-1.x/machine-pbs-gaussian.json index 25cb48349..6893471c5 100644 --- a/examples/machine/DeePMD-kit-1.x/machine-pbs-gaussian.json +++ b/examples/machine/DeePMD-kit-1.x/machine-pbs-gaussian.json @@ -27,7 +27,7 @@ "_comment": "model_devi on localhost ", - "lmp_command": "/gpfs/home/tzhu/lammps-stable_5Jun2019/src/lmp_intel_cpu_intelmpi -pk intel 0 omp 2", + "model_devi_command": "/gpfs/home/tzhu/lammps-stable_5Jun2019/src/lmp_intel_cpu_intelmpi -pk intel 0 omp 2", "model_devi_group_size": 1, "model_devi_machine": { "machine_type": "lsf", diff --git a/examples/machine/deprecated/DeePMD-kit-0.12/machine-aws.json b/examples/machine/deprecated/DeePMD-kit-0.12/machine-aws.json index f4015b612..7d050b548 100644 --- a/examples/machine/deprecated/DeePMD-kit-0.12/machine-aws.json +++ b/examples/machine/deprecated/DeePMD-kit-0.12/machine-aws.json @@ -96,7 +96,7 @@ "with_mpi":true }, "deepmd_path": "/deepmd_root/", - "lmp_command":"/usr/bin/lmp_mpi", + "model_devi_command":"/usr/bin/lmp_mpi", "fp_command":"/usr/bin/vasp_std", "train_resources": {}, diff --git a/examples/machine/deprecated/DeePMD-kit-0.12/machine-local.json b/examples/machine/deprecated/DeePMD-kit-0.12/machine-local.json index 057db2722..b8e15a625 100644 --- a/examples/machine/deprecated/DeePMD-kit-0.12/machine-local.json +++ b/examples/machine/deprecated/DeePMD-kit-0.12/machine-local.json @@ -14,7 +14,7 @@ "_comment": "model_devi on localhost ", - "lmp_command": "/home/wanghan/local/bin/lmp_mpi_010", + "model_devi_command": "/home/wanghan/local/bin/lmp_mpi_010", "model_devi_group_size": 5, "model_devi_machine": { "batch": "shell", diff --git a/examples/machine/deprecated/machine-hnu.json b/examples/machine/deprecated/machine-hnu.json index 8b9ee8003..eb9cb91f2 100644 --- a/examples/machine/deprecated/machine-hnu.json +++ b/examples/machine/deprecated/machine-hnu.json @@ -21,7 +21,7 @@ "_comment": "that's all" }, - "lmp_command": "/home/llang/dp_v2/local/bin/lmp_mpi_0_12_0", + "model_devi_command": "/home/llang/dp_v2/local/bin/lmp_mpi_0_12_0", "model_devi_group_size": 10, "_comment": "model_devi on localhost ", "model_devi_machine": { diff --git a/examples/machine/deprecated/machine-tiger-pwscf-della.json b/examples/machine/deprecated/machine-tiger-pwscf-della.json index 7201947b1..44911f487 100644 --- a/examples/machine/deprecated/machine-tiger-pwscf-della.json +++ b/examples/machine/deprecated/machine-tiger-pwscf-della.json @@ -19,7 +19,7 @@ "_comment": "that's all" }, - "lmp_command": "/home/linfengz/SCR/wanghan/local/bin/lmp_serial_0110_gpu", + "model_devi_command": "/home/linfengz/SCR/wanghan/local/bin/lmp_serial_0110_gpu", "model_devi_group_size": 20, "_comment": "model_devi on localhost ", "model_devi_machine": { diff --git a/examples/machine/deprecated/machine-tiger-vasp-della.json b/examples/machine/deprecated/machine-tiger-vasp-della.json index 822788b8f..fa1fdf6e9 100644 --- 
a/examples/machine/deprecated/machine-tiger-vasp-della.json +++ b/examples/machine/deprecated/machine-tiger-vasp-della.json @@ -19,7 +19,7 @@ "_comment": "that's all" }, - "lmp_command": "/home/linfengz/SCR/wanghan/local/bin/lmp_serial_0110_gpu", + "model_devi_command": "/home/linfengz/SCR/wanghan/local/bin/lmp_serial_0110_gpu", "model_devi_group_size": 10, "_comment": "model_devi on localhost ", "model_devi_machine": { diff --git a/examples/machine/deprecated/machine-tiger.json b/examples/machine/deprecated/machine-tiger.json index b1400d76f..ccc1b573f 100644 --- a/examples/machine/deprecated/machine-tiger.json +++ b/examples/machine/deprecated/machine-tiger.json @@ -19,7 +19,7 @@ "_comment": "that's all" }, - "lmp_command": "/home/linfengz/SCR/wanghan/local/bin/lmp_serial_0110_gpu", + "model_devi_command": "/home/linfengz/SCR/wanghan/local/bin/lmp_serial_0110_gpu", "model_devi_group_size": 20, "_comment": "model_devi on localhost ", "model_devi_machine": { diff --git a/examples/machine/deprecated/machine-ucloud.json b/examples/machine/deprecated/machine-ucloud.json index 963c250e9..52e9040c1 100644 --- a/examples/machine/deprecated/machine-ucloud.json +++ b/examples/machine/deprecated/machine-ucloud.json @@ -30,7 +30,7 @@ }, - "lmp_command": "/usr/bin/lmp_mpi", + "model_devi_command": "/usr/bin/lmp_mpi", "model_devi_group_size": 20, "model_devi_machine": { "machine_type": "ucloud", diff --git a/tests/generator/machine-local-v1.json b/tests/generator/machine-local-v1.json index 7079678e8..2218884f2 100644 --- a/tests/generator/machine-local-v1.json +++ b/tests/generator/machine-local-v1.json @@ -28,7 +28,7 @@ "source_list": [], "_comment": "that's All" }, - "lmp_command": "/home/wanghan/local/bin/lmp_mpi_1_1_0", + "model_devi_command": "/home/wanghan/local/bin/lmp_mpi_1_1_0", "model_devi_group_size": 10, "fp_machine": { diff --git a/tests/generator/machine-local.json b/tests/generator/machine-local.json index 05a0f2811..a4743c964 100644 --- a/tests/generator/machine-local.json +++ b/tests/generator/machine-local.json @@ -18,7 +18,7 @@ "_comment": "model_devi on localhost ", - "lmp_command": "/home/wanghan/local/bin/lmp_mpi_010", + "model_devi_command": "/home/wanghan/local/bin/lmp_mpi_010", "model_devi_group_size": 5, "model_devi_machine": { "machine_type": "local", @@ -49,6 +49,6 @@ "with_mpi": true, "_comment": "that's all" }, - + "fp_user_forward_files" : ["vdw_kernel.bindat"], "_comment": " that's all " } diff --git a/tests/generator/test_make_fp.py b/tests/generator/test_make_fp.py index 09ac5aede..914c9b149 100644 --- a/tests/generator/test_make_fp.py +++ b/tests/generator/test_make_fp.py @@ -481,6 +481,15 @@ def _check_pwmat_input(testCase, idx): testCase.assertEqual(lines.strip(), pwmat_input_ref.strip()) os.chdir(cwd) +def _check_symlink_user_forward_files(testCase, idx, file): + fp_path = os.path.join('iter.%06d' % idx, '02.fp') + tasks = glob.glob(os.path.join(fp_path, 'task.*')) + cwd = os.getcwd() + for ii in tasks: + os.chdir(ii) + testCase.assertEqual(os.path.isfile("vdw_kernel.bindat"), True) + os.chdir(cwd) + class TestMakeFPPwscf(unittest.TestCase): def test_make_fp_pwscf(self): setUpModule() @@ -614,7 +623,7 @@ def test_make_fp_vasp(self): atom_types = [0, 1, 0, 1] type_map = jdata['type_map'] _make_fake_md(0, md_descript, atom_types, type_map) - make_fp(0, jdata, {}) + make_fp(0, jdata, {"fp_user_forward_files" : ["vdw_kernel.bindat"] }) _check_sel(self, 0, jdata['fp_task_max'], jdata['model_devi_f_trust_lo'], jdata['model_devi_f_trust_hi']) _check_poscars(self, 0, 
jdata['fp_task_max'], jdata['type_map']) # _check_incar_exists(self, 0) @@ -755,7 +764,7 @@ def test_make_fp_vasp_ele_temp(self): # checked elsewhere # _check_potcar(self, 0, jdata['fp_pp_path'], jdata['fp_pp_files']) shutil.rmtree('iter.000000') - + class TestMakeFPGaussian(unittest.TestCase): def make_fp_gaussian(self, multiplicity="auto"): diff --git a/tests/generator/vdw_kernel.bindat b/tests/generator/vdw_kernel.bindat new file mode 100644 index 000000000..e69de29bb diff --git a/tests/tools/context.py b/tests/tools/context.py index d4e70a8c5..1d3510786 100644 --- a/tests/tools/context.py +++ b/tests/tools/context.py @@ -8,3 +8,5 @@ def my_file_cmp(test, f0, f1): with open(f1) as fp1: test.assertTrue(fp0.read() == fp1.read()) +def setUpModule(): + os.chdir(os.path.abspath(os.path.dirname(__file__))) diff --git a/tests/tools/machine_fp_single.json b/tests/tools/machine_fp_single.json new file mode 100644 index 000000000..f998388eb --- /dev/null +++ b/tests/tools/machine_fp_single.json @@ -0,0 +1,15 @@ +{ + "fp":[ + { + "command": "vasp_std", + "machine":{ + "batch_type": "PBS" + }, + "resources": { + "group_size" : 8 + }, + "_comments" : "In user_forward_files, define input files to be uploaded.", + "user_forward_files" : ["vdw_kernel.bindat"] + } + ] +} \ No newline at end of file diff --git a/tests/tools/test_convert_mdata.py b/tests/tools/test_convert_mdata.py new file mode 100644 index 000000000..5458b0faa --- /dev/null +++ b/tests/tools/test_convert_mdata.py @@ -0,0 +1,17 @@ +import os,sys,json +import unittest + +test_dir = os.path.abspath(os.path.join(os.path.dirname(__file__))) +sys.path.insert(0, os.path.join(test_dir, '..')) +__package__ = 'tools' +from dpgen.remote.decide_machine import convert_mdata +from .context import setUpModule +machine_file = 'machine_fp_single.json' +class TestConvertMdata(unittest.TestCase): + def test_convert_mdata (self): + mdata = json.load(open(machine_file)) + mdata = convert_mdata(mdata, ["fp"]) + self.assertEqual(mdata["fp_command"], "vasp_std") + self.assertEqual(mdata["fp_group_size"], 8) + self.assertEqual(mdata["fp_machine"]["batch_type"], "PBS") + self.assertEqual(mdata["fp_user_forward_files"], ["vdw_kernel.bindat"])
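For reference, the mapping exercised by the new unit test can be reproduced on its own: `convert_mdata` copies every non-comment key of `mdata["fp"][0]` into a flat `fp_`-prefixed key and pulls `group_size` out of `resources`. A minimal usage sketch with the same content as `tests/tools/machine_fp_single.json` (comments omitted):

```python
from dpgen.remote.decide_machine import convert_mdata

# Same content as tests/tools/machine_fp_single.json, written inline.
mdata = {
    "fp": [{
        "command": "vasp_std",
        "machine": {"batch_type": "PBS"},
        "resources": {"group_size": 8},
        "user_forward_files": ["vdw_kernel.bindat"],
    }]
}

mdata = convert_mdata(mdata, ["fp"])

# Flattened keys consumed by the DP-GEN main process:
print(mdata["fp_command"])             # vasp_std
print(mdata["fp_group_size"])          # 8
print(mdata["fp_machine"])             # {'batch_type': 'PBS'}
print(mdata["fp_user_forward_files"])  # ['vdw_kernel.bindat']
```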