Separate task preprocessing from simulation execution #399

jonrkarr · 2021-09-12T18:54:33Z

Notes on limitations

preprocess_sed_task should be re-run if any of these conditions is met:
- the model structure must be changed (e.g., additional species, reactions)
- the simulation algorithm or its parameters are changed
- additional attributes (parameters, initial conditions) need to be changed -- it's best to list all attributes that might need to be changed in the initial call to preprocess_sed_task
- additional variables need to be recorded -- it's best to list all variables that might need to be recorded in the initial call to preprocess_sed_task
Because some simulator representations of models diverge from their associated model languages, some changes that can be applied to model specifications cannot easily be applied to in-memory simulation representations of models:
- The SBML-fbc representation of FBA models diverges a little from how simulation tools represent models. In particular, SBML-fbc uses a small number of shared parameters to represent flux bounds. In contrast, simulation tools flatten these out into separate parameters for the upper and lower bound of each reaction. The shared parameters can be changed at the model specification (XML) level, but are difficult to change at the simulator level because simulators don't retain knowledge of them. Due to this divergence, we support two different mechanisms for changing FBA models:
  - exec_sed_task: supports model changes on the simulator representation of models. This should work well for Vivarium. Presently, this is limited to changing flux bounds.
  - Execution of SED-ML files and COMBINE archives: supports model changes on the XML representation of models. This supports the full set of possible changes: change attributes and add/remove/replace XML nodes
- The Smoldyn software also diverges from Smoldyn simulation configurations. For example, the Smoldyn software does not retain information about parameter values.
  - As a result, parameters can only be edited during task preprocessing when simulation configuration files are read
  - In contrast, molecule counts can be set repeatedly as part of task execution
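The FBA flux-bound divergence described above can be illustrated with a small, self-contained sketch. It uses plain dictionaries rather than SBML-fbc, COBRApy, or CBMpy; the names `parameters`, `reactions`, and `flatten` are hypothetical and only mimic the idea of shared bound parameters being flattened into per-reaction bounds.

```python
# Toy illustration only: plain dicts stand in for an SBML-fbc model and
# for a simulator's in-memory model. None of these names are real APIs.

# Specification level: a few shared parameters, referenced by reactions.
parameters = {"lb_default": -1000.0, "ub_default": 1000.0, "zero": 0.0}
reactions = {
    "R1": {"lower": "zero", "upper": "ub_default"},
    "R2": {"lower": "lb_default", "upper": "ub_default"},
    "R3": {"lower": "zero", "upper": "ub_default"},
}


def flatten(parameters, reactions):
    """Resolve parameter references into (lower, upper) numbers per reaction.

    The returned mapping no longer records which parameter each number
    came from, which is the information the simulator side loses.
    """
    return {
        rid: (parameters[r["lower"]], parameters[r["upper"]])
        for rid, r in reactions.items()
    }


flat = flatten(parameters, reactions)

# Specification-level change: editing one shared parameter and re-flattening
# updates every reaction that references it.
flat_after_spec_change = flatten(dict(parameters, ub_default=500.0), reactions)

# Simulator-level change: only individual bounds can be addressed, so the
# same edit has to be repeated for each reaction.
flat_sim = dict(flat)
flat_sim["R1"] = (flat_sim["R1"][0], 500.0)
```

After the specification-level change, all three reactions have an upper bound of 500.0; after the simulator-level change, only R1 does, because the link to `ub_default` was lost during flattening.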
Some simulation tools don't represent or provide ways to set initial levels
- BoolNet: appears to only hold initial levels for constant species
- GINsim: See BioLQM models should have an associated initial state GINsim/GINsim-python#19
For some simulation tools, repeated executions of exec_sed_task require re-parsing models
- BioNetGen: primarily a command-line tool implemented in Perl; py-perl5 could maybe be used to improve the connection to the Perl program; see Suggestion: use py-perl5 for more seamless bridge to BNGL2.pl RuleWorld/PyBioNetGen#22
- LibSBMLSim: doesn't expose a method for parsing models; see Separate parsing and simulation of models in Python API libsbmlsim/libsbmlsim#23
- XPP: only available as a binary executable
- pyNeuroML: the actual simulator is implemented in Java beneath multiple layers of Python and Java packages. Model files are passed down through these layers. This could be improved with a Python-Java bridge, but would take some work.


jonrkarr · 2021-09-12T19:04:09Z

@eagmon, the progress on factoring out unnecessary computations for repeated execution is summarized above.

The preprocessed information is sufficient to change values of parameters and initial conditions. Presently, more substantial changes such as adding/removing/replacing species/reactions would require re-preprocessing models.

For SBML and CellML, this follows their SED-ML conventions of using XML XPaths to address model components. Once this refactoring is done, we can work on a second, simpler way of addressing model components by their SBML/CellML ids. At least to start, this would be restricted to changing values of parameters and initial conditions. Adding/removing/replacing components would only be supported at the XML level where there's already a convention for describing such changes.
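As a rough sketch of the two addressing styles mentioned above, the following uses Python's standard `xml.etree.ElementTree` on a tiny hand-written SBML-like document. The helper names `set_by_path` and `set_parameter_by_id` are hypothetical and are not part of any BioSimulators package. ElementTree supports only a subset of XPath, so the trailing `/@value` of a SED-ML-style target is passed as a separate attribute name here.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map used to resolve "sbml:" in the paths below.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Minimal hand-written model for illustration; not a complete SBML file.
MODEL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfParameters>
      <parameter id="k1" value="0.1" constant="true"/>
      <parameter id="k2" value="2.0" constant="true"/>
    </listOfParameters>
  </model>
</sbml>"""


def set_by_path(root, element_path, attribute, new_value):
    """Change an attribute of the element addressed by an XPath-like path."""
    element = root.find(element_path, SBML_NS)
    if element is None:
        raise KeyError(f"No element matches {element_path!r}")
    element.set(attribute, str(new_value))


def set_parameter_by_id(root, parameter_id, new_value):
    """Simpler addressing: look up a parameter anywhere in the model by id."""
    path = f".//sbml:parameter[@id='{parameter_id}']"
    set_by_path(root, path, "value", new_value)


root = ET.fromstring(MODEL_XML)

# Path-style addressing, relative to the root <sbml> element.
set_by_path(
    root,
    "sbml:model/sbml:listOfParameters/sbml:parameter[@id='k1']",
    "value",
    0.5,
)

# Id-style addressing of a second parameter.
set_parameter_by_id(root, "k2", 3.0)

new_values = {
    p.get("id"): p.get("value")
    for p in root.iterfind(".//sbml:parameter", SBML_NS)
}
```

The id-based helper is just a thin wrapper over the path-based one, which mirrors the idea that id addressing would be restricted to changing values, while structural edits stay at the XML level.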

eagmon · 2021-09-12T21:38:36Z

@jonrkarr -- Looks like good progress. I know from our work on biosimulators-tellurium that we used exec_sed_task and preprocess_sed_task methods -- are these same methods available for all simulators with ✅ ? I know biosimulators-cobrapy did not previously have those module attributes.

jonrkarr · 2021-09-12T22:03:47Z

Until recently, each simulator API had one method, exec_sed_task. Each API now has two methods:

exec_sed_task
preprocess_sed_task

preprocess_sed_task returns a data structure which essentially represents parsed models and a map between our standard representation of models and simulations (SED-ML/KiSAO) and each simulator's internal representation. This data structure is unique to each simulation tool.

exec_sed_task has an optional argument preprocessed_task for this preprocessed information. If the argument isn't provided, then exec_sed_task has to build this map. Providing this argument avoids any computation common to multiple repeated executions of a single model (typically with different parameters and/or initial conditions).

I've implemented and pushed half of the preprocess_sed_task methods. The others are still just skeletons. I'm hoping to finish that in the next few days.

For constraint-based simulations, there's opportunity to go further to hot start optimizations with some solvers such as CPLEX and Gurobi. This would require changes to the FBA packages, COBRApy and CBMpy.

jonrkarr · 2021-09-24T03:00:53Z

The updated Docker image is released. The entrypoint now opens an IPython shell in the Pipenv environment with all of the simulation tools.

docker pull ghcr.io/biosimulators/biosimulators:0.0.2
docker run -it --rm ghcr.io/biosimulators/biosimulators:0.0.2

The only two standardized tools that aren't included are

OpenCOR: Installation is complicated and requires Python 3.7. This group is working toward a more composable simulation library in which the core simulation functionality is separated from the GUI.
VCell: No Python API available. The developers are thinking about creating a Python API. They have an old API that could be a good starting point.

The updated simulation tools are deployed on the main RunBioSimulations simulation service. They will be updated soon on the low latency/low performance service.

More documentation (e.g., Jupyter notebook) is still coming.

jonrkarr self-assigned this Sep 18, 2021

jonrkarr closed this as completed Sep 24, 2021
