Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update parameter sets #9

Merged
merged 16 commits into from
Jun 25, 2021
  •  
  •  
  •  
98 changes: 82 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,4 @@
# breast_cancer

[![Actions Status](https://github.com/pasmopy/breast_cancer/workflows/Tests/badge.svg)](https://github.com/pasmopy/breast_cancer/actions)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
# Breast cancer [![Actions Status](https://github.com/pasmopy/breast_cancer/workflows/Tests/badge.svg)](https://github.com/pasmopy/breast_cancer/actions)

Workflow for classifying breast cancer subtypes based on intracellular signaling dynamics.

Expand All @@ -10,18 +7,20 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling
| Language | Dependent packages |
| ------------- | -------------------------------------------------------------- |
| Python >= 3.7 | See [`requirements.txt`](requirements.txt) |
| Julia >= 1.5 | [BioMASS.jl](https://github.com/himoto/BioMASS.jl) |
| Julia >= 1.5 | [BioMASS.jl](https://github.com/biomass-dev/BioMASS.jl) |
| R >= 4.0 | TCGAbiolinks, sva, biomaRt, edgeR, ComplexHeatmap, viridisLite |

## Table of contents

- [Transcriptomic data integration](#integration-of-tcga-and-ccle-data)
- [Integration of TCGA and CCLE data](#integration-of-tcga-and-ccle-data)

- [Model construction](#construction-of-a-comprehensive-model-of-the-ErbB-signaling-network)
- [Construction of a comprehensive model of the ErbB signaling network](#construction-of-a-comprehensive-model-of-the-ErbB-signaling-network)

- [Individualization of the mechanistic model](#individualization-of-the-mechanistic-model)

- [Subtype classification](#subtype-classification-based-on-the-ErbB-signaling-dynamics)
- [Subtype classification based on the ErbB signaling dynamics](#subtype-classification-based-on-the-ErbB-signaling-dynamics)

- [Investigation of patient-specific pathway activities](#investigation-of-patient-specific-pathway-activities)

## Integration of TCGA and CCLE data

Expand All @@ -39,9 +38,12 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling
1. Use `pasmopy.Text2Model` to build a mechanistic model

```python
import os

from pasmopy import Text2Model

Text2Model("models/erbb_network.txt").convert()

Text2Model(os.path.join("models", "erbb_network.txt")).convert()
```

1. Rename `erbb_network/` to CCLE_name or TCGA_ID, e.g., `MCF7_BREAST` or `TCGA_3C_AALK_01A`
Expand All @@ -60,11 +62,12 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling
from . import __path__
from .name2idx import C, V
from .set_model import initial_values, param_values


incorporating_gene_expression_levels = Individualization(
parameters=C.NAMES,
species=V.NAMES,
transcriptomic_data="transcriptomic_data/TPM_RLE_postComBat.csv",
transcriptomic_data=os.path.join("transcriptomic_data", "TPM_RLE_postComBat.csv"),
gene_expression={
"ErbB1": ["EGFR"],
"ErbB2": ["ERBB2"],
Expand Down Expand Up @@ -123,9 +126,12 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling
1. Build a mechanistic model to identify model parameters

```python
import os

from pasmopy import Text2Model

Text2Model("models/erbb_network.txt", lang="julia").convert()

Text2Model(os.path.join("models", "erbb_network.txt"), lang="julia").convert()
```

1. Add time-series data to [`experimental_data.jl`](training/erbb_network_jl/experimental_data.jl)
Expand All @@ -144,21 +150,25 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling

When finished, run:

```julia
```bash
$ julia
```

```julia
using BioMASS

julia> using BioMASS
julia> param2biomass("training")
param2biomass("training")
```

And you will get [`dat2npy/out/`](https://github.com/pasmopy/breast_cancer/tree/master/training/erbb_network_jl/dat2npy/out).
This is the optimized parameter sets that [`biomass`](https://github.com/okadalabipr/biomass) can recognize and read.
This is the estimated parameter sets that [`biomass`](https://github.com/biomass-dev/biomass) can recognize and read.
Copy `out/` to each patient-specific model folder via:

```python
import os
import shutil


breast_cancer_models = []
path_to_models = os.path.join("models", "breast")
for f in os.listdir(path_to_models):
Expand All @@ -169,7 +179,7 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling
# Set optimized parameters
for model in breast_cancer_models:
shutil.copytree(
os.path.join("training", "dat2npy", "out"),
os.path.join("training", "erbb_network_jl", "dat2npy", "out"),
os.path.join(path_to_models, f"{model}", "out"),
)
```
Expand Down Expand Up @@ -224,6 +234,62 @@ Workflow for classifying breast cancer subtypes based on intracellular signaling
$ Rscript brca_heatmap.R 6 8,5
```

## Investigation of patient-specific pathway activities

### Sensitivity analysis

- Calculate sensitivity coefficients by varying the amount of each nonzero species

```python
from pasmopy import PatientModelAnalyses

import models.breast

with open ("selected_tnbc.txt" mode="r") as f:
TNBC_ID = f.read().splitlines()
analyses = PatientModelAnalyses(
models.breast.__package__,
TNBC_ID,
biomass_kws={
"metric": "maximum", "style": "heatmap", "options": {"excluded_initials": ["PIP2"]}
},
)
```

### Drug response data analysis

1. Calculation of the ErbB receptor expression ratio

```bash
$ cd drug_response
$ Rscript data/calc_erbb_ratio.R
```

1. Drug response analysis and visualization

```python
import os

import pandas as pd
from drug.database import CancerCellLineEncyclopedia


ccle = CancerCellLineEncyclopedia()

erbb_expression_ratio = pd.read_csv(
os.path.join("data", "ErbB_expression_ratio.csv"),
index_col=0
)
compounds = ["Erlotinib", "Lapatinib", "AZD6244", "PD-0325901"]
for compound in compounds:
ccle.save_all(erbb_expression_ratio, compound)
```

## Author

- Hiroaki Imoto
- Sawa Yamashiro

## License

[Apache License 2.0](LICENSE)
8 changes: 5 additions & 3 deletions drug_response/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,15 +25,17 @@
1. Drug response analysis and visualization

```python
import os
import pandas as pd
from drug.database import CancerCellLineEncyclopedia

ccle = CancerCellLineEncyclopedia()

erbb_expression_ratio = pd.read_csv("./data/ErbB_expression_ratio.csv", index_col=0)

erbb_expression_ratio = pd.read_csv(
os.path.join("data", "ErbB_expression_ratio.csv"),
index_col=0
)
compounds = ["Erlotinib", "Lapatinib", "AZD6244", "PD-0325901"]

for compound in compounds:
ccle.save_all(erbb_expression_ratio, compound)
```
67 changes: 44 additions & 23 deletions drug_response/drug/database.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,7 @@ def save_dose_response_curve(
erbb_expression_ratio: pd.DataFrame,
drug: str,
*,
config: Optional[dict] = None,
show_individual: bool = False,
error: str = "std",
save_format: str = "pdf",
Expand Down Expand Up @@ -162,14 +163,17 @@ def save_dose_response_curve(
[drug] * len(bottom30),
)

plt.figure(figsize=(7, 5))
plt.rcParams['font.size'] = 24
plt.rcParams['font.family'] = 'Arial'
plt.rcParams['axes.linewidth'] = 2.4
plt.rcParams['xtick.major.width'] = 2.4
plt.rcParams['ytick.major.width'] = 2.4
plt.rcParams['lines.linewidth'] = 3
plt.rcParams["lines.markersize"] = 1
if config is None:
config = {}
config.setdefault('figure.figsize', (7, 5))
config.setdefault('font.size', 24)
config.setdefault('font.family', 'Arial')
config.setdefault('axes.linewidth', 2.4)
config.setdefault('xtick.major.width', 2.4)
config.setdefault('ytick.major.width', 2.4)
config.setdefault('lines.linewidth', 3)
config.setdefault("lines.markersize", 1)
plt.rcParams.update(config)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)

Expand Down Expand Up @@ -279,6 +283,7 @@ def save_activity_area(
erbb_expression_ratio: pd.DataFrame,
drug: str,
*,
config: Optional[dict] = None,
save_format: str = "pdf",
) -> None:
self._check_args(drug, save_format)
Expand Down Expand Up @@ -315,15 +320,20 @@ def save_activity_area(
],
)

plt.figure(figsize=(3, 5))
plt.rcParams['font.family'] = 'Arial'
plt.rcParams['mathtext.it'] = 'Arial:italic'
plt.rcParams['font.size'] = 24
plt.rcParams['axes.linewidth'] = 2.4
plt.rcParams['xtick.major.width'] = 2.4
plt.rcParams['ytick.major.width'] = 2.4
plt.rcParams['lines.linewidth'] = 1.8
plt.rcParams['lines.markersize'] = 11
if config is None:
config = {}
config.setdefault('figure.figsize', (3, 5))
config.setdefault('font.family', 'Arial')
config.setdefault('mathtext.it', 'Arial:italic')
config.setdefault('font.size', 24)
config.setdefault('axes.linewidth', 2.4)
config.setdefault('xtick.major.width', 2.4)
config.setdefault('ytick.major.width', 2.4)
config.setdefault('lines.linewidth', 1.8)
config.setdefault('lines.markersize', 11)

plt.rcParams.update(config)

plt.gca().spines["right"].set_visible(False)
plt.gca().spines["top"].set_visible(False)

Expand All @@ -333,10 +343,9 @@ def save_activity_area(
data=(activity_area[0], activity_area[1]),
palette=sns.color_palette(['darkmagenta', 'goldenrod']),
)
#sns.stripplot(data=(activity_area[0], activity_area[1]),color=".26")
sns.swarmplot(data=(activity_area[0], activity_area[1]),color=".48")
#sns.boxenplot(data=(activity_area[0], activity_area[1]), palette=sns.color_palette(['#76106cff', '#f7a426ff']))
#sns.violinplot(data=(activity_area[0], activity_area[1]), palette=sns.color_palette(['mediumslateblue', 'tomato']))
sns.swarmplot(
data=(activity_area[0], activity_area[1]),color=".48"
)
plt.xticks([0, 1], ["EGFR\nhigh", "EGFR\nlow"])
plt.ylabel("Activity area", fontsize=28)
plt.suptitle(
Expand All @@ -360,7 +369,19 @@ def save_all(
erbb_expression_ratio: pd.DataFrame,
drug: str,
*,
config_dose_response_curve: Optional[dict] = None,
config_activity_area: Optional[dict] = None,
save_format: str = "pdf",
) -> None:
self.save_dose_response_curve(erbb_expression_ratio, drug, save_format=save_format)
self.save_activity_area(erbb_expression_ratio, drug, save_format=save_format)
self.save_dose_response_curve(
erbb_expression_ratio,
drug,
config=config_dose_response_curve,
save_format=save_format,
)
self.save_activity_area(
erbb_expression_ratio,
drug,
config=config_activity_area,
save_format=save_format,
)
63 changes: 1 addition & 62 deletions models/breast/TCGA_3C_AALK_01A/__init__.py
Original file line number Diff line number Diff line change
@@ -1,66 +1,5 @@
import os
from typing import List

from .fitness import objective
from .fitness import OptimizationProblem
from .name2idx import C, V
from .observable import ExperimentalData, NumericalSimulation, observables
from .reaction_network import ReactionNetwork
from .set_model import initial_values, param_values
from .set_search_param import SearchParam
from .viz import Visualization


def _check_duplicate(names: List[str], object: str) -> List[str]:
duplicate = [name for name in set(names) if names.count(name) > 1]
if not duplicate:
return names
else:
raise NameError(f"Duplicate {object}: {', '.join(duplicate)}")


class BioMassModel(object):
def __init__(self) -> None:
self.path = __path__[0]
self.parameters = _check_duplicate(C.NAMES, "parameters")
self.species = _check_duplicate(V.NAMES, "species")
self.obs = _check_duplicate(observables, "observables")
self.pval = param_values
self.ival = initial_values
self.obj_func = objective
self.sim = NumericalSimulation()
self.exp = ExperimentalData()
self.viz = Visualization()
self.sp = SearchParam()
self.rxn = ReactionNetwork()


def show_info() -> None:
model_name = os.path.basename(os.path.dirname(__file__))
print(
f"{model_name} information\n" + ("-" * len(model_name)) + "------------\n"
f"{len(BioMassModel().species):d} species\n"
f"{len(BioMassModel().parameters):d} parameters, "
f"of which {len(BioMassModel().sp.idx_params):d} to be estimated"
)


def create() -> BioMassModel:
model = BioMassModel()
if model.sim.normalization:
for obs_name in model.obs:
if (
isinstance(model.sim.normalization[obs_name]["timepoint"], int)
and not model.sim.t[0]
<= model.sim.normalization[obs_name]["timepoint"]
<= model.sim.t[-1]
):
raise ValueError("Normalization timepoint must lie within sim.t.")
if not model.sim.normalization[obs_name]["condition"]:
model.sim.normalization[obs_name]["condition"] = model.sim.conditions
else:
for c in model.sim.normalization[obs_name]["condition"]:
if c not in model.sim.conditions:
raise ValueError(
f"Normalization condition '{c}' is not defined in sim.conditions."
)
return model
Loading