
abel-blue/breastCancer-causal-Inference


Breast Cancer Diagnosis with Causal Inference



Table of Contents

  1. Introduction
  2. Project Structure
  3. Installation guide
  4. License

Introduction

Causal inference is an important link between the practice of cancer epidemiology and effective cancer prevention.

The causal graph is a central object in Pearl's causal framework, but it is often unknown, shaped by personal knowledge and bias, or only loosely connected to the available data. The main objective of this project is to show concretely why the causal graph matters. In this spirit, trainees are expected to attempt the following tasks:

  1. Perform a causal inference task using Pearl’s framework;
  2. Infer the causal graph from observational data and then validate the graph;
  3. Merge machine learning with causal inference;
  4. Use the resulting graph to predict the outcome of a disease.

The first task is straightforward. The second and third are still open questions in the research community, so they may require more research, innovation, and thinking outside the box from trainees.
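As a minimal illustration of task 1, the sketch below applies Pearl's backdoor adjustment to synthetic data. It assumes a single binary confounder Z and the causal graph Z → T, Z → Y, T → Y. The variables, coefficients and function names (`simulate`, `naive_difference`, `backdoor_adjusted_effect`) are hypothetical and are not taken from this repository; the sketch only shows how adjusting for Z, as the graph prescribes, recovers the true effect of T on Y, while a naive comparison of treated and untreated groups does not.

```python
import random


def simulate(n, seed=0):
    """Hypothetical data: Z confounds both treatment T and outcome Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0            # binary confounder
        t = 1 if rng.random() < 0.2 + 0.6 * z else 0  # treatment depends on Z
        y = 2.0 * t + 3.0 * z + rng.gauss(0.0, 1.0)   # true effect of T is 2.0
        data.append((z, t, y))
    return data


def mean(xs):
    return sum(xs) / len(xs)


def naive_difference(data):
    """E[Y | T=1] - E[Y | T=0]: biased because it ignores the backdoor path via Z."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return mean(treated) - mean(control)


def backdoor_adjusted_effect(data):
    """Sum over z of (E[Y | T=1, Z=z] - E[Y | T=0, Z=z]) * P(Z=z)."""
    n = len(data)
    effect = 0.0
    for z_value in (0, 1):
        stratum = [(t, y) for z, t, y in data if z == z_value]
        p_z = len(stratum) / n
        treated = mean([y for t, y in stratum if t == 1])
        control = mean([y for t, y in stratum if t == 0])
        effect += (treated - control) * p_z
    return effect


if __name__ == "__main__":
    data = simulate(50_000)
    print(f"naive difference: {naive_difference(data):.2f}")
    print(f"backdoor-adjusted effect: {backdoor_adjusted_effect(data):.2f}")
```

With these coefficients the true effect is 2.0. The naive difference is inflated by the confounder to roughly 3.8, while the adjusted estimate lands close to 2.0.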

Data Features:

Features in the data are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.

Attribute Information:

  • ID number
  • Diagnosis (M = malignant, B = benign)

Fields 3-32 are derived from ten real-valued features computed for each cell nucleus:

  • radius (mean of distances from center to points on the perimeter)
  • texture (standard deviation of gray-scale values)
  • perimeter
  • area
  • smoothness (local variation in radius lengths)
  • compactness (perimeter^2 / area - 1.0)
  • concavity (severity of concave portions of the contour)
  • concave points (number of concave portions of the contour)
  • symmetry
  • fractal dimension ("coastline approximation" - 1)
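The geometric definitions above can be sketched for a single nucleus outline given as an ordered list of boundary points. This is a hypothetical helper, assuming the center is the mean of the boundary points and the area comes from the shoelace formula. It only mirrors the textual definitions of radius, perimeter, area and compactness; it is not the image-analysis software that produced the dataset, so its values need not match the recorded features.

```python
import math


def contour_features(points):
    """Compute radius, perimeter, area and compactness for a closed contour.

    `points` is an ordered list of (x, y) boundary points; the last point
    connects back to the first.
    """
    n = len(points)
    # Assumption: the nucleus center is the mean of the boundary points.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # radius: mean of distances from the center to the boundary points
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    # perimeter: total length of the closed polygon
    perimeter = sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))
    # area: shoelace formula for a simple (non-self-intersecting) polygon
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    area = abs(twice_area) / 2.0
    # compactness as documented: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0
    return {"radius": radius, "perimeter": perimeter, "area": area, "compactness": compactness}


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    print(contour_features(square))
```

For the 2 x 2 square in the usage example, the perimeter is 8, the area is 4 and the documented compactness formula gives 8^2 / 4 - 1 = 15.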

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius. All feature values are recorded with four significant digits.
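The per-image aggregation and the field numbering described above can be sketched as follows. The standard error is assumed to be the sample standard deviation divided by the square root of the number of nuclei, and the labels produced by the hypothetical `field_names` helper are illustrative, not the exact column headers of the files in `data/`.

```python
import math
import statistics

FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal dimension",
]
STATS = ["mean", "SE", "worst"]


def summarize(values):
    """Aggregate one feature over all nuclei of an image into (mean, SE, worst)."""
    mean = statistics.fmean(values)
    # Assumption: SE = sample standard deviation / sqrt(n); needs at least two nuclei.
    se = statistics.stdev(values) / math.sqrt(len(values))
    # "worst": mean of the three largest values
    worst = statistics.fmean(sorted(values, reverse=True)[:3])
    return mean, se, worst


def field_names():
    """Return the 32 field labels in order: ID, diagnosis, 10 means, 10 SEs, 10 worsts."""
    names = ["ID number", "Diagnosis"]
    for stat in STATS:
        names.extend(f"{feature} ({stat})" for feature in FEATURES)
    return names


if __name__ == "__main__":
    names = field_names()
    # Fields are numbered from 1, so field k is names[k - 1].
    print(names[3 - 1], "|", names[13 - 1], "|", names[23 - 1])
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Grouping the ten features by statistic reproduces the numbering in the text: field 3 is the mean radius, field 13 the radius SE and field 23 the worst radius.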

Missing attribute values: none
Class distribution: 357 benign (not cancer), 212 malignant (cancer)
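A quick way to check the class distribution of a local copy of the data is to count the diagnosis labels. This sketch assumes a CSV file with a header row and a `diagnosis` column holding `M`/`B`; the column name is an assumption, so adjust the `label_column` argument to match the actual file in `data/`.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, label_column="diagnosis"):
    """Count how many rows carry each diagnosis label (e.g. 'M' or 'B')."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    # Tiny in-memory sample; pass an open file object for the real dataset.
    sample = io.StringIO("id,diagnosis\n1,M\n2,B\n3,B\n")
    print(class_distribution(sample))
```

Called on the full dataset file (opened with `newline=""`), it should report 357 `B` and 212 `M` rows, matching the class distribution above.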


Project Structure

  • images/: the folder where all snapshots for the project are stored.
  • logs/: the folder where script logs are stored.
  • data/: the folder where the dataset files are stored.
  • .github/: the folder where GitHub Actions workflows and unit tests are integrated.
      • cml.yaml: the file where the CML (Continuous Machine Learning) configuration is stored.
      • unit_test.yml: the workflow file that runs the unit tests.
  • .vscode/: the folder where local editor settings (such as paths) are stored.
  • notebooks/: the folder containing the Jupyter notebooks.
      • data_exploration.ipynb: a Jupyter notebook for data exploration.
      • ml_preprocess.ipynb: a Jupyter notebook for preprocessing the data.
      • causal_inference.ipynb: a Jupyter notebook for causal inference feature extraction.
      • ml_model.ipynb: a Jupyter notebook for machine learning model training.
  • scripts/: the folder where the modules are stored.
      • causality.py: a module for causal inference.
      • data_manipulation.py: a module for data manipulation.
      • data_exploration.py: a module for data exploration.
      • data_preProcessing.py: a module for data preprocessing.
  • tests/: the folder containing unit tests for the scripts.
      • test.py: the file containing unit tests for the scripts.
  • requirements.txt: a text file listing the project's dependencies.
  • .travis.yml: the Travis CI configuration file for running the unit tests.
  • setup.py: a configuration file for installing the scripts as a package.
  • results.txt: a text file containing the results of the CML report.
  • train.py: a script for training the model.
  • README.md: Markdown text with a brief explanation of the project and the repository structure.

Installation guide

Conda Environment

conda create --name causality python=3.8
conda activate causality

Next, clone the repository and install the package into the active environment:

git clone https://github.com/Abel-Blue/breastCancer-causal-Inference.git
cd breastCancer-causal-Inference
pip install .

License

MIT License