sepsis-prediction

Sepsis Prediction using Clinical Data (PhysioNet Computing in Cardiology Challenge 2019)

This project implements an LSTM-based sepsis prediction model using various clinical data sources. Specifically, the model takes 10 hours of input data and predicts the probability of sepsis within the next hour. On the test set, the model has an AUC of 0.76.

The data used for this project is from the 2019 PhysioNet Computing in Cardiology Challenge. More information about the data, including a download link, is available at: https://physionet.org/content/challenge-2019/1.0.0/

The dataset is a series of PSV (pipe-separated values) files, one per patient, where each row represents a single hour of data.
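As a minimal sketch of what reading one of these files might look like, the snippet below parses a pipe-separated table with Pandas. The inline sample, the `load_psv` helper, and the added `patient_id` and `hour` columns are assumptions made for illustration; they are not taken from psv_to_df.ipynb, and the real files contain many more columns.

```python
import io

import pandas as pd

# Tiny stand-in for one patient's PSV file (illustrative columns only).
SAMPLE_PSV = (
    "HR|O2Sat|Temp|SepsisLabel\n"
    "80|97|NaN|0\n"
    "82|NaN|36.8|0\n"
    "85|95|37.1|1\n"
)


def load_psv(source, patient_id):
    """Read one pipe-separated patient file into a DataFrame (one row per hour)."""
    df = pd.read_csv(source, sep="|")
    # Hypothetical bookkeeping columns so rows stay identifiable after
    # several patients are concatenated into a single DataFrame.
    df.insert(0, "patient_id", patient_id)
    df.insert(1, "hour", list(range(len(df))))
    return df


patient_df = load_psv(io.StringIO(SAMPLE_PSV), patient_id="p000001")
print(patient_df)
```

With files on disk, `source` would be a file path instead of the in-memory buffer, and the per-patient frames could be combined with `pd.concat`.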

To run the code in this project, run the following notebooks:

  1. psv_to_df.ipynb: This notebook loads the PhysioNet data PSV files and saves them into a Pandas DataFrame for ease of downstream analysis
  2. feature_engineering.ipynb: This notebook generates 10-hour windowed features and corresponding labels
  3. feature_selection.ipynb: This notebook inspects feature correlations and removes any features that are highly correlated
  4. train_model.ipynb: This notebook defines the model, trains it, and evaluates its performance on validation and test sets

The remainder of this readme will cover the different steps in the analysis pipeline.

1. Redefine Output Labels

According to the PhysioNet Challenge details, the labels for the provided data are as follows:
For sepsis patients, SepsisLabel is 1 if $t \geq t_\text{sepsis} - 6$ and 0 if $t < t_\text{sepsis} - 6$
For non-sepsis patients, SepsisLabel is 0

In other words, SepsisLabel switches to 1 six hours before the onset of sepsis. For this project, however, sepsis only needs to be predicted one hour in advance, so the labels are redefined as follows:
For sepsis patients, SepsisLabel is 1 if $t \geq t_\text{sepsis}$ and 0 if $t < t_\text{sepsis}$
For non-sepsis patients, SepsisLabel is 0

To implement this change, the first six SepsisLabel values equal to 1 are set to 0 in each patient’s data.
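The relabeling rule can be sketched on a single patient's label sequence as follows. The `delay_onset` helper and the example sequence are hypothetical and only illustrate the rule described above; the notebooks may implement it differently.

```python
def delay_onset(labels, shift=6):
    """Return a copy of `labels` with the first `shift` positive labels set to 0."""
    out = list(labels)
    remaining = shift
    for i, value in enumerate(out):
        if remaining == 0:
            break
        if value == 1:
            out[i] = 0
            remaining -= 1
    return out


# One sepsis patient: the original label turns on at hour 3, i.e. t_sepsis - 6.
original = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
redefined = delay_onset(original)
print(redefined)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

Under this rule, a patient whose record contains six or fewer positive labels ends up with all zeros. With a DataFrame, the function could be applied per patient, for example through a `groupby` on whatever patient identifier column the DataFrame carries.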

2. Window the Data