Resource Optimization for Radiology
MRIdle deals with patient data, so we work on a dedicated machine ("Louisa") that is managed by USZ.
- If you don't have a regular USZ account, get one now.
- Get your account on Louisa: See Notion for instructions for how to access Louisa.
- Log on to Windows on a USZ machine or via remote desktop (mypc.usz.ch).
- Connect to Louisa through SSH: open PuTTY (type "putty" in the start menu), enter Louisa's IP address (see Notion page) as the host name, and press "Open".
- Now log in using your ACC account information.
- Optional: you can now set a new password on this Linux machine with the command passwd.
- Install Miniconda:
cp /tmp/Miniconda3-latest-Linux-x86_64.sh .
bash Miniconda3-latest-Linux-x86_64.sh
- Create a MRIdle python environment:
conda create --name mridle python=3.8
- Activate the environment:
conda activate mridle
- git clone the MRIdle repo into your home directory via HTTPS (note: you will have to use GitHub HTTPS authentication with a Personal Access Token):
git clone https://github.com/uzh-dqbm-cmi/mridle.git
- Move into the mridle directory:
cd mridle
- Install the package and its requirements:
pip install -r src/requirements.txt
- Ask someone in the team to assign you a port for running Jupyter notebooks.
- Connect your dedicated port to your localhost:8888 port by running the following command in the Windows command line (cmd), or save it in a .bat file:
ssh -N -L localhost:8888:localhost:your-port your-acc-username@louisa-ip-address
- Start Jupyter Lab through kedro in order to access kedro functionality (note: you must run this command from the top-level mridle repo directory):
kedro jupyter lab /data/mridle/
- In your browser, go to localhost:8888 to open Jupyter.
- In a notebook, run the following code to import the mridle module. This code snippet also activates the autoreload IPython magic so that the module automatically updates with any code changes you make.
%load_ext autoreload
%autoreload 2
import mridle
- Do not delete anything.
- Patient data, even anonymised, always stays on Louisa.
- Naming convention for Jupyter notebooks: number_your-initials_short-description
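The naming convention above can be checked mechanically. The sketch below is one possible reading of number_your-initials_short-description, assuming the number is made of digits, the initials are letters, the description is hyphen-separated, and the file ends in .ipynb. The regular expression and the follows_convention helper are illustrative only and are not part of mridle.

```python
import re

# Assumed reading of "number_your-initials_short-description":
# digits, then letters for the initials, then a hyphen-separated description.
NOTEBOOK_NAME = re.compile(r"\d+_[A-Za-z]+_[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\.ipynb")


def follows_convention(filename: str) -> bool:
    """Return True if filename matches the assumed notebook naming pattern."""
    return NOTEBOOK_NAME.fullmatch(filename) is not None
```

For example, follows_convention("01_ab_no-show-rates.ipynb") is True, while a name without the leading number or initials is rejected.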
MRIdle uses Kedro for organizing the data pipelines. Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It borrows concepts from software engineering best-practice and applies them to machine-learning code.
Here is a high-level overview of this repo's kedro project structure (adapted from this Kedro doc page):
- The conf/ directory contains configuration for the project, including:
  - base/catalog/*.yml contain data catalog entries for all data files that are involved in the pipelines.
  - base/parameters.yml is where parameters for pipelines are stored, for example model training parameters.
- The src/ directory contains the source code for the project, including:
  - mridle/ is the package directory, and contains:
    - The pipelines/ directory, which contains the source code for your pipelines.
    - The utilities/ directory, which contains source code that is shared across multiple pipelines, or is independent from pipelines.
    - The pipeline_registry.py file, which defines the project pipelines, i.e. pipelines that can be run using kedro run --pipeline.
  - tests/ is where the tests go.
  - requirements.in contains the source requirements of the project.
- pyproject.toml identifies the project root by providing project metadata.
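Because the kedro commands in this README have to be run from the top-level repo directory, it can help to confirm where that directory is. The following is a minimal sketch that assumes only that pyproject.toml marks the project root, as described above. find_project_root is a hypothetical helper and is not part of mridle or Kedro.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Walk upwards from start and return the first directory holding pyproject.toml."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "pyproject.toml").is_file():
            return candidate
    # No pyproject.toml found anywhere above the starting directory.
    return None
```

Calling find_project_root(Path.cwd()) from anywhere inside the repo should return the directory to cd into before running kedro.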
Kedro organizes data transformation steps into pipelines.
The easiest way to explore the pipelines is via Kedro's visualization tool, which you can open by running kedro viz and opening the web app in your browser.
Below is a short summary of some of the Kedro functionality you may use to work with MRIdle. You can read much more in the Kedro documentation!
To run a pipeline on the command line, run
kedro run --pipeline "<pipeline name>"
You can also specify which nodes to start from or stop at:
kedro run --pipeline "<pipeline name>" --from-nodes "<nodename>"
You can also interact with kedro via Jupyter and IPython sessions.
To start a Jupyter or IPython session with kedro activated, run kedro jupyter lab /data/mridle/ or kedro ipython from within the mridle directory.
Running Jupyter and IPython via kedro grants you access to 3 kedro variables:
- catalog: Load data created by kedro pipelines
- context: Access information about the pipelines
- session: Run pipelines
(If at any point you want to refresh these variables with changes you've made, run %reload_kedro.)
The Kedro data catalog makes loading data files from the pipelines easy:
slot_df = catalog.load('slot_df')
With this method, you can load any dataset defined in the Data Catalog (conf/base/catalog.yml).
Here are some example commands for running pipelines within Jupyter/IPython:
session.run(pipeline_name='harvey')
session.run(pipeline_name='harvey', from_nodes=['train_harvey_model'])
Add requirements to src/requirements.in (not requirements.txt!), then run:
kedro build-reqs
pip install -r src/requirements.txt
status_df contains the columns:
column name | type | description |
---|---|---|
FillerOrderNo | int | appointment id |
MRNCmpdId | object | patient id |
date | datetime | the date and time of the status change |
was_status | category | the status the appt changed from |
now_status | category | the status the appt changed to |
was_sched_for | float | number of days ahead the appt was sched for before status change relative to date |
now_sched_for | int | number of days ahead the appt is sched for after status change relative to date |
was_sched_for_date | datetime | the date the appt was sched for before status change |
now_sched_for_date | datetime | the date the appt is sched for after status change |
patient_class_adj | object | patient class (adjusted) ['ambulant', 'inpatient'] |
NoShow | bool | [True, False] |
NoShow_severity | object | ['hard', 'soft'] |
slot_outcome | object | ['show', 'rescheduled', 'canceled'] |
slot_type | object | ['no-show', 'show', 'inpatient'] |
slot_type_detailed | object | ['hard no-show', 'soft no-show', 'show', 'inpatient'] |
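As a worked example of the was_sched_for and now_sched_for columns, the sketch below computes the number of days between a status change and a scheduled date. It assumes whole calendar days are counted. The days_ahead helper and the dates are made up for illustration, and the actual calculation in mridle may differ.

```python
from datetime import datetime


def days_ahead(status_change: datetime, scheduled_for: datetime) -> int:
    """Whole calendar days from the status change to the scheduled date (assumed definition)."""
    return (scheduled_for.date() - status_change.date()).days


# Made-up example: a status change on 7 Jan 2019 for an appointment on 14 Jan 2019.
change = datetime(2019, 1, 7, 15, 30)
appointment = datetime(2019, 1, 14, 9, 0)
print(days_ahead(change, appointment))  # 7
```

Under this reading, a negative value would mean the status change was recorded after the scheduled date.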
slot_df contains the columns:
column name | type | description |
---|---|---|
FillerOrderNo | int | appointment id |
MRNCmpdId | object | patient id |
start_time | datetime | appt scheduled start time |
end_time | datetime | appt scheduled end time |
NoShow | bool | [True, False] |
slot_outcome | object | ['show', 'rescheduled', 'canceled'] |
slot_type | object | ['no-show', 'show', 'inpatient'] |
slot_type_detailed | object | ['hard no-show', 'soft no-show', 'show', 'inpatient'] |
EnteringOrganisationDeviceID | object | device the appt was scheduled for |
UniversalServiceName | object | the kind of appointment |
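To illustrate how these columns can be used, the sketch below builds a tiny synthetic stand-in for slot_df and summarises it with standard pandas calls. All values are invented and only a few of the columns listed above are included; on Louisa you would load the real data with catalog.load('slot_df') instead.

```python
import pandas as pd

# Tiny synthetic stand-in for slot_df; every value here is made up.
slot_df = pd.DataFrame({
    "FillerOrderNo": [1, 2, 3, 4],
    "EnteringOrganisationDeviceID": ["MR1", "MR1", "MR2", "MR2"],
    "NoShow": [True, False, False, False],
    "slot_type": ["no-show", "show", "show", "inpatient"],
})

# Share of slots per device that were flagged as no-shows.
no_show_rate = slot_df.groupby("EnteringOrganisationDeviceID")["NoShow"].mean()

# Number of slots of each slot_type.
slot_type_counts = slot_df["slot_type"].value_counts()
```

Grouping on EnteringOrganisationDeviceID gives a per-device view; the same pattern should work with UniversalServiceName to compare kinds of appointment.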
To look at an example appointment history:
fon = 5758396
appt = mridle.utilities.exploration_utilities.view_status_changes(status_df, fon)
display(appt[SHOW_COLS])
To look at a random example No Show appointment:
for i in range(50):
    appt = mridle.utilities.exploration_utilities.view_status_changes_of_random_sample(status_df)
    if appt['NoShow'].max() == 0:
        continue
    else:
        display(appt[SHOW_COLS])
        break
altair plotting
import altair as alt
alt.renderers.enable('default')
# the altair plot needs no-show end times set (by default they're NaT)
slot_df['end_time'] = slot_df.apply(mridle.data_management.set_no_show_end_times, axis=1)
mridle.utilities.plotting_utilities.alt_plot_date_range_for_device(slot_df, 'MR1', end_date='04/17/2019')
# you can also highlight just one kind of appointment
mridle.utilities.plotting_utilities.alt_plot_date_range_for_device(slot_df, 'MR1', end_date='04/17/2019', highlight='no-show')
matplotlib plotting
Plot a day:
%matplotlib inline
year = 2019
month = 1
day = 14
mridle.utilities.plotting_utilities.plot_a_day(slot_df, year, month, day, labels=False, alpha=0.5)
Plot a day for one device:
mridle.utilities.plotting_utilities.plot_a_day_for_device(slot_df, 'MR-N1', year, month, day, labels=True, alpha=0.5)
mridle contains a test suite for validating the data pipelines, including the no-show identification algorithm.
Run the tests by navigating to the top-level mridle directory and running kedro test.