Official code of Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [ICML 2024]

Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training 【ICML 2024】

This is the official code for Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training (ICML 2024).

Installation:

Clone this repository and install Python dependencies:

git clone https://github.com/SVT-Yang/MedST.git
cd MedST
pip install -r requirements.txt

Datasets Preparation:

We used the following datasets:

  • MIMIC-CXR: MIMIC-CXR-JPG is the medical multimodal dataset we used for pretraining.

  • MS-CXR-T benchmark: We used the MS-CXR-T benchmark for temporal downstream tasks.

  • RSNA: We used stage 2 of the RSNA dataset from Kaggle.

  • COVIDx: We used version 6 of the COVIDx dataset from Kaggle, which has 3 classes: no pneumonia, non-COVID-19 pneumonia, and COVID-19 pneumonia.

After downloading the datasets, check that the paths in constants.py are correct.
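
A quick way to catch a wrong path before preprocessing is to check every configured location up front. The sketch below is not part of the repository: the helper name, the dictionary keys, and the directory names are placeholders, and the actual variable names in constants.py may differ.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the names of entries whose path does not exist on disk."""
    return [name for name, p in paths.items() if not Path(p).exists()]


if __name__ == "__main__":
    # Hypothetical entries: replace with the values you set in constants.py.
    dataset_paths = {
        "MIMIC_CXR": "data/mimic-cxr-jpg",
        "RSNA": "data/rsna",
        "COVIDX": "data/covidx",
    }
    for name in find_missing_paths(dataset_paths):
        print(f"Missing path for {name}: {dataset_paths[name]}")
```

An empty result means every listed path exists; it does not verify that the folder contents are complete.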

Data preprocessing:

  • Run mimic_cxr.py to get multi-view image-text pairs and temporal information.
  • Run rsna.py and covidx.py to get the train/val/test sets.

Pre-training:

First, download the pretrained weights we used:

  • Text encoder (BioClinicalBERT): download pytorch_model.bin from Bio_ClinicalBERT into the /medst/emilyalsentzer/Bio_ClinicalBERT folder.
  • MGCA pre-trained weights: download from MGCA.

Before pretraining, make sure all paths are correct.

Then, we use this command to pretrain:

cd medst/models/medst
CUDA_VISIBLE_DEVICES=0,1 python medst_module.py --gpus 2 --strategy ddp --batch_size 10  --num_workers 8
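
With --strategy ddp, each GPU runs its own training process. Assuming --batch_size is the per-process batch size (the usual convention for DDP data loaders, but not confirmed here for medst_module.py), the number of samples per training step is the product of the two values. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def effective_batch_size(per_gpu_batch_size, num_gpus):
    """Samples processed per step when each GPU gets its own batch."""
    if per_gpu_batch_size < 1 or num_gpus < 1:
        raise ValueError("both arguments must be positive integers")
    return per_gpu_batch_size * num_gpus


# Values from the command above: --batch_size 10 --gpus 2
print(effective_batch_size(10, 2))  # 20
```

Under this assumption, changing the number of GPUs changes the effective batch size unless --batch_size is adjusted to compensate.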

Our pre-trained MedST can be found here.

Downstream tasks:

First, set the path (or ckpt_path) argument to the path of our pre-trained MedST model.

1. Temporal tasks (MS-CXR-T benchmark):
  • Make sure the paths of the two CSV files (temporal image classification and temporal sentence similarity classification) are correct.
  • Run temporal_test.py to get the results.
2. Zero-shot classification on RSNA:
  • Run zeroshot_RSNA.py to get the results.
3. Image classification on COVIDx:

We use --data_pct to specify the portion of training data for finetuning. To run all experiments for COVIDx classification task, we use this command:

./run_cls_covidx.sh
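
To illustrate what a --data_pct value means, the sketch below keeps a random fraction of a training list. It is not the repository's data loader: the function name, the seed argument, and the sampling strategy are assumptions made for illustration, and it assumes --data_pct is a fraction in (0, 1].

```python
import random


def subsample(items, data_pct, seed=0):
    """Keep a random fraction `data_pct` (0 < data_pct <= 1) of `items`."""
    if not 0 < data_pct <= 1:
        raise ValueError("data_pct must be in (0, 1]")
    items = list(items)
    if data_pct == 1 or not items:
        return items
    # Keep at least one sample so very small fractions never yield an empty set.
    k = max(1, int(len(items) * data_pct))
    # A seeded generator makes the chosen subset reproducible across runs.
    return random.Random(seed).sample(items, k)


train_ids = list(range(1000))
print(len(subsample(train_ids, 0.1)))  # 100
```

Fixing the seed keeps the subset identical across runs, which makes results at a given fraction comparable.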

Acknowledgement

This work builds upon MGCA and TCC.

Citation

@article{yang2024unlocking,
      title={Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training}, 
      author={Jinxia Yang and Bing Su and Wayne Xin Zhao and Ji-Rong Wen},
      journal={arXiv preprint arXiv:2405.19654},
      year={2024}
}
