FigGen @ICLR 2023

arXiv: 2306.00800

Juan A. Rodríguez, David Vázquez, Issam Laradji, Marco Pedersoli, Pau Rodríguez

ServiceNow Research, Montréal, Canada

ÉTS Montreal, University of Québec


FigGen is a latent diffusion model that generates scientific figures conditioned on text from the paper (text-to-figure generation). We use OCR-VQGAN to project scientific figures (images) into a latent representation, and train a latent diffusion model in that space to learn a generator. A BERT transformer is trained jointly to learn the text embeddings that condition the generation.
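
As a rough illustration of how these pieces fit together (a minimal sketch, not the repository's actual code: vqgan_encoder, diffusion_model, and their methods are stand-ins, while the BERT classes come from HuggingFace transformers):

import torch
import torch.nn.functional as F
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
text_encoder = BertModel.from_pretrained("bert-base-uncased")  # trained jointly with the diffusion model

def training_step(figure, caption, vqgan_encoder, diffusion_model):
    # Project the figure into the OCR-VQGAN latent space.
    z = vqgan_encoder(figure)
    # Embed the caption; token embeddings condition the denoiser via cross-attention.
    tokens = tokenizer(caption, return_tensors="pt", padding=True, truncation=True)
    context = text_encoder(**tokens).last_hidden_state
    # Standard latent-diffusion objective: predict the noise added to z at a random timestep.
    t = torch.randint(0, diffusion_model.num_timesteps, (z.shape[0],), device=z.device)
    noise = torch.randn_like(z)
    z_noisy = diffusion_model.q_sample(z, t, noise)
    return F.mse_loss(diffusion_model.unet(z_noisy, t, context=context), noise)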

This code is adapted from Latent Diffusion at CompVis/stable-diffusion.

(Figure: qualitative results)

Abstract

The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. Recent techniques have shown impressive potential in creating complex visual compositions while delivering striking realism and quality. However, state-of-the-art methods have focused on the narrow domain of natural images, while other distributions remain unexplored. In this paper, we introduce the problem of text-to-figure generation: creating scientific figures of papers from text descriptions. We present FigGen, a diffusion-based approach for text-to-figure generation, and discuss the main challenges of the proposed task. Code and models are available in this repository.

Installation

Create the conda environment (named figure-diffusion in environment.yaml) and activate it:

conda env create -f environment.yaml
conda activate figure-diffusion
pip install -e .

Download data and models

  1. Download the Paper2Fig100k dataset from Zenodo and extract it into a data folder.

  2. Download the trained models from HuggingFace and extract them into a models folder. You will need both the image encoder and the diffusion model.

  3. Modify the config files in configs/figure-diffusion/fig-gen-{...}.yaml to point to the correct paths: set ckpt_path (in model.first_stage_config) and json_file (in data) to the corresponding local paths. A programmatic sketch of this edit follows below.
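
For reference, the same edit can be sketched with OmegaConf (which the latent-diffusion codebase uses for its configs). The exact key nesting and file names below are assumptions; check your fig-gen-{...}.yaml for the precise structure:

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/figure-diffusion/fig-gen-base.yaml")  # hypothetical file name
# ckpt_path lives under model.first_stage_config per the README; the "params" nesting is assumed.
OmegaConf.update(cfg, "model.params.first_stage_config.params.ckpt_path", "models/image-encoder.ckpt")
OmegaConf.update(cfg, "data.params.json_file", "data/paper2fig100k.json")
OmegaConf.save(cfg, "configs/figure-diffusion/fig-gen-local.yaml")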

Training

To train the latent diffusion model from scratch, run the following command:

python main.py --config configs/figure-diffusion/fig-gen-{...}.yaml 

Inference
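
An inference script is not documented yet. As a rough sketch of what text-conditioned sampling could look like with the latent-diffusion tooling this repo adapts (class names are from CompVis/latent-diffusion; the checkpoint path, config path, prompt, and latent shape below are assumptions):

import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler

config = OmegaConf.load("configs/figure-diffusion/fig-gen-local.yaml")  # hypothetical path
model = instantiate_from_config(config.model)
state = torch.load("models/figgen.ckpt", map_location="cpu")["state_dict"]  # hypothetical checkpoint
model.load_state_dict(state, strict=False)
model.eval()

sampler = DDIMSampler(model)
with torch.no_grad():
    # BERT embeddings of the figure caption condition the sampler.
    cond = model.get_learned_conditioning(["Block diagram of the proposed architecture."])
    samples, _ = sampler.sample(S=50, conditioning=cond, batch_size=1,
                                shape=[4, 32, 32], verbose=False)  # latent shape is an assumption
    figures = model.decode_first_stage(samples)  # decode latents back to pixel space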

Results

Below are some qualitative results from our model. For each example, we show the text description of the figure, the generated figure, and the ground-truth figure. Check the paper for more results.

(Figures: qualitative results)

Todo

  • Automatically download the Paper2Fig100k dataset (from Zenodo) and the trained models (from HuggingFace)

Related work

High-Resolution Image Synthesis with Latent Diffusion Models by Rombach et al., CVPR 2022 (oral).

OCR-VQGAN: Taming Text-within-Image Generation by Rodriguez et al., WACV 2023.


Citation

If you use this code, please cite the following paper:

@article{rodriguez2023figgen,
  title={FigGen: Text to Scientific Figure Generation},
  author={Rodriguez, Juan A and Vazquez, David and Laradji, Issam and Pedersoli, Marco and Rodriguez, Pau},
  journal={arXiv preprint arXiv:2306.00800},
  year={2023}
}

Contact

Juan A. Rodríguez (joanrg.ai@gmail.com).
