JOSS paper writeup (#23)
* added markdown and bib

* Update paper.md

* Include statement of need and brief survey of other packages

* Update paper.bib

* Update paper.md

* Update paper.md

* fix minor formatting issue

* fix formatting and minor typo

* add paragraph in core ideas

* revert back to original format, giving up fix MD025...

* Update paper.md

minor edits to core idea

* add several words about append function

---------

Co-authored-by: Aleksandr Aravkin <>
Co-authored-by: Aleksandr Aravkin <saravkin@uw.edu>
zhengp0 and saravkin committed Apr 14, 2024
1 parent c07f900 commit 94e4db9
Showing 2 changed files with 370 additions and 0 deletions.
271 changes: 271 additions & 0 deletions paper.bib
@@ -0,0 +1,271 @@
@article{gil2024health,
title={Health effects associated with chewing tobacco: a Burden of Proof study},
author={Gil, Gabriela F and Anderson, Jason A and Aravkin, Aleksandr and Bhangdia, Kayleigh and Carr, Sinclair and Dai, Xiaochen and Flor, Luisa S and Hay, Simon I and Malloy, Matthew J and McLaughlin, Susan A and others},
journal={Nature Communications},
volume={15},
number={1},
pages={1082},
year={2024},
publisher={Nature Publishing Group UK London}
}
@article{balaj2024effects,
title={Effects of education on adult mortality: a global systematic review and meta-analysis},
author={Balaj, Mirza and Henson, Claire A and Aronsson, Amanda and Aravkin, Aleksandr and Beck, Kathryn and Degail, Claire and Donadello, Lorena and Eikemo, Kristoffer and Friedman, Joseph and Giouleka, Anna and others},
journal={The Lancet Public Health},
year={2024},
publisher={Elsevier}
}
@article{flor2024health,
title={Health effects associated with exposure to secondhand smoke: a Burden of Proof study},
author={Flor, Luisa S and Anderson, Jason A and Ahmad, Noah and Aravkin, Aleksandr and Carr, Sinclair and Dai, Xiaochen and Gil, Gabriela F and Hay, Simon I and Malloy, Matthew J and McLaughlin, Susan A and others},
journal={Nature Medicine},
pages={1--19},
year={2024},
publisher={Nature Publishing Group US New York}
}
@article{spencer2023health,
title={Health effects associated with exposure to intimate partner violence against women and childhood sexual abuse: a burden of proof study},
author={Spencer, Cory N and Khalil, Mariam and Herbert, Molly and Aravkin, Aleksandr Y and Arrieta, Alejandra and Baeza, Mar{\'\i}a Jose and Bustreo, Flavia and Cagney, Jack and Calderon-Anyosa, Renzo JC and Carr, Sinclair and others},
journal={Nature Medicine},
pages={1--16},
year={2023},
publisher={Nature Publishing Group US New York}
}
@article{dai2022health,
title={Health effects associated with smoking: a Burden of Proof study},
author={Dai, Xiaochen and Gil, Gabriela F and Reitsma, Marissa B and Ahmad, Noah S and Anderson, Jason A and Bisignano, Catherine and Carr, Sinclair and Feldman, Rachel and Hay, Simon I and He, Jiawei and others},
journal={Nature Medicine},
volume={28},
number={10},
pages={2045--2055},
year={2022},
publisher={Nature Publishing Group US New York}
}
@article{razo2022effects,
title={Effects of elevated systolic blood pressure on ischemic heart disease: a Burden of Proof study},
author={Razo, Christian and Welgan, Catherine A and Johnson, Catherine O and McLaughlin, Susan A and Iannucci, Vincent and Rodgers, Anthony and Wang, Nelson and LeGrand, Kate E and Sorensen, Reed JD and He, Jiawei and others},
journal={Nature Medicine},
volume={28},
number={10},
pages={2056--2065},
year={2022},
publisher={Nature Publishing Group US New York}
}
@article{stanaway2022health,
title={Health effects associated with vegetable consumption: a Burden of Proof study},
author={Stanaway, Jeffrey D and Afshin, Ashkan and Ashbaugh, Charlie and Bisignano, Catherine and Brauer, Michael and Ferrara, Giannina and Garcia, Vanessa and Haile, Demewoz and Hay, Simon I and He, Jiawei and others},
journal={Nature Medicine},
volume={28},
number={10},
pages={2066--2074},
year={2022},
publisher={Nature Publishing Group US New York}
}
@article{lescinsky2022health,
title={Health effects associated with consumption of unprocessed red meat: a Burden of Proof study},
author={Lescinsky, Haley and Afshin, Ashkan and Ashbaugh, Charlie and Bisignano, Catherine and Brauer, Michael and Ferrara, Giannina and Hay, Simon I and He, Jiawei and Iannucci, Vincent and Marczak, Laurie B and others},
journal={Nature Medicine},
volume={28},
number={10},
pages={2075--2082},
year={2022},
publisher={Nature Publishing Group US New York}
}
@article{zheng2022burden,
title={The Burden of Proof studies: assessing the evidence of risk},
author={Zheng, Peng and Afshin, Ashkan and Biryukov, Stan and Bisignano, Catherine and Brauer, Michael and Bryazka, Dana and Burkart, Katrin and Cercy, Kelly M and Cornaby, Leslie and Dai, Xiaochen and others},
journal={Nature Medicine},
volume={28},
number={10},
pages={2038--2044},
year={2022},
publisher={Nature Publishing Group US New York}
}


@book{de1978practical,
title={A practical guide to splines},
author={De Boor, Carl},
volume={27},
year={1978},
publisher={Springer-Verlag New York}
}

@inproceedings{johannessen2020splipy,
title={Splipy: B-spline and NURBS modelling in python},
author={Johannessen, Kjetil Andr{\'e} and Fonn, Eivind},
booktitle={Journal of Physics: Conference Series},
volume={1669},
number={1},
pages={012032},
year={2020},
organization={IOP Publishing}
}

@article{zheng2021trimmed,
title={Trimmed constrained mixed effects models: formulations and algorithms},
author={Zheng, Peng and Barber, Ryan and Sorensen, Reed JD and Murray, Christopher JL and Aravkin, Aleksandr Y},
journal={Journal of Computational and Graphical Statistics},
volume={30},
number={3},
pages={544--556},
year={2021},
publisher={Taylor \& Francis}
}


@article{Buscemi2019Survey,
abstract = {Linear mixed-effects models are a class of models widely used for analyzing different types of data: longitudinal, clustered and panel data. Many fields, in which a statistical methodology is required, involve the employment of linear mixed models, such as biology, chemistry, medicine, finance and so forth. One of the most important processes, in a statistical analysis, is given by model selection. Hence, since there are a large number of linear mixed model selection procedures available in the literature, a pressing issue is how to identify the best approach to adopt in a specific case. We outline mainly all approaches focusing on the part of the model subject to selection (fixed and/or random), the dimensionality of models and the structure of variance and covariance matrices, and also, wherever possible, the existence of an implemented application of the methodologies set out.},
annote = {The most up-to-date literature review found on this issue.},
author = {Buscemi, Simona and Plaia, Antonella},
doi = {10.1007/s10182-019-00359-z},
issn = {1863-8171},
journal = {AStA Advances in Statistical Analysis},
keywords = {AIC,BIC,LASSO,Linear mixed model,MCP,MDL,Mixed model selection,Shrinkage methods},
mendeley-groups = {Feature/Effects Selection,Surveys Summaries Overviews},
month = {dec},
number = {4},
pages = {529--575},
publisher = {Springer Berlin Heidelberg},
title = {{Model selection in linear mixed-effect models}},
url = {https://doi.org/10.1007/s10182-019-00359-z},
volume = {104},
year = {2020}
}

@article{zheng2018unified,
title={A unified framework for sparse relaxed regularized regression: SR3},
author={Zheng, Peng and Askham, Travis and Brunton, Steven L and Kutz, J Nathan and Aravkin, Aleksandr Y},
journal={IEEE Access},
volume={7},
pages={1404--1423},
year={2018},
publisher={IEEE},
doi = {10.1109/ACCESS.2018.2886528}
}

@article{sholokhov2022relaxation,
title={A Relaxation Approach to Feature Selection for Linear Mixed Effects Models},
author={Sholokhov, Aleksei and Burke, James V and Santomauro, Damian F and Zheng, Peng and Aravkin, Aleksandr},
journal={arXiv preprint arXiv:2205.06925},
year={2022},
doi={10.48550/arXiv.2205.06925}
}
@article{aravkin2022relaxationb,
title={Analysis of Relaxation Methods for Feature Selection in Mixed Effects Models},
author={Aravkin, Aleksandr and Burke, James and Sholokhov, Aleksei and Zheng, Peng},
journal={arXiv preprint arXiv:2209.10575},
year={2022},
doi={10.48550/arXiv.2209.10575}
}

@article{baraldi2019basis,
title={Basis pursuit denoise with nonsmooth constraints},
author={Baraldi, Robert and Kumar, Rajiv and Aravkin, Aleksandr},
journal={IEEE Transactions on Signal Processing},
volume={67},
number={22},
pages={5811--5823},
year={2019},
publisher={IEEE},
doi={10.1109/tsp.2019.2946029}
}

@article{murray2020global,
title={Global burden of 87 risk factors in 204 countries and territories, 1990--2019: a systematic analysis for the Global Burden of Disease Study 2019},
author={Murray, Christopher JL and Aravkin, Aleksandr Y and Zheng, Peng and Abbafati, Cristiana and Abbas, Kaja M and Abbasi-Kangevari, Mohsen and Abd-Allah, Foad and Abdelalim, Ahmed and Abdollahi, Mohammad and Abdollahpour, Ibrahim and others},
journal={The Lancet},
volume={396},
number={10258},
pages={1223--1249},
year={2020},
publisher={Elsevier},
doi={10.1016/S0140-6736(20)30752-2},
}


@article{schelldorfer2014glmmlasso,
title={{GLMMLasso}: an algorithm for high-dimensional generalized linear mixed models using $\ell_1$-penalization},
author={Schelldorfer, J{\"u}rg and Meier, Lukas and B{\"u}hlmann, Peter},
journal={Journal of Computational and Graphical Statistics},
volume={23},
number={2},
pages={460--477},
year={2014},
publisher={Taylor \& Francis},
doi={10.1080/10618600.2013.773239}
}


@article{li2020survey,
title={A survey on sparse learning models for feature selection},
author={Li, Xiaoping and Wang, Yadi and Ruiz, Rub{\'e}n},
journal={IEEE Transactions on Cybernetics},
year={2020},
publisher={IEEE},
doi={10.1109/TCYB.2020.2982445}
}

@article{miao2016survey,
title={A survey on feature selection},
author={Miao, Jianyu and Niu, Lingfeng},
journal={Procedia Computer Science},
volume={91},
pages={919--926},
year={2016},
publisher={Elsevier},
doi={10.1016/j.procs.2016.07.111}
}

@article{Mendible2020,
abstract = {We develop an unsupervised machine learning algorithm for the automated discovery and identification of traveling waves in spatiotemporal systems governed by partial differential equations (PDEs). Our method uses sparse regression and subspace clustering to robustly identify translational invariances that can be leveraged to build improved reduced-order models (ROMs). Invariances, whether translational or rotational, are well known to compromise the ability of ROMs to produce accurate and/or low-rank representations of the spatiotemporal dynamics. However, by discovering translations in a principled way, data can be shifted into a coordinate systems where quality, low-dimensional ROMs can be constructed. This approach can be used on either numerical or experimental data with or without knowledge of the governing equations. We demonstrate our method on a variety of PDEs of increasing difficulty, taken from the field of fluid dynamics, showing the efficacy and robustness of the proposed approach.},
archivePrefix = {arXiv},
arxivId = {1911.00565},
author = {Mendible, Ariana and Brunton, Steven L. and Aravkin, Aleksandr Y. and Lowrie, Wes and Kutz, J. Nathan},
doi = {10.1007/s00162-020-00529-9},
eprint = {1911.00565},
issn = {1432-2250},
journal = {Theoretical and Computational Fluid Dynamics},
keywords = {Data decomposition,Reduced-order modeling,Transported quantities,Traveling waves},
number = {4},
pages = {385--400},
title = {{Dimensionality reduction and reduced-order modeling for traveling wave physics}},
volume = {34},
year = {2020}
}

@article{levin2019proof,
title={A Proof of Principle: Multi-Modality Radiotherapy Optimization},
author={Levin, Roman and Aravkin, Aleksandr Y and Kim, Minsun},
journal={arXiv preprint arXiv:1911.05182},
year={2019},
doi={10.48550/arXiv.1911.05182}
}

@inproceedings{sklearn_api,
author = {Lars Buitinck and Gilles Louppe and Mathieu Blondel and
Fabian Pedregosa and Andreas Mueller and Olivier Grisel and
Vlad Niculae and Peter Prettenhofer and Alexandre Gramfort
and Jaques Grobler and Robert Layton and Jake VanderPlas and
Arnaud Joly and Brian Holt and Ga{\"{e}}l Varoquaux},
title = {{API} design for machine learning software: experiences from the scikit-learn
project},
booktitle = {ECML PKDD Workshop: Languages for Data Mining and Machine Learning},
year = {2013},
pages = {108--122},
doi = {10.48550/arXiv.1309.0238}
}

@article{schelldorfer2011estimation,
title={Estimation for high-dimensional linear mixed-effects models using $\ell_1$-penalization},
author={Schelldorfer, J{\"u}rg and B{\"u}hlmann, Peter and van de Geer, Sara},
journal={Scandinavian Journal of Statistics},
volume={38},
number={2},
pages={197--214},
year={2011},
publisher={Wiley Online Library},
doi={10.1111/j.1467-9469.2011.00740.x}
}
99 changes: 99 additions & 0 deletions paper.md
@@ -0,0 +1,99 @@
---
title: 'xspline: A Python Package for Flexible Spline Modeling'

tags:
- Python
- Splines
- Derivatives
- Integrals
- Flexible Extrapolation
- Design matrix

authors:
- name: Peng Zheng
orcid: 0000-0003-3313-215X
affiliation: 1
- name: Kelsey Maass
orcid: 0000-0002-9534-8901
affiliation: 1
- name: Aleksandr Aravkin
orcid: 0000-0002-1875-1801
affiliation: "1, 2"

affiliations:
- name: Department of Health Metrics Sciences, University of Washington
index: 1
- name: Department of Applied Mathematics, University of Washington
index: 2

date: 22 February 2024
bibliography: paper.bib

---

# Summary

Splines are a fundamental tool for describing and estimating nonlinear relationships [@de1978practical]. They allow nonlinear functions to be represented as linear combinations of spline basis elements. Researchers in physical, biological, and health sciences rely on spline models in conjunction with statistical software packages to fit and describe a vast range of nonlinear relationships.
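
As a concrete illustration of representing a nonlinear function as a linear combination of basis elements, the minimal NumPy sketch below evaluates a B-spline basis with the Cox-de Boor recursion, stacks it into a design matrix, and fits the coefficients by least squares. It is not `xspline`'s API; the helper `bspline_basis`, the clamped knot vector, and the test curve are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch, not xspline's API; bspline_basis is a hypothetical helper.

def bspline_basis(x, t, k):
    """Evaluate all degree-k B-spline basis functions on knots t at points x
    with the Cox-de Boor recursion; returns shape (len(x), len(t) - k - 1)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(t, dtype=float)
    # Degree 0: indicators of the half-open knot spans [t_i, t_{i+1})
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)], dtype=float).T
    # Close the last non-empty span on the right so that x == t[-1] is covered
    last = np.flatnonzero(np.diff(t) > 0)[-1]
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are dropped
    for d in range(1, k + 1):
        B_next = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B

k = 3
t = np.r_[[0.0] * (k + 1), 0.5, [1.0] * (k + 1)]  # clamped cubic knots, 5 basis functions
x = np.linspace(0.0, 1.0, 50)
y = x**3 - 2.0 * x  # a cubic, so it lies exactly in this spline space

A = bspline_basis(x, t, k)                 # design matrix: one column per basis function
c, *_ = np.linalg.lstsq(A, y, rcond=None)  # coefficients of f(x) = sum_j c_j B_j(x)
fitted = A @ c
```

Because a cubic spline space contains every cubic polynomial, the least-squares fit here reproduces `y` up to rounding; with real data, the same design matrix would enter a statistical model as ordinary covariates.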

A wide range of tools and packages exists to support modeling with splines.
These tools include:

- Splipy (<https://pypi.org/project/Splipy/>) [@johannessen2020splipy]
- splines (<https://pypi.org/project/splines/>)
- spline support in scipy (<https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BSpline.html>)
- pyspline (<https://mdolab-pyspline.readthedocs-hosted.com/en/latest/index.html>)
- splinter (<https://github.com/bgrimstad/splinter>)

Several important gaps remain in Python packages for spline modeling; they are detailed in the Statement of Need below. `xspline` is not a comprehensive tool that generalizes existing software. Instead, it provides key functionality that undergirds flexible interpolation and fitting, closing these gaps in the available tools. `xspline` is currently widely used in global health applications [@murray2020global], undergirding the majority of spline modeling at the Institute for Health Metrics and Evaluation (IHME).


# Statement of Need

Current spline packages offer broad functionality in spline fitting, including:

- Manipulating and estimating curves (scipy, splines), surfaces and volumes (Splipy, pyspline)
- Numerical derivatives (Splipy, splines, scipy, pyspline, splinter)
- Interpolation (Splipy, splines, scipy, pyspline, splinter)
- Spline derivatives, antiderivatives, and numerical integrals (scipy)
- Extrapolation (scipy, limited)

From this list, it is apparent that `scipy` offers the most comprehensive features related to derivatives, integrals, and extrapolation. However, key limitations remain. First, while `scipy` provides derivative and antiderivative spline objects, it returns definite integrals only as numerical values. Second, while the first and last segments of a B-spline in `scipy` can be extrapolated, there is no option for the user to extrapolate with a simpler functional form, e.g., a quadratic polynomial beyond the ends of a cubic spline.

This functionality is essential to risk modeling. For example, the data reported by studies of risk-outcome pairs are typically ratios of definite integrals over different exposure intervals. Prior packages do not offer a direct way to fit spline functions to these nonlinear data, because they do not provide definite integrals of splines as spline objects. Spline derivatives are also needed to impose shape constraints, such as monotonicity or convexity, on risk curves of interest. Finally, extrapolation is often required into regions with little to no data, while maintaining high-fidelity fits in regions with dense data. In principle, it is straightforward to extrapolate with any polynomial of degree less than or equal to the degree of the end segments (for example, by matching the value and slope for a first-order extension, and additionally the curvature for a second-order one). However, this functionality is not available in other packages.
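
Because a spline is linear in its coefficients, the integral of $f = \sum_j c_j B_j$ over an exposure interval $[a, b]$ is a fixed row of basis integrals times the coefficient vector, and a ratio of two such integrals is a ratio of two linear forms. The minimal sketch below builds such rows. It is an illustration under stated assumptions, not `xspline`'s method: it integrates each knot span with $(k+1)$-point Gauss-Legendre quadrature, which is exact for degree-$k$ polynomial pieces, instead of closed-form recursions, and `integral_design_row`, the knots, and the coefficients `c` are hypothetical.

```python
import numpy as np

# Illustrative sketch, not xspline's method: integrals are computed with
# Gauss-Legendre quadrature per knot span instead of closed-form recursions.

def bspline_basis(x, t, k):
    """Degree-k B-spline basis on knots t at points x (Cox-de Boor recursion)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(t, dtype=float)
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)], dtype=float).T
    last = np.flatnonzero(np.diff(t) > 0)[-1]
    B[x == t[-1], last] = 1.0  # close the last non-empty span on the right
    for d in range(1, k + 1):
        B_next = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                B_next[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B

def integral_design_row(a, b, t, k):
    """Hypothetical helper: r[j] = integral of B_j over [a, b], with a <= b."""
    t = np.asarray(t, dtype=float)
    nodes, weights = np.polynomial.legendre.leggauss(k + 1)
    # Split [a, b] at interior knots so every piece lies in a single knot span,
    # where each B_j is a degree-k polynomial; (k + 1)-point Gauss-Legendre
    # quadrature is exact for polynomials of degree up to 2k + 1.
    pts = np.unique(np.r_[a, b, t[(t > a) & (t < b)]])
    row = np.zeros(len(t) - k - 1)
    for lo, hi in zip(pts[:-1], pts[1:]):
        half = 0.5 * (hi - lo)
        xq = half * nodes + 0.5 * (hi + lo)  # map [-1, 1] onto [lo, hi]
        row += half * (weights @ bspline_basis(xq, t, k))
    return row

k = 3
t = np.r_[[0.0] * (k + 1), 0.3, 0.6, [1.0] * (k + 1)]
c = np.array([0.0, 0.2, 0.5, 0.9, 1.2, 1.4])  # hypothetical spline coefficients

r1 = integral_design_row(0.6, 0.9, t, k)  # exposure interval of interest
r0 = integral_design_row(0.0, 0.2, t, k)  # reference exposure interval
ratio = (r1 @ c) / (r0 @ c)               # ratio of two definite integrals of f
```

Each row depends only on the knots and the interval, so rows can be precomputed once per observation, leaving the model nonlinear only through the final division.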


# Core idea and structure of `xspline`

The main idea of `xspline` is to provide a Python class that allows users to
interact more easily with basis splines, their derivatives and integrals, and
extrapolation options.

The computation of splines is based on basis splines (B-splines); see
[@de1978practical] for a canonical reference. Building on the recursive definition
of B-splines given there, we derived recursive relationships to compute both their
derivatives and their definite integrals.
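
For reference, the standard B-spline identities from [@de1978practical] that recursions of this kind build on are listed below for degree $k$ and knots $t_i$, with any term whose denominator vanishes taken as zero and the knot sequence extended to the right as needed for the last identity. `xspline`'s own recursions may be organized differently.

$$
\begin{aligned}
B_{i,0}(x) &= \mathbf{1}\{t_i \le x < t_{i+1}\}, \\
B_{i,k}(x) &= \frac{x - t_i}{t_{i+k} - t_i}\, B_{i,k-1}(x) + \frac{t_{i+k+1} - x}{t_{i+k+1} - t_{i+1}}\, B_{i+1,k-1}(x), \\
\frac{d}{dx} B_{i,k}(x) &= k \left( \frac{B_{i,k-1}(x)}{t_{i+k} - t_i} - \frac{B_{i+1,k-1}(x)}{t_{i+k+1} - t_{i+1}} \right), \\
\int_{-\infty}^{x} B_{i,k}(s)\, ds &= \frac{t_{i+k+1} - t_i}{k+1} \sum_{j \ge i} B_{j,k+1}(x).
\end{aligned}
$$

A definite integral over $[a, b]$ is then the difference of the last expression evaluated at $b$ and at $a$, and higher-order derivatives follow by applying the derivative identity repeatedly.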

To support the spline basis computation, we also created modules that provide a
convenient interface to indicator and polynomial functions, together with their
derivatives and definite integrals of any order. All of these functions are
bundled into a main interface class called `XFunction`, which allows the user to call
the function with a specified order, where a positive order represents derivatives
and a negative order represents definite integrals.
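
The order convention can be illustrated with a small stand-in class. The sketch below is hypothetical and is not `xspline`'s `XFunction`: the class name `PolyFunction` and the lower-bound argument `lb` are assumptions made for illustration, and it wraps `numpy.polynomial.Polynomial` rather than indicator or spline functions.

```python
import numpy as np
from numpy.polynomial import Polynomial


class PolyFunction:
    """Hypothetical order-indexed function (not xspline's XFunction):
    order > 0 -> derivative, order == 0 -> value,
    order < 0 -> |order|-fold iterated integral from lb up to x."""

    def __init__(self, coef):
        self.poly = Polynomial(coef)

    def __call__(self, x, order=0, lb=0.0):
        x = np.asarray(x, dtype=float)
        if order >= 0:
            return self.poly.deriv(order)(x)
        # integ(m, lbnd=lb) makes each successive antiderivative vanish at lb
        return self.poly.integ(-order, lbnd=lb)(x)


f = PolyFunction([1.0, 1.0])  # f(x) = 1 + x
d1 = f(2.0, order=1)          # f'(2) = 1
v0 = f(2.0, order=0)          # f(2) = 3
i1 = f(2.0, order=-1)         # integral of (1 + x) from 0 to 2 = 4
i2 = f(2.0, order=-2)         # iterated integral from 0 to 2 = 10/3
```

A single `order` argument keeps one call signature for values, derivatives, and integrals, which is convenient when assembling design matrices for constraints and integral observations alike.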

We also allow users to specify how to extrapolate by
matching the smoothness at the end knots. This is achieved by a class method
of `XFunction` called `append`, which splices two instances of `XFunction`
together.
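
Matching smoothness at an end knot amounts to building a Taylor polynomial there: a degree-$m$ extension that agrees with the last segment in value and in its first $m$ derivatives. The sketch below assumes the last segment is available as a `numpy.polynomial.Polynomial`; `taylor_extension` and `splice` are hypothetical helpers written for illustration, not `xspline`'s `append`.

```python
from math import factorial

import numpy as np
from numpy.polynomial import Polynomial


def taylor_extension(piece, x0, m):
    """Degree-m polynomial matching `piece` and its first m derivatives at x0."""
    coef = [piece.deriv(r)(x0) / factorial(r) for r in range(m + 1)]
    shifted = Polynomial(coef)  # polynomial in the shifted variable (x - x0)
    return lambda x: shifted(np.asarray(x, dtype=float) - x0)


def splice(inner, outer, x0):
    """Hypothetical helper: use `inner` for x <= x0 and `outer` beyond x0."""
    def f(x):
        x = np.asarray(x, dtype=float)
        return np.where(x <= x0, inner(x), outer(x))
    return f


# Stand-in for the last cubic segment of a spline that ends at the knot x0 = 1
last_piece = Polynomial([1.0, 2.0, -1.0, 0.5])  # 1 + 2x - x^2 + 0.5x^3
x0 = 1.0
linear_tail = splice(last_piece, taylor_extension(last_piece, x0, 1), x0)
quadratic_tail = splice(last_piece, taylor_extension(last_piece, x0, 2), x0)
```

At $x = 2$ the linear tail gives 4.0 and the quadratic tail gives 4.5, whereas continuing the cubic segment itself would give 5.0, which illustrates how a lower-degree extension tempers behavior far from the data.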

With all of the above features, we created an easy-to-use spline
package for statistical model building, which has been widely used in
global health statistical analysis (see the references below).
More examples are available in the [quickstart guide](https://ihmeuw-msca.github.io/xspline/quickstart.html).

More information about the structure of the library can be found in the [documentation](https://ihmeuw-msca.github.io/xspline/api_reference/),
while the mathematical use cases are discussed extensively in [@zheng2021trimmed] and [@zheng2022burden] in the context of fitting nonlinear dose-response
relationships.


# Ongoing Research and Dissemination

The `xspline` package is used for the majority of spline modeling at IHME. In particular, the new functionality described above enabled a set of dose-response analyses recently published by the institute, including analyses of chewing tobacco [@gil2024health], education [@balaj2024effects], secondhand smoke [@flor2024health], intimate partner violence [@spencer2023health], smoking [@dai2022health], blood pressure [@razo2022effects], vegetable consumption [@stanaway2022health], and red meat consumption [@lescinsky2022health]. The results of all of these analyses are publicly available at <https://vizhub.healthdata.org/burden-of-proof/>.

# References
