distributed and reduced precision bootstrap (#694)
* distributed and reduced precision bootstrap

* parallel example

* note for reduced precision

* version bump, notes

* bootstrap save and restore

* NEWS

* (approximate) equality and tests

---------

Co-authored-by: Douglas Bates <dmbates@gmail.com>
palday and dmbates committed Jul 6, 2023
1 parent 491f7a7 commit 8d71961
Showing 11 changed files with 399 additions and 8 deletions.
7 changes: 7 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
MixedModels v4.15.0 Release Notes
==============================
* Support for different optimization criteria during the bootstrap. [#694]
* Support for combining bootstrap results with `vcat`. [#694]
* Support for saving and restoring bootstrap replicates with `savereplicates` and `restorereplicates`. [#694]

MixedModels v4.14.0 Release Notes
==============================
* New function `profile` for computing likelihood profiles for `LinearMixedModel`. The resultant `MixedModelProfile` can then be used for computing confidence intervals with `confint`. Note that this API is still somewhat experimental and as such the internal storage details of `MixedModelProfile` may change in a future release without being considered breaking. [#639]
@@ -435,3 +441,4 @@ Package dependencies
[#680]: https://github.com/JuliaStats/MixedModels.jl/issues/680
[#681]: https://github.com/JuliaStats/MixedModels.jl/issues/681
[#682]: https://github.com/JuliaStats/MixedModels.jl/issues/682
[#694]: https://github.com/JuliaStats/MixedModels.jl/issues/694
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "MixedModels"
uuid = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
authors = ["Phillip Alday <me@phillipalday.com>", "Douglas Bates <dmbates@gmail.com>", "Jose Bayoan Santiago Calderon <jbs3hp@virginia.edu>"]
version = "4.14.1"
version = "4.15.0"

[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
2 changes: 2 additions & 0 deletions docs/Project.toml
@@ -2,10 +2,12 @@
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DisplayAs = "0b91fe84-8a4c-11e9-3e1d-67c38462b6d6"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
FreqTables = "da1fdf0e-e0ff-5433-a45f-9bb5ff651cb1"
Gadfly = "c91e804a-d5a3-530f-b6f0-dfbca275c004"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
57 changes: 57 additions & 0 deletions docs/src/bootstrap.md
@@ -164,3 +164,60 @@ This operation is encapsulated in a method for the `issingular` function.
```@example Main
count(issingular(samp2))
```

## Reduced Precision Bootstrap

`parametricbootstrap` accepts an optional keyword argument `optsum_overrides`, which can be used to override the convergence criteria for bootstrap replicates. One possibility is setting `ftol_rel=1e-8`, i.e., considering the model converged when the relative change in the objective between optimizer iterations is smaller than 0.00000001.
This threshold corresponds approximately to the precision obtained by treating the value of the objective as a single-precision (`Float32`) number, while not changing the precision of the intermediate computations.
The resultant loss in precision will generally be smaller than the variation that the bootstrap captures, but can greatly speed up the fitting process for each replicate, especially for large models.
In other words, the reduced fit quality of each individual replicate may be more than compensated for by the ability to fit a much larger number of replicates in the same amount of time.
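As a back-of-the-envelope check on the `Float32` analogy, the following sketch (plain Python; `eps32` and `ftol_rel` are illustrative names here, not part of the MixedModels API) compares the `1e-8` threshold to single-precision machine epsilon:

```python
# Machine epsilon for IEEE single precision (Float32) is 2^-23: the
# relative spacing between adjacent Float32 values near 1.0.
eps32 = 2.0 ** -23

# The ftol_rel override: relative changes in the objective smaller than
# this are treated as convergence.
ftol_rel = 1e-8

# 1e-8 sits just below eps32 (~1.19e-7), so objective changes under the
# threshold are on the order of single-precision rounding noise.
print(eps32)             # 1.1920928955078125e-07
print(ftol_rel < eps32)  # True
```

Objective changes this small are thus indistinguishable from rounding noise at single precision, even though all intermediate arithmetic stays in `Float64`.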

```@example Main
t = @timed parametricbootstrap(MersenneTwister(42), 1000, m2; hide_progress=true)
t.time
```

```@example Main
optsum_overrides = (; ftol_rel=1e-8)
t = @timed parametricbootstrap(MersenneTwister(42), 1000, m2; optsum_overrides, hide_progress=true)
t.time
```

## Distributed Computing and the Bootstrap

Earlier versions of MixedModels.jl supported a multi-threaded bootstrap via the `use_threads` keyword argument.
However, with improved BLAS multithreading, the Julia-level threads often wound up competing with the BLAS threads, leading to no improvement or even a worsening of performance when `use_threads=true`.
Nonetheless, the bootstrap is a classic example of an [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) problem and so we provide a few convenience methods for combining results computed separately.
In particular, there are `vcat` and optimized `reduce(::typeof(vcat))` methods for `MixedModelBootstrap` objects.
For computers with many processors (as opposed to a single processor with several cores) or for computing clusters, these provide a convenient way to split the computation across nodes.

```@example Main
using Distributed
using ProgressMeter
# you already have 1 proc by default, so add the number of additional cores with `addprocs`
# you need at least as many RNGs as cores you want to use in parallel
# but you shouldn't use all of your cores because nested within this
# is the multithreading of the linear algebra
@info "Currently using $(nprocs()) processors total and $(nworkers()) for work"
# copy everything to workers
@showprogress for w in workers()
    remotecall_fetch(() -> coefnames(m2), w)
end
# split the replicates across the workers
# this rounds down, so if the number of workers doesn't divide the
# number of replicates, you'll be a few replicates short!
n_replicates = 1000
n_rep_per_worker = n_replicates ÷ nworkers()
# NB: You need a different seed/RNG for each worker otherwise you will
# have copies of the same replicates and not independent replicates!
pb_map = @showprogress pmap(MersenneTwister.(1:nworkers())) do rng
    parametricbootstrap(rng, n_rep_per_worker, m2; optsum_overrides)
end;
# get rid of all the workers
# rmprocs(workers())
confint(reduce(vcat, pb_map))
```
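The split-the-seeds, map, and concatenate bookkeeping above is language-agnostic. Here is a minimal Python sketch of the same pattern (`bootstrap_chunk` is a hypothetical stand-in for one worker's `parametricbootstrap` call, with a dummy RNG draw replacing the actual model fit):

```python
import random

def bootstrap_chunk(seed, n_rep):
    # Stand-in for one worker's bootstrap call: each chunk uses its own
    # independently seeded RNG, mirroring MersenneTwister.(1:nworkers()).
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_rep)]

n_replicates = 1000
n_workers = 3
# Integer division rounds down, so the total may fall short of
# n_replicates -- the same caveat noted in the Julia comments above.
n_rep_per_worker = n_replicates // n_workers

# A sequential comprehension stands in for pmap; a process pool would
# parallelize it without changing the seeding logic.
chunks = [bootstrap_chunk(seed, n_rep_per_worker)
          for seed in range(1, n_workers + 1)]

# The reduce(vcat, pb_map) analogue: concatenate per-worker results.
replicates = [r for chunk in chunks for r in chunk]
print(len(replicates))  # 999 == 3 * (1000 // 3)
```

The essential points carry over directly: distinct seeds per worker, a shortfall when the worker count does not divide the replicate count, and a single concatenation at the end.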
11 changes: 11 additions & 0 deletions issues/emotikon_bootstrap/Project.toml
@@ -0,0 +1,11 @@
[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Scratch = "6c6a2e73-6563-6170-7368-637461726353"
StandardizedPredictors = "5064a6a7-f8c2-40e2-8bdc-797ec6f1ae18"
22 changes: 22 additions & 0 deletions issues/emotikon_bootstrap/cache.jl
@@ -0,0 +1,22 @@
module Cache
using Downloads
using Scratch
# This will be filled in inside `__init__()`
download_cache = ""
url = "https://github.com/RePsychLing/SMLP2022/raw/main/data/fggk21.arrow"
#"https://github.com/bee8a116-0383-4365-8df7-6c6c8d6c1322"

function data_path()
    fname = joinpath(download_cache, basename(url))
    if !isfile(fname)
        @info "Local cache not found, downloading"
        Downloads.download(url, fname)
    end
    return fname
end

function __init__()
    global download_cache = get_scratch!(@__MODULE__, "downloaded_files")
    return nothing
end
end
82 changes: 82 additions & 0 deletions issues/emotikon_bootstrap/emotikon_bootstrap.jl
@@ -0,0 +1,82 @@
include("cache.jl")
using .Cache
using Arrow
using CategoricalArrays
using DataFrames
using Distributed
using MixedModels
using ProgressMeter
using Random
using StandardizedPredictors

kb07 = MixedModels.dataset(:kb07)
contrasts = Dict(:item => Grouping(),
                 :subj => Grouping(),
                 :spkr => EffectsCoding(),
                 :prec => EffectsCoding(),
                 :load => EffectsCoding())
m07 = fit(MixedModel,
          @formula(
              1000 / rt_raw ~
                  1 + spkr * prec * load +
                  (1 + spkr * prec * load | item) +
                  (1 + spkr * prec * load | subj)
          ),
          kb07; contrasts, progress=true, thin=1)

pbref = @time parametricbootstrap(MersenneTwister(42), 1000, m07);
pb_restricted = @time parametricbootstrap(
    MersenneTwister(42), 1000, m07; optsum_overrides=(; ftol_rel=1e-3)
);
pb_restricted2 = @time parametricbootstrap(
    MersenneTwister(42), 1000, m07; optsum_overrides=(; ftol_rel=1e-6)
);
confint(pbref)
confint(pb_restricted)
confint(pb_restricted2)

using .Cache
using Distributed
addprocs(3)
@everywhere using MixedModels, Random, StandardizedPredictors
df = DataFrame(Arrow.Table(Cache.data_path()))

transform!(df, :Sex => categorical, :Test => categorical; renamecols=false)
recode!(df.Test,
        "Run" => "Endurance",
        "Star_r" => "Coordination",
        "S20_r" => "Speed",
        "SLJ" => "PowerLOW",
        "BPT" => "PowerUP")
df = combine(groupby(df, :Test), :, :score => zscore => :zScore)
describe(df)

contrasts = Dict(:Cohort => Grouping(),
                 :School => Grouping(),
                 :Child => Grouping(),
                 :Test => SeqDiffCoding(),
                 :Sex => EffectsCoding(),
                 :age => Center(8.5))

f1 = @formula(
    zScore ~
        1 + age * Test * Sex +
        (1 + Test + age + Sex | School) +
        (1 + Test | Child) +
        zerocorr(1 + Test | Cohort)
)
m1 = fit(MixedModel, f1, df; contrasts, progress=true, thin=1)

# copy everything to workers
@showprogress for w in workers()
    remotecall_fetch(() -> coefnames(m1), w)
end

# you need at least as many RNGs as cores you want to use in parallel
# but you shouldn't use all of your cores because nested within this
# is the multithreading of the linear algebra
# 5 RNGS and 10 replicates from each
pb_map = @time @showprogress pmap(MersenneTwister.(41:45)) do rng
    parametricbootstrap(rng, 100, m1; optsum_overrides=(; maxfeval=300))
end;
@time confint(reduce(vcat, pb_map))
3 changes: 3 additions & 0 deletions src/MixedModels.jl
Expand Up @@ -142,6 +142,9 @@ export @formula,
weights,
zerocorr

# TODO: move this to the correct spot in list once we've decided on name
export savereplicates, restorereplicates

"""
MixedModel

2 comments on commit 8d71961

@palday (Member, Author) commented on 8d71961, Jul 6, 2023


@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/86991

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v4.15.0 -m "<description of version>" 8d7196121e18010d75eb1d73ab49c0c345e6fa0e
git push origin v4.15.0
