
additional tests of pivoting logic; bootstrapped effects for mixed models #75

Merged
palday merged 19 commits into main from pa/mm_pivot
Jul 9, 2024

Conversation

@palday (Member) commented Jul 9, 2024

The pivoting logic is very tricky and easy to get wrong, so I added a few more tests to increase our confidence in the results.

I also added an experimental method for computing effects from a MixedModelBootstrap. This may be interesting in its own right (though I seriously doubt it will differ much from the vcov-based method, because it's a parametric bootstrap), but it also doesn't rely on pivoting to work with rank-deficient models and so provides an additional way to cross-check the pivoted vcov-based method. (The pivoted-out coefficients are stored in-place as -0.0, so they simply cancel out of the prediction itself, but the associated entries in the vcov matrix are NaN, which is poisonous and thus must be pivoted out.)
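
The -0.0 vs. NaN distinction can be sketched with a toy example (hypothetical values, not the package implementation): a -0.0 coefficient contributes nothing to the linear predictor, but a NaN row/column in the vcov matrix poisons any standard error computed from it unless it is pivoted out first.

```julia
β = [1.0, -0.0, 2.0]          # rank-deficient fit: pivoted-out coef stored as -0.0
X = [1.0 5.0 3.0]             # one row of a hypothetical model matrix
η = X * β                     # the -0.0 entry cancels out of the prediction

Σ = [0.5 0.0 0.1;             # hypothetical vcov with a NaN row/column
     0.0 NaN 0.0;             # for the pivoted-out coefficient
     0.1 0.0 0.4]
se_bad = sqrt(only(X * Σ * X'))               # NaN propagates into the SE
keep = [1, 3]                                 # pivot out the deficient column
se_ok = sqrt(only(X[:, keep] * Σ[keep, keep] * X[:, keep]'))
```

Here `se_bad` is NaN even though the prediction `η` is perfectly well defined, which is why the vcov-based path has to pivot while the prediction itself does not.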

It also looks like there is a complex compatibility problem between certain combinations of Julia, MixedModels, and StatsModels 0.6 versions, which becomes apparent when using the bootstrap. I'm just going to drop explicit StatsModels 0.6 testing. We still test against it via the Julia 1.6 job, which limits MixedModels to a version requiring StatsModels 0.6.
Otherwise, we can backport bugfixes to the 1.2 series, and StatsModels 0.7 is well supported nowadays.


codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.53%. Comparing base (d5d8fbb) to head (9c26501).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   99.45%   99.53%   +0.07%     
==========================================
  Files           6        6              
  Lines         185      214      +29     
==========================================
+ Hits          184      213      +29     
  Misses          1        1              


@palday palday changed the title additional tests of pivoting logic additional tests of pivoting logic; bootstrapped effects for mixed models Jul 9, 2024
@palday palday marked this pull request as ready for review July 9, 2024 12:58
Project.toml (review comment resolved)
@haberdashPI (Member) left a comment:

The only thing I'm not really clear on is the logic of the new test employing β: I think I'm getting mixed up about which coefficients map to what. It would be useful to spell the logic out in a comment.

@palday palday requested a review from haberdashPI July 9, 2024 15:16
@haberdashPI (Member) left a comment:

LGTM

ext/EffectsMixedModelsExt.jl (review comment resolved)
bootdf = unstack(DataFrame(boot.β), :coefname, :β)
β = Vector{Float64}(undef, ncol(bootdf) - 1)
boot_err = mapreduce(vcat, 1:nrow(bootdf)) do iter
    copyto!(β, @view bootdf[iter, 2:end])
Comment (Member):

It took me a minute to realize why it's indexed like this: you're doing unstack above. Is that easier than grouping by coefname and using the β column in the multiplication?

Comment (Member Author):

yes

Comment (Member):

Can you elaborate on why that's the case? It's not obvious to me.

Comment (Member Author):

because that's how I thought about it first, but here ya go 😄 51e1aab

    copyto!(β, @view bootdf[iter, 2:end])
    return X * β
end
err = map(std, eachcol(boot_err))
Comment (Member):

Won't boot_err only ever have one column? X * β should be a vector, unless I'm missing something.

Comment (Member Author):

There was a small error here that led to it always being a single column, but the aggregation was correct.

607780d

Comment (Member):

Checking my understanding now: Because of the transpose above, boot_err is an iteration × observation matrix, so std over each column is the standard deviation across iterations for each observation. Assuming that's correct, 👍
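
The shape logic in this thread can be sketched with hypothetical inputs (toy `X` and bootstrap draws, not the package code): each iteration yields a prediction vector `X * β`, the transposed rows stack into an iteration × observation matrix, and `std` over each column gives the per-observation spread across iterations.

```julia
using Statistics  # std

X = [1.0 0.0;                                 # 3 observations × 2 coefficients
     1.0 1.0;
     1.0 2.0]
betas = [[0.5, 1.0], [0.4, 1.1], [0.6, 0.9]]  # three hypothetical bootstrap draws

boot_err = mapreduce(vcat, betas) do β
    return (X * β)'                           # 1 × nobs row per iteration
end                                           # niter × nobs matrix
err = map(std, eachcol(boot_err))             # per-observation SE across iterations
```

Without the transpose, `vcat` would stack the predictions into one long column and `eachcol` would see a single column, which is exactly the bug described above.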

end

reference_grid[!, lower_col] = first.(ci)
reference_grid[!, upper_col] = last.(ci)
Comment (Member):

Should these be transformed with the link function?

Comment (Member Author):

oops, yes
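
The fix under discussion amounts to mapping interval endpoints through the inverse link; a minimal sketch, assuming a logistic link and made-up numbers (the actual column names and link come from the fitted model):

```julia
# Intervals built on the linear-predictor (link) scale must be pushed
# through the inverse link before being stored as response-scale bounds.
invlink(η) = 1 / (1 + exp(-η))        # e.g. logistic for a Bernoulli model
η, se, z = 0.4, 0.25, 1.0             # hypothetical predictor, SE, z for ~68% CI
ci = (η - z * se, η + z * se)         # interval on the link scale
lower, upper = invlink.(ci)           # endpoints on the response scale
```

Transforming the endpoints (rather than the SE) keeps the interval inside the response's natural range, e.g. (0, 1) for a probability.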


level=nothing)
grid = expand_grid(design)
dv = something(eff_col, _responsename(model))
level = isnothing(level) ? 0.68 : level
Comment (Member):

You can just make the level keyword argument default to 0.68. Also, can you add a comment about why 0.68?

Comment (Member Author):

This is explained in the other methods for effects -- the nothing is chosen to match their behavior in terms of arguments accepted.
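
The `nothing`-sentinel pattern being defended here can be sketched in isolation (hypothetical helper name, not the package API): defaulting to `nothing` keeps the signature uniform across methods while still resolving to 0.68 when the caller doesn't specify a level.

```julia
# `nothing` as the default lets every method accept the same keyword,
# with the concrete fallback resolved in one place.
function resolve_level(level=nothing)
    # 0.68 ≈ the coverage of ±1 standard error for a normal approximation
    return isnothing(level) ? 0.68 : level
end
```

Making the keyword itself default to 0.68 would also work, but then each method would hard-code the fallback instead of sharing it.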

@palday palday requested a review from ararslan July 9, 2024 19:06
@palday palday merged commit 7771901 into main Jul 9, 2024
8 checks passed
@palday palday deleted the pa/mm_pivot branch July 9, 2024 20:26