Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MixedModel Equivalent to GLM's (mod.mm.assign) #661

Closed
HiramTheHero opened this issue Dec 27, 2022 · 8 comments
Closed

MixedModel Equivalent to GLM's (mod.mm.assign) #661

HiramTheHero opened this issue Dec 27, 2022 · 8 comments

Comments

@HiramTheHero
Copy link

How can I derive an equivalent output of a MixedModel like GLM's model.mm.assign property?

I'm having a hard time finding it in the documentation and source code. I apologize if the solution is rather simple.

Thank you.

@palday
Copy link
Member

palday commented Dec 27, 2022

What are you trying to accomplish? There is nothing in MixedModels.jl that directly corresponds to StatsModels.ModelMatrix.assign. Depending on what you're trying to accomplish, I can potentially provide better guidance.

@HiramTheHero
Copy link
Author

I'm creating a function to calculate Variance Inflation Factors for the GLM package(JuliaStats/GLM.jl#428). The code I am creating with can also work with Mixed Models. However, to properly assess factored variables, I need something similar to the model.mm.assign that GLM provides for Mixed Models.

Thank you for your time and help.

@palday
Copy link
Member

palday commented Dec 27, 2022

There is already a VIF implementation for mixed models: https://palday.github.io/MixedModelsExtras.jl/stable/api/#MixedModelsExtras.vif.

The implementation there would also work for GLM.jl models with an appropriate method for termnames. Indeed the type restriction for both vif and gvif is written as RegressionModel, which GLM.jl models are a subtype of. Realistically termnames needs to be upstreamed to StatsModels.jl as

function termnames(f::FormulaTerm)
# copy body from MixedModelsExtras.jl
end

termnames(model::RegressionModel) = termnames(formula(model))

and vif and gvif should be stubbed (and thus owned by) StatsAPI.jl. Before StatsAPI was split off from StatsBase, I had started working on this (https://github.com/JuliaStats/StatsBase.jl/pull/698/files), but we decided that the best option was to try it out in a end-user package and work out the API making the API part of StatsBase/StatsAPI. It's a lot easier to have a breaking change+release in MixedModelsExtras than in StatsBase/StatsAPI because there's less of an impact on downstream packages.

cc: @kleinschmidt and @nalimilan

@nalimilan
Copy link
Member

Oops, sorry for not realizing that we had discussed this before. For some reason GitHub didn't add a link to JuliaStats/StatsBase.jl#698 at JuliaStats/GLM.jl#428, despite the mention.

@HiramTheHero Is the implementation in MixedModelsExtras similar to what you wanted to do based on the car R package?

Anyway it seems we should add empty function definitions to StatsAPI, and termnames definitions in StatsModels. Then vif and gvif can either be defined separately in GLM.jl and in MixedModelsExtras.jl, or via a single common definition in StatsModels.jl (but @kleinschmidt is hesitant about the latter solution so we could choose the former, at least as a first step).

@HiramTheHero
Copy link
Author

@HiramTheHero Is the implementation in MixedModelsExtras similar to what you wanted to do based on the car R package?

The vif function I created creates the same results as the car R package. The gvif function in MixedModelsExtras uses the same algorithm created by John Fox. (Which is what I used in my function.) The vif function in MixedModelsExtras uses a different/simpler algorithm that doesn't take into account non-factored variables correctly. (Which is what the gvif function is for to my understanding.)

In the example below, my function is just the vif function. Here is a comparison between @palday's work and mine. In my opinion, them seem similar enough.

julia> MOD = lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth), iri
s)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.Dens
ePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Mat
rix{Float64}}

SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth

Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)   1.856      0.250777    7.40    <1e-11   1.36038    2.35162
SepalWidth    0.650837   0.0666474   9.77    <1e-16   0.519119   0.782555
PetalLength   0.709132   0.0567193  12.50    <1e-24   0.597035   0.821229
PetalWidth   -0.556483   0.127548   -4.36    <1e-04  -0.808561  -0.304404
─────────────────────────────────────────────────────────────────────────

julia> MOD2 = lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Sp
ecies), iris)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.Dens
ePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Mat
rix{Float64}}

SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species

Coefficients:
──────────────────────────────────────────────────────────────────────────────────
                         Coef.  Std. Error      t  Pr(>|t|)  Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept)           2.17127    0.279794    7.76    <1e-11   1.61823    2.7243
SepalWidth            0.495889   0.0860699   5.76    <1e-07   0.325765   0.666013
PetalLength           0.829244   0.0685276  12.10    <1e-22   0.693794   0.964694
PetalWidth           -0.315155   0.151196   -2.08    0.0389  -0.614005  -0.0163054
Species: versicolor  -0.723562   0.240169   -3.01    0.0031  -1.19827   -0.24885
Species: virginica   -1.0235     0.333726   -3.07    0.0026  -1.68313   -0.363863
──────────────────────────────────────────────────────────────────────────────────

julia> vif(MOD).vif
3-element Vector{Float64}:
  1.2708148956298828
 15.097572326660156
 14.234334945678711

julia> MixedModelsExtras.vif(MOD)
3-element Vector{Float64}:
  1.2708149293446902
 15.097572322916154
 14.234334971742324

julia> vif(MOD2).gvif
4-element Vector{Float32}:
  2.2274659
 23.161648
 21.0214
 40.039177

julia> MixedModelsExtras.gvif(MOD2, scale = false)
4-element Vector{Float64}:
  2.2274657532417894
 23.16164777422666
 21.02140052819936
 40.039177290459335

julia> vif(MOD2).gvifModified
4-element Vector{Float32}:
 1.4924697
 4.812655
 4.58491
 2.5154824

julia> MixedModelsExtras.gvif(MOD2, scale = true)
4-element Vector{Float64}:
 1.4924696825201473
 4.812654961061167
 4.584910089434619
 2.515482418758807

Just to input my 2 cents on the matter. Whatever is used going forward, I would suggest either implementing one function that calculates a VIF or GVIF based on the if a model contains factored variables or not, or have explicit warnings to use gvif if the vif function is used on a model with factored variables.

@nalimilan
Copy link
Member

OK, great. Maybe we could have vif return the GVIF by default unless a gvif=false argument is passed. What do you think @palday?

I don't like the approach of printing warnings when vif if called on a model with categorical predictors. Warnings are annoying and don't make a lot of sense: either the result is legit and no warning should be printed, or it's not and we shouldn't return it at all.

@palday
Copy link
Member

palday commented Feb 1, 2023

@nalimilan I have a slight preference to have gvif and vif as separate functions, much like r2 and adjr2 are. We don't have to issue the warning for categorical models with vif -- there are cases (e.g. non orthogonal contrasts) where it's interesting to see the VIF for individual coefficients within a single categorical term. For GLM.jl, there's also the question of how gvif would work for models fit with the matrix interface (my preference: error).

As currently implemented, gvif depends on termnames, which probably needs to live in / be owned by StatsModels.

@palday
Copy link
Member

palday commented Jul 18, 2024

I think the MixedModels portion of this is complete and the necessary stubs have been upstreamed to StatsAPI/StatsModels. Anything else is a GLM.jl issue, so I'm closing this. 😄

@palday palday closed this as completed Jul 18, 2024
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants