
fit! on a LinearBinaryClassifier fails for Float32 values #42

Closed

tiemvanderdeure opened this issue Jan 31, 2024 · 3 comments

tiemvanderdeure (Contributor) commented Jan 31, 2024

Fitting a machine with a LinearBinaryClassifier model errors if the input data has element type Float32 or Float16.

The same regression using vanilla GLM works. A LinearRegressor also works.

A reproducible example:

using GLM, CategoricalArrays, MLJBase, MLJGLMInterface
X = (a = rand(Float32, 200), b = rand(Float32, 200))
y_categorical = X.a .+ X.b .> rand(200)

mach = MLJBase.machine(LinearBinaryClassifier(), X, categorical(y_categorical))
mach2 = MLJBase.machine(LinearRegressor(), X, X.a .+ X.b .+ rand(200))

fit!(mach) # errors
fit!(mach2) # works

GLM.glm(@formula(c ~ a + b), merge(X, (; c = y_categorical)), Bernoulli()) # also works

Stacktrace:

[ Info: Training machine(LinearBinaryClassifier(fit_intercept = true, …), …).
┌ Error: Problem fitting the machine machine(LinearBinaryClassifier(fit_intercept = true, …), …).
└ @ MLJBase C:\Users\tsh371\.julia\packages\MLJBase\fEiP2\src\machines.jl:682
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: MethodError: no method matching GLM.GlmResp(::Vector{…}, ::Bernoulli{…}, ::LogitLink, ::Vector{…}, ::Vector{…}, ::Vector{…}, ::Vector{…})

Closest candidates are:
  GLM.GlmResp(::V, ::D, ::L, ::V, ::V, ::V, ::V, ::V, ::V, ::V) where {V<:(AbstractVector{T} where T<:AbstractFloat), D<:(Distributions.UnivariateDistribution), L<:Link}
   @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:7
  GLM.GlmResp(::V, ::D, ::L, ::V, ::V, ::V, ::V) where {V<:(AbstractVector{T} where T<:AbstractFloat), D, L}
   @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:28
  GLM.GlmResp(::AbstractVector{T} where T<:AbstractFloat, ::Distributions.Distribution, ::Link, 
::AbstractVector{T} where T<:AbstractFloat, ::AbstractVector{T} where T<:AbstractFloat)
   @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:54
  ...

Stacktrace:
  [1] GLM.GlmResp(y::Vector{Float32}, d::Bernoulli{Float64}, l::LogitLink, off::Vector{Float64}, wts::Vector{Float64})
    @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:61
  [2] GLM.GlmResp(y::Vector{Float32}, d::Bernoulli{Float64}, l::LogitLink, off::Vector{Float64}, wts::Vector{Int64})
    @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:69
  [3] fit(::Type{…}, X::Matrix{…}, y::Vector{…}, d::Bernoulli{…}, l::LogitLink; dropcollinear::Bool, dofit::Bool, wts::Vector{…}, offset::Vector{…}, fitargs::@Kwargs{…})
    @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:582
  [4] fit(::Type{…}, ::FormulaTerm{…}, ::Tables.MatrixTable{…}, ::Bernoulli{…}, ::Vararg{…}; contrasts::Dict{…}, kwargs::@Kwargs{…})
    @ StatsModels C:\Users\tsh371\.julia\packages\StatsModels\syVEq\src\statsmodel.jl:88        
  [5] glm(::FormulaTerm{…}, ::Tables.MatrixTable{…}, ::Bernoulli{…}, ::Vararg{…}; kwargs::@Kwargs{…})
    @ GLM C:\Users\tsh371\.julia\packages\GLM\vM20T\src\glmfit.jl:604
  [6] fit(model::LinearBinaryClassifier, verbosity::Int64, X::@NamedTuple{…}, y::CategoricalVector{…}, w::Nothing)
    @ MLJGLMInterface C:\Users\tsh371\.julia\packages\MLJGLMInterface\h8qVE\src\MLJGLMInterface.jl:419
  [7] fit(model::LinearBinaryClassifier, verbosity::Int64, X::@NamedTuple{…}, y::CategoricalVector{…})
    @ MLJGLMInterface C:\Users\tsh371\.julia\packages\MLJGLMInterface\h8qVE\src\MLJGLMInterface.jl:413
  [8] fit_only!(mach::Machine{LinearBinaryClassifier, true}; rows::Nothing, verbosity::Int64, force::Bool, composite::Nothing)
    @ MLJBase C:\Users\tsh371\.julia\packages\MLJBase\fEiP2\src\machines.jl:680
  [9] fit_only!
    @ MLJBase C:\Users\tsh371\.julia\packages\MLJBase\fEiP2\src\machines.jl:606 [inlined]
 [10] #fit!#63
    @ MLJBase C:\Users\tsh371\.julia\packages\MLJBase\fEiP2\src\machines.jl:777 [inlined]       
 [11] fit!(mach::Machine{LinearBinaryClassifier, true})
    @ MLJBase C:\Users\tsh371\.julia\packages\MLJBase\fEiP2\src\machines.jl:774
 [12] top-level scope

Some type information was truncated. Use `show(err)` to see complete types.
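
Reading the Closest candidates above, the failure appears to be a type mismatch rather than a missing feature: y is still a Vector{Float32} while the internally constructed offset and weight vectors are Vector{Float64}, and each GlmResp candidate requires all of its vector arguments to share one element type V.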
rikhuijzer (Member) commented Jan 31, 2024

Hi @tiemvanderdeure. Thanks for reporting this issue. Out of curiosity, could you tell us how much you need the Float32 computations? GLM is normally meant for semi-small datasets, so it should usually be cheap to promote the numbers to the 64-bit representation (Float32 to Float64) without serious consequences for running time.

The second situation with mach2 actually works because the target is promoted to Float64 in the background:

julia> (X.a .+ X.b .+ rand(200)) |> typeof
Vector{Float64} (alias for Array{Float64, 1})
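
For contrast, here is a minimal sketch of the promotion rule at play (plain Julia behavior, nothing package-specific): an all-Float32 expression stays Float32, so a Float32 target gets no such rescue:

julia> (X.a .+ X.b .+ rand(Float32, 200)) |> typeof
Vector{Float32} (alias for Array{Float32, 1})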

I've given fixing this a shot in #43, but it doesn't look as easy as I hoped.

tiemvanderdeure (Contributor, Author) commented Feb 3, 2024

I discovered this error because I am working with data extracted from rasters, which is often Float32.

I don't actually need the Float32 computations, so if it's not easy to fix I'll just add a safeguard for now that converts to Float64 before the data is passed to the machine.
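
A minimal sketch of such a safeguard (the to_float64 helper is hypothetical, not part of MLJ or GLM; it promotes any float column, including Float16):

to_float64(v::AbstractVector{<:AbstractFloat}) = Float64.(v)  # promote float columns to Float64
to_float64(v) = v  # leave non-float columns untouched

X64 = map(to_float64, X)  # map acts column-wise on a NamedTuple table
mach = MLJBase.machine(LinearBinaryClassifier(), X64, categorical(y_categorical))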

Sloppy on my part with the second example. In reality, LinearRegressor errors as well:

using GLM, CategoricalArrays, MLJBase, MLJGLMInterface
X = (a = rand(Float32, 200), b = rand(Float32, 200))
y_categorical = X.a .+ X.b .> rand(200)

mach = MLJBase.machine(LinearBinaryClassifier(), X, categorical(y_categorical))
mach2 = MLJBase.machine(LinearRegressor(), X, X.a .+ X.b .+ rand(Float32, 200))

fit!(mach) # errors
fit!(mach2) # also errors!

tiemvanderdeure (Contributor, Author)

Solved by #45
