Naive passing of Julia class_weight dictionary does not work #63

Closed
ablaom opened this issue Nov 29, 2023 · 3 comments
Comments

@ablaom
Member

ablaom commented Nov 29, 2023

At least it doesn't work for RandomForestClassifier:

using MLJ
X, y = @load_iris
Forest = @load RandomForestClassifier pkg=MLJScikitLearnInterface
w = Dict("setosa"=>0.2, "versicolor"=>0.7, "virginica"=>0.1)
forest = Forest(class_weight=w)

julia> mach = machine(forest, X, y) |> fit!
[ Info: Training machine(RandomForestClassifier(n_estimators = 100, ), ).
┌ Error: Problem fitting the machine machine(RandomForestClassifier(n_estimators = 100, ), ). 
└ @ MLJBase ~/.julia/packages/MLJBase/fEiP2/src/machines.jl:682
[ Info: Running type checks... 
[ Info: Type checks okay. 
ERROR: Python: InvalidParameterError: The 'class_weight' parameter of RandomForestClassifier must be a str among {'balanced_subsample', 'balanced'}, an instance of 'dict', an instance of 'list' or None. Got Julia:
Dict{String, Float64} with 3 entries:
  "virginica"  => 0.1
  "setosa"     => 0.2
  "versicolor" => 0.7 instead.
Python stacktrace:
 [1] validate_parameter_constraints
   @ sklearn.utils._param_validation 

Currently the struct has class_weight::Any. I wonder if it makes any difference if this is Union{Nothing, AbstractDict}.
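
As a rough check on the question above, the sketch below uses two hypothetical stand-in structs (`LooseModel` and `TightModel` are illustrative names, not the interface's real types) to show that narrowing a field annotation in Julia restricts what may be stored but does not convert the stored value. Assuming the interface hands the field value to Python unchanged, this suggests the annotation alone would not alter what scikit-learn receives.

```julia
# Hypothetical stand-ins for the model struct; the names are illustrative only.
struct LooseModel
    class_weight::Any
end

struct TightModel
    class_weight::Union{Nothing,AbstractDict}
end

w = Dict("setosa" => 0.2, "versicolor" => 0.7, "virginica" => 0.1)

loose = LooseModel(w)
tight = TightModel(w)

# The field annotation only restricts what may be stored; it does not convert
# the value. Both structs hold the very same Dict{String, Float64} object.
same_object = loose.class_weight === tight.class_weight
same_type = typeof(tight.class_weight) == Dict{String,Float64}
```

Both `same_object` and `same_type` evaluate to `true`, so any conversion to a Python dict would have to happen explicitly somewhere else.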

@tylerjthomas9 Is there an easy fix for this? Workaround?

@tylerjthomas9
Collaborator

tylerjthomas9 commented Nov 30, 2023

PythonCall.jl doesn't automatically convert Julia Dicts to Python dicts. You can convert one manually with pydict. This weekend I will investigate a better way than manually converting them to Python dictionaries. However, the class

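using MLJ
using PythonCall
X, y = @load_iris
Forest = @load RandomForestClassifier pkg=MLJScikitLearnInterface
# Keys are the integer class codes, not the string labels.
w = Dict(1=>0.2, 2=>0.7, 3=>0.1)
# Convert the Julia Dict to a Python dict explicitly before passing it on.
forest = Forest(class_weight=pydict(w))
mach = machine(forest, X, y) |> fit!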

One thing to note is that we convert y to integers using yplain = MMI.int(y). We may want to change this so that users don't have to use unnamed integers as the class keys in the dictionary.
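
Until that changes, a label-keyed dictionary can be translated into the integer-keyed form by hand. The following is a minimal sketch, not part of MLJScikitLearnInterface: `integer_class_weights` is a hypothetical helper, and it assumes that the integer code of each class is its position in the ordered list of levels (worth checking against `levels(y)` for your own data).

```julia
# Hypothetical helper, not part of MLJScikitLearnInterface.
# Assumption: the integer code of a class is its position in `class_levels`.
function integer_class_weights(weights::AbstractDict{<:AbstractString,<:Real},
                               class_levels::AbstractVector{<:AbstractString})
    # Fail early if any class has no weight assigned.
    missing_levels = setdiff(class_levels, keys(weights))
    isempty(missing_levels) ||
        throw(ArgumentError("no weight given for: $(join(missing_levels, ", "))"))
    # Map position in the level list => weight of that label.
    return Dict(code => weights[label] for (code, label) in enumerate(class_levels))
end

w = Dict("setosa" => 0.2, "versicolor" => 0.7, "virginica" => 0.1)
w_int = integer_class_weights(w, ["setosa", "versicolor", "virginica"])
```

Under the stated assumption, `w_int` has the same shape as the integer-keyed dictionary in the workaround above and could be wrapped with `pydict` in the same way.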

@ablaom
Member Author

ablaom commented Nov 30, 2023

@tylerjthomas9 Thanks for the quick response and the workaround.

@ablaom
Member Author

ablaom commented Jan 17, 2024

Closed by #64

ablaom closed this as completed Jan 17, 2024