
weight values for the features. #96

Open
norci opened this issue May 24, 2019 · 2 comments

@norci (Contributor) commented May 24, 2019

I want to assign weight values to the features, so that the split function uses the features with larger weights first.

Some features are more important than others, and I want to control the split process accordingly.
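
To illustrate what I mean, a rough sketch (the names `best_feature` and `split_score` are hypothetical, not DecisionTree.jl API): the per-feature score that the split routine already computes would simply be scaled by a user-supplied weight.

```julia
# Hypothetical sketch: bias split selection with per-feature weights.
# `split_score(X, y, j)` stands in for whatever impurity-based score
# the library computes for feature j; it is not a real API.
function best_feature(X, y, weights, split_score)
    scores = [weights[j] * split_score(X, y, j) for j in 1:size(X, 2)]
    return argmax(scores)  # heavily weighted features are preferred
end
```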

@bensadeghi (Member) commented
The split routines already identify which features have the most predictive power (information gain) via Shannon entropy. So IMO, manually identifying/defining which features are of high importance is unnecessary, and I don't know of any DT implementation out there supporting this capability.
But if you have an implementation in mind for this, we'd be happy to consider it.
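
For reference, here's a minimal self-contained sketch of how entropy-based information gain scores a candidate split (illustrative only, not DecisionTree.jl's actual internals):

```julia
# Shannon entropy of a vector of class labels, in bits.
function shannon_entropy(labels)
    n = length(labels)
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return -sum(c / n * log2(c / n) for c in values(counts))
end

# Information gain of splitting `labels` on `feature .< threshold`:
# parent entropy minus the size-weighted entropy of the two children.
function information_gain(feature, labels, threshold)
    left  = labels[feature .< threshold]
    right = labels[feature .>= threshold]
    n = length(labels)
    return shannon_entropy(labels) -
           (length(left) / n * shannon_entropy(left) +
            length(right) / n * shannon_entropy(right))
end

information_gain([0.1, 0.4, 0.6, 0.9], ["a", "a", "b", "b"], 0.5)  # 1.0 bit
```

The split routine picks the feature/threshold pair that maximizes this gain, which is why informative features are already preferred automatically.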

@norci (Contributor, Author) commented Jun 18, 2019

My reason for wanting weighted features:

My application uses a decision tree whose output is a function. That function is called in subsequent code, and there are special rules governing the functions: rule 1 applies to some functions, rule 2 to others, and so on.

The problem is that the decision tree model is not able to learn these rules. I suspect this is because the amount of data differs across the rules.

I did not modify the code in DecisionTree.jl, but used a simple workaround instead. Steps:

  1. Draw a flow chart and list the nodes that need to be created explicitly.
  2. Create a logic rule for each node, such as (feature A > 0.5 && B < 100).
  3. Create sub-datasets by filtering the dataset with the above logic rules.
  4. Build a tree for each sub-dataset.
  5. Assemble the entire tree by calling `Node{Float64,String}()` recursively, as sketched below.
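
For concreteness, a minimal sketch of steps 3–5 with a single hand-chosen rule. The data, the rule, and the labels are made up, and the four-argument `Node{S,T}` constructor with "`<` goes left" routing reflects my reading of the package's internals, so treat those details as assumptions:

```julia
using DecisionTree

# Hypothetical dataset: 1000 samples, 4 numeric features, string labels.
X = rand(1000, 4)
y = rand(["f1", "f2", "f3"], 1000)

# Step 3: partition the data with the hand-written rule "feature 1 >= 0.5".
mask = X[:, 1] .>= 0.5
X_hi, y_hi = X[mask, :], y[mask]
X_lo, y_lo = X[.!mask, :], y[.!mask]

# Step 4: build an ordinary tree on each sub-dataset.
tree_hi = build_tree(y_hi, X_hi)
tree_lo = build_tree(y_lo, X_lo)

# Step 5: glue the subtrees under a manually built root node.
# Assumption: Node{S,T}(featid, featval, left, right), where samples
# with features[featid] < featval are routed to the left subtree.
root = Node{Float64,String}(1, 0.5, tree_lo, tree_hi)

apply_tree(root, [0.7, 0.2, 0.9, 0.1])  # feature 1 >= 0.5, so routed to tree_hi
```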

I feel this is an ugly solution.

I'm not an ML expert. Could you tell me whether this feature is valuable, i.e. would anyone else use it? If so, I'll implement it in DecisionTree.jl when I have free time.
