Some minor differences in random forest implementations #160

tecosaur · 2022-05-11T12:05:37Z

I've recently been comparing some random forest implementations (https://github.com/tecosaur/TreeComparison). One result of that is #159, but I also have some other information which may be of interest.

For starters, here's the colour coding I use:

Error rates mostly converge across the implementations I tested, though ranger sometimes does a little better:

Precision-recall and ROC curves generally look near-identical, as they should.

I've also noticed some larger differences in the depth and size of the trees each implementation builds. Across a number of datasets, DecisionTree.jl and randomForest produce narrower/deeper trees than ranger and sklearn.
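
For anyone wanting to check the depth/size observation on their own forests, here is a minimal sketch of how the two quantities can be measured. It is not the code from TreeComparison; it assumes each tree has been exported as two parallel child-index arrays with `-1` marking a leaf (the layout sklearn exposes as `tree_.children_left` and `tree_.children_right`). The names `tree_shape`, `wide` and `deep` are made up for illustration.

```python
LEAF = -1  # assumed leaf marker, matching sklearn's child-array convention


def tree_shape(children_left, children_right):
    """Return (depth, node_count, leaf_count) for a single binary tree.

    For node i, children_left[i] and children_right[i] hold the indices of
    its children, or LEAF if node i is a leaf. Node 0 is the root.
    """
    depth = 0
    leaves = 0
    stack = [(0, 0)]  # (node index, depth of that node)
    while stack:
        node, d = stack.pop()
        depth = max(depth, d)
        if children_left[node] == LEAF:
            leaves += 1
        else:
            stack.append((children_left[node], d + 1))
            stack.append((children_right[node], d + 1))
    return depth, len(children_left), leaves


# Two hypothetical trees with the same number of nodes and leaves.
# "wide" is a balanced tree; "deep" keeps splitting only its right branch.
wide = ([1, 3, 5, -1, -1, -1, -1], [2, 4, 6, -1, -1, -1, -1])
deep = ([1, -1, 3, -1, 5, -1, -1], [2, -1, 4, -1, 6, -1, -1])

print(tree_shape(*wide))
print(tree_shape(*deep))
```

The two toy trees have identical node and leaf counts but different depths, which is the kind of narrower/deeper versus wider/shallower difference described above. For a fitted sklearn forest, the same numbers should be available directly from each estimator's `tree_.max_depth` and `tree_.node_count`.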
