Forest Construction Hangs #146

Open
Yesse42 opened this issue Oct 22, 2021 · 1 comment

Comments


Yesse42 commented Oct 22, 2021

Hello. I was trying to build a random forest from Float32 data, and construction seems to hang for certain inputs. Here is a small example:

```julia
using DecisionTree

# Construction does not seem to hang with Float64 for this array, but it does with Float32 and Float16.
# I have seen hangs with Float64 on other data.
indep = Float32.([9.4  9.4   1.1
                  9.4  9.4  -0.0
                  9.4  9.4   1.9
                  9.4  9.4   1.4
                  9.4  9.4   1.1
                  9.4  9.4   0.0])

dep = Float32.([-0.4
                -0.2
                -1.1
                 0.0
                 0.0
                 0.0])

# Construction hangs when the 9.4 entries are left as 9.4 or set to -1.0 or 15.6, but not 2.0 or 2.5.
indep[indep .≈ 9.4] .= 15.6

display(dep)
display(indep)

# Occasionally this doesn't hang on the first run, but it always has on the second.
# Positional arguments: labels, features, n_subfeatures, n_trees, partial_sampling.
build_forest(dep, indep, size(indep, 2), 10, 0.7)
```
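Column 3 of `indep` contains both -0.0 (row 2) and 0.0 (row 6). The Base-only sketch below (it makes no DecisionTree.jl calls) shows that Julia treats these as two distinct values under `isless` and `isequal`, which `sort` and `unique` use, yet as equal under `==`, `<`, and `<=`. If a splitter picked candidate thresholds from sorted distinct values but routed samples with a comparison against the threshold, a "split" between -0.0 and 0.0 could send both zeros to the same side. This is a hypothesis about the hang, not something confirmed here; the threshold `t = 0.0f0` and the normalization step are illustrative assumptions.

```julia
# Column 3 of `indep` above, as Float32.
col3 = Float32[1.1, -0.0, 1.9, 1.4, 1.1, 0.0]

# What a sort-based scan for distinct values sees:
sorted_vals = sort(col3)             # `sort` orders by `isless`, so -0.0 precedes 0.0
n_distinct  = length(unique(col3))   # `unique` uses `isequal`, so both zeros are kept

# What a threshold comparison sees (threshold chosen for illustration):
t = 0.0f0
zeros_only = filter(iszero, col3)          # the two signed zeros
below_lt = count(v -> v < t, zeros_only)   # strict test: neither zero is below 0.0
below_le = count(v -> v <= t, zeros_only)  # non-strict test: both zeros are at or below 0.0

println("sorted:     ", sorted_vals)
println("distinct:   ", n_distinct)
println("zeros < t:  ", below_lt, " of ", length(zeros_only))
println("zeros <= t: ", below_le, " of ", length(zeros_only))

# A possible, untested workaround: adding +0.0 maps -0.0 to +0.0 under IEEE rounding.
normalized = col3 .+ 0.0f0
println("distinct after normalizing: ", length(unique(normalized)))
```

Whichever of `<` or `<=` a partition uses, the two zeros cannot be separated, so a split chosen between them would leave one side empty.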

When I interrupted this in the REPL with Ctrl-C, it appeared to be stuck somewhere in the threading code.

Here are the versions and hardware I'm using:
- DecisionTree: v0.10.11
- Julia: v1.6.3, the Intel macOS binary from the Julia website, running under Rosetta 2
- Computer: MacBook Air with M1 chip

I also tried Julia 1.7 (both the Intel and AArch64 builds) and got the same hang.


Yesse42 commented Oct 23, 2021

I looked into the problem with VS Code's debugger. It does not seem to be caused by multithreading: I removed every `Threads.@threads` and the hang persisted. Instead, the tree's depth appears to grow indefinitely, with new splits only on the left. Which tree misbehaves changes from run to run, and sometimes none does. Using the same matrices as above, here is one such forest; its third tree would have caused a hang had I not set `max_depth` to 10 (the final argument in the call below).

```
julia> forest = build_forest(dep, indep, size(indep, 2), 4, 0.7, 10); forest.trees
4-element Vector{Union{Leaf{Float32}, Node{Float32, Float32}}}:
 Decision Tree
Leaves: 2
Depth:  1
 Decision Tree
Leaves: 2
Depth:  1
 Decision Tree
Leaves: 11
Depth:  10
 Decision Leaf
Majority: 0.0
Samples:  4

julia> pathological=forest.trees[3]; print_tree(pathological)
Feature 3, Threshold 0.0
L-> Feature 3, Threshold 0.0
    L-> Feature 3, Threshold 0.0
        L-> Feature 3, Threshold 0.0
            L-> Feature 3, Threshold 0.0
                L-> Feature 3, Threshold 0.0
                    L-> Feature 3, Threshold 0.0
                        L-> Feature 3, Threshold 0.0
                            L-> Feature 3, Threshold 0.0
                                L-> Feature 3, Threshold 0.0
                                    L-> 0.0 : 2/3
                                    R-> 0.0 : 0/0
                                R-> 0.0 : 0/0
                            R-> 0.0 : 0/0
                        R-> 0.0 : 0/0
                    R-> 0.0 : 0/0
                R-> 0.0 : 0/0
            R-> 0.0 : 0/0
        R-> 0.0 : 0/0
    R-> 0.0 : 0/0
R-> -1.1 : 1/1
```

It seems to keep splitting to the left on feature 3 at the same threshold, 0.0, every time, and every right child is empty (0/0).
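If a split sends every sample to one child, that child holds the same samples as its parent, so the builder faces the same choice again and only `max_depth` stops it. That would fit the third tree above reaching `Depth: 10` when built with a `max_depth` of 10. Below is a minimal toy sketch, assuming a hypothetical function `toy_depth` (not DecisionTree.jl internals), an assumed `<=` partition rule, and an assumed node whose feature 3 holds signed zeros. It compares recursion with and without a progress guard that turns a no-progress split into a leaf.

```julia
# Hypothetical toy, not DecisionTree.jl code: split `x` at `threshold`,
# keep descending into the left child, and return the depth where it stops.
function toy_depth(x::Vector{Float32}, threshold::Float32;
                   depth::Int = 0, max_depth::Int = 10, guard::Bool = false)
    left  = filter(v -> v <= threshold, x)   # assumed partition rule
    right = filter(v -> v > threshold, x)
    depth >= max_depth && return depth       # the depth cap used as a workaround above
    # Progress guard: an empty side means the child equals the parent,
    # so stop here (make this node a leaf) instead of recursing.
    guard && (isempty(left) || isempty(right)) && return depth
    isempty(left) && return depth            # nothing to descend into
    return toy_depth(left, threshold; depth = depth + 1,
                     max_depth = max_depth, guard = guard)
end

x = Float32[-0.0, 0.0, 0.0]                    # assumed contents of the repeating node
unguarded = toy_depth(x, 0.0f0)                # only the depth cap stops it
guarded   = toy_depth(x, 0.0f0; guard = true)  # stops at the first no-progress split
println("unguarded depth: ", unguarded, ", guarded depth: ", guarded)
```

Raising `max_depth` in the unguarded call only raises the returned depth; with no cap at all, the toy recursion would end only in a stack overflow, which is its analogue of the hang.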
