
Inefficient memory management in mostprobable.c #69

Open
matteodelucchi opened this issue Mar 8, 2024 · 1 comment
Assignees: matteodelucchi
Labels: enhancement (New feature or request)

@matteodelucchi (Contributor)

Decide whether it is worth speeding up mostProbable().

For the following example, it takes quite a while to run:

  # load the abn package, which provides ex5.dag.data, buildScoreCache() and mostProbable()
  library(abn)

  # get the data - drop the group variable
  mydat <- ex5.dag.data[, -19]

  # Restrict DAG
  banned<-matrix(c(
    # 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b1
    1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b2
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b3
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b4
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b5
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b6
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g1
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g2
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g3
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g4
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g5
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g6
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g7
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g8
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g9
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g10
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g11
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 # g12
  ),byrow=TRUE,ncol=18)

  colnames(banned)<-rownames(banned)<-names(mydat)

  retain<-matrix(c(
    # 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b1
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b2
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b3
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b4
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b5
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # b6
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g1
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g2
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g3
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g4
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g5
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g6
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g7
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g8
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g9
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g10
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, # g11
    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 # g12
  ),byrow=TRUE,ncol=18)
  ## again must set names
  colnames(retain)<-rownames(retain)<-names(mydat)

  # set distributions
  mydists<-list(b1="binomial",
                b2="binomial",
                b3="binomial",
                b4="binomial",
                b5="binomial",
                b6="binomial",
                g1="gaussian",
                g2="gaussian",
                g3="gaussian",
                g4="gaussian",
                g5="gaussian",
                g6="gaussian",
                g7="gaussian",
                g8="gaussian",
                g9="gaussian",
                g10="gaussian",
                g11="gaussian",
                g12="gaussian"
  )

  # Compute the score cache
  mycache.1par <- buildScoreCache(data.df = mydat, data.dists = mydists,
                                  max.parents = 1, centre = TRUE)

  # Estimate most probable DAG
  mp.dag <- mostProbable(score.cache = mycache.1par)
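
As a side note, the banned and retain matrices above are all zeros except for the mutual ban between b1 and b2, and as written they are never passed to buildScoreCache(), so they do not affect the search. Below is a minimal sketch (not part of the original example) of an equivalent, more compact construction, and of where the 262,144 figure in the timing output comes from:

  # Sketch only: equivalent construction of the constraint matrices above.
  banned <- matrix(0L, nrow = 18, ncol = 18)
  banned[1, 2] <- banned[2, 1] <- 1L          # mutual ban between b1 and b2
  retain <- matrix(0L, nrow = 18, ncol = 18)  # nothing retained
  colnames(banned) <- rownames(banned) <- names(mydat)
  colnames(retain) <- rownames(retain) <- names(mydat)

  # mostProbable() enumerates every subset S of the 18 nodes, which is where the
  # "Total sets g(S) to be evaluated over: 262144" line in the output comes from:
  2^ncol(mydat)  # 2^18 = 262144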
@matteodelucchi added the "enhancement (New feature or request)" and "question (Further information is requested)" labels on Mar 8, 2024
@matteodelucchi transferred this issue from furrer-lab/shuttle-abn on Apr 9, 2024
@matteodelucchi self-assigned this on Jun 24, 2024
@matteodelucchi (Contributor, Author)

The time taken to process each batch of 10,000 sets is not constant and generally increases as the function progresses. This is most likely due to inefficient memory management or poor cache behaviour in mostprobable.c; see the rough fit after the timing output below.

> system.time({
+   # Estimate most probable DAG
+   mp.dag <- mostProbable(score.cache = mycache.1par)
+ })
Step1. completed max alpha_i(S) for all i and S
Total sets g(S) to be evaluated over: 262144
CPU time:  37.378756 secs
processed set 10000 of 262144
CPU time:   0.826163 secs
processed set 20000 of 262144
CPU time:   1.640488 secs
processed set 30000 of 262144
CPU time:   2.190864 secs
processed set 40000 of 262144
CPU time:   3.440781 secs
processed set 50000 of 262144
CPU time:   5.327864 secs
processed set 60000 of 262144
CPU time:   5.247011 secs
processed set 70000 of 262144
CPU time:   6.580417 secs
processed set 80000 of 262144
CPU time:   8.658232 secs
processed set 90000 of 262144
CPU time:  11.354120 secs
processed set 100000 of 262144
CPU time:  12.371300 secs
processed set 110000 of 262144
CPU time:  11.075589 secs
processed set 120000 of 262144
CPU time:  14.598619 secs
processed set 130000 of 262144
CPU time:  14.912340 secs
processed set 140000 of 262144
CPU time:  18.586300 secs
processed set 150000 of 262144
CPU time:  17.721026 secs
processed set 160000 of 262144
CPU time:  16.709578 secs
processed set 170000 of 262144
CPU time:  21.197434 secs
processed set 180000 of 262144
CPU time:  23.159261 secs
processed set 190000 of 262144
CPU time:  29.592996 secs
processed set 200000 of 262144
CPU time:  23.566811 secs
processed set 210000 of 262144
CPU time:  27.672812 secs
processed set 220000 of 262144
CPU time:  30.913207 secs
processed set 230000 of 262144
CPU time:  29.665142 secs
processed set 240000 of 262144
CPU time:  34.560263 secs
processed set 250000 of 262144
CPU time:  36.487387 secs
processed set 260000 of 262144
   user  system elapsed 
417.459   1.959 416.997 
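
A rough quantification of the slowdown, based on the per-batch CPU times printed above (illustrative sketch only; values are rounded and the pairing with batch indices is approximate):

  # Per-batch CPU times (seconds) as printed above, in order.
  cpu <- c(0.83, 1.64, 2.19, 3.44, 5.33, 5.25, 6.58, 8.66, 11.35, 12.37,
           11.08, 14.60, 14.91, 18.59, 17.72, 16.71, 21.20, 23.16, 29.59,
           23.57, 27.67, 30.91, 29.67, 34.56, 36.49)
  batch <- seq_along(cpu)

  # A linear fit gives a slope of roughly 1.5 s per additional batch of
  # 10,000 sets: the per-batch cost grows with the number of sets already
  # processed, so total cost grows roughly quadratically rather than linearly.
  coef(lm(cpu ~ batch))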

This definitely needs to be addressed, but I will postpone it to the C/C++ code revision in the "refactorCompiledCode" subproject.

@matteodelucchi changed the title from "Can we speed up mostProbable?" to "Inefficient memory management in mostprobable.c" on Jun 27, 2024
@matteodelucchi removed the "question (Further information is requested)" label on Jun 27, 2024