Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible Slowdown when i is an S4 Object with sapply method #1438

Closed
DCEmilberg opened this issue Nov 20, 2015 · 0 comments
Closed

Possible Slowdown when i is an S4 Object with sapply method #1438

DCEmilberg opened this issue Nov 20, 2015 · 0 comments
Assignees
Milestone

Comments

@DCEmilberg
Copy link

Example (run on 1.9.6, but based on current master branch source the issue should be present there as well):

library(Matrix)
library(data.table)
x <- Matrix(1:1e6, nrow=1e3, ncol=1e3)
colnames(x) <- rownames(x) <- 1:1e3
dt.rowInd <- data.table(sink=1:900)
system.time(dt.rowInd[1:nrow(dt.rowInd),x[as.character(sink),]])
#   user  system elapsed 
#  0.027   0.004   0.031 
system.time(dt.rowInd[,x[as.character(sink),]])
#   user  system elapsed 
#  4.801   0.075   4.878 

This appears to be caused by code starting at line 1237 of datatable.R:

      if (is.null(irows)) {
            if (is.atomic(jval)) {
                jcpy = address(jval) %in% sapply(SDenv$.SD, address) # %chin% errors when RHS is list()
                if (jcpy) jval = copy(jval)
            } else if (address(jval) == address(SDenv$.SD)) {
                jval = copy(jval)
            } else if ( length(jcpy <- which(sapply(jval, address) %in% sapply(SDenv, address))) ) {
                for (jidx in jcpy) jval[[jidx]] = copy(jval[[jidx]])
            } else if (is.call(jsub) && jsub[[1L]] == "get" && is.list(jval)) {
                jval = copy(jval) # fix for #1212
            }
        }

In particular:

sapply(jval, address) %in% sapply(SDenv, address)

Because jval in this case is an S4 object logic is such that the line above contained in the third level of the if/elseif statement is run since is.atomic(jval) is not true. Additionally, since Matrix defines an as.list method, sapply(jval, address) runs address on every single element of the Matrix, which results in a massive slow down. A possible fix might be to change line 1238 to:

if(!is.list(jval)) {
@arunsrinivasan arunsrinivasan added this to the v2.0.0 milestone Nov 26, 2015
@arunsrinivasan arunsrinivasan self-assigned this Jul 22, 2016
@mattdowle mattdowle modified the milestones: v1.9.8, v1.9.10 Nov 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants