JOSS review: (OPTIONAL DEVELOPMENT!) Add parallel processing to speed up get_sites()? #33

Open
kanishkan91 opened this issue Jul 30, 2023 · 1 comment

Comments

@kanishkan91

The get_sites() function is used to retrieve site-specific information, and it appears to call the Neotoma API to do so. Fetching many sites at once can take a very long time, especially if the user provides an age filter.

The authors could consider adding parallelization here, since this is simply a fetch function. This could be as simple as the following:

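```r
library(parallel)
library(data.table)

# Create a cluster object with one worker per available core (minus one)
cluster <- makeCluster(max(1, detectCores() - 1))

# Workers need the package that provides get_sites()
# (assumed here to be neotoma2)
clusterEvalQ(cluster, library(neotoma2))

list_of_sites <- c(A, B, C)  # placeholder: replace with real site IDs

# Fetch each site on a worker; each result is converted to a data.frame
# (assumes as.data.frame() supports get_sites() output) so the pieces
# can be stacked with rbindlist()
processed_sites <- parLapply(cluster, list_of_sites,
                             function(id) as.data.frame(get_sites(id)))
stopCluster(cluster)

processed_sites <- rbindlist(processed_sites)
```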

The above is a crude example, but it could work well. Again, this is just an optional suggestion.

openjournals/joss-reviews#5561

@SimonGoring
Contributor

With respect to parallelization, this is a great suggestion; however, at present the R package depends on a while loop to page through the set of results (based on offset/limit values).

Because of this, we send smaller calls to get_sites(), requesting 50 sites at a time, until a result set comes back empty (which happens once the offset exceeds the size of the result set).
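A minimal sketch of the paging loop described above, for readers unfamiliar with offset/limit paging. It is not the package's actual internal code: `fetch_page(offset, limit)` is a hypothetical stand-in for a single API request, backed here by an in-memory table of 120 fake records so the example runs on its own.

```r
# Hypothetical stand-in for one API request: returns at most `limit`
# records starting at `offset`, or an empty result once offset passes the end.
all_records <- data.frame(siteid = 1:120)

fetch_page <- function(offset, limit) {
  if (offset >= nrow(all_records)) {
    return(all_records[0, , drop = FALSE])
  }
  rows <- seq(offset + 1, min(offset + limit, nrow(all_records)))
  all_records[rows, , drop = FALSE]
}

# Sequential paging: keep requesting pages of `limit` records until an
# empty page comes back. The total size is never known in advance, so
# each request depends on the previous one having returned data.
fetch_all <- function(limit = 50) {
  pages <- list()
  offset <- 0
  while (TRUE) {
    page <- fetch_page(offset, limit)
    if (nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    offset <- offset + limit
  }
  do.call(rbind, pages)
}

result <- fetch_all()
```

With 120 records and `limit = 50`, this issues four requests (offsets 0, 50, 100, 150); the last one returns nothing and ends the loop. The stopping condition is only discovered by making that extra request, which is what makes naive parallelization awkward.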

The Neotoma API does not, at present, provide a way to determine the size of the result set, so there's no simple way to split the work across the set of queries sent to the API servers.

My sense is that there are ways around this (possibly using the future package?), but at present I don't think there's a simple solution.
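One possible workaround, sketched here under stated assumptions and not something the package currently does, is speculative paging: request several consecutive pages at once, one per worker, and stop after the first round that contains an empty page. This sketch uses base R's `parallel` package rather than `future`, and `fetch_page(offset, limit)` is again a hypothetical in-memory stand-in for one API request.

```r
library(parallel)

# Hypothetical stand-in for one offset/limit API request.
all_records <- data.frame(siteid = 1:120)
fetch_page <- function(offset, limit) {
  if (offset >= nrow(all_records)) return(all_records[0, , drop = FALSE])
  rows <- seq(offset + 1, min(offset + limit, nrow(all_records)))
  all_records[rows, , drop = FALSE]
}

# Speculative parallel paging: each round requests `n_workers`
# consecutive pages in parallel; stop after a round with an empty page.
fetch_all_parallel <- function(cl, limit = 50, n_workers = length(cl)) {
  pages <- list()
  offset <- 0
  repeat {
    offsets <- offset + (seq_len(n_workers) - 1) * limit
    # parLapply() returns results in the same order as `offsets`
    batch <- parLapply(cl, offsets, fetch_page, limit = limit)
    sizes <- vapply(batch, nrow, integer(1))
    pages <- c(pages, batch[sizes > 0])
    if (any(sizes == 0)) break  # passed the end of the result set
    offset <- offset + n_workers * limit
  }
  do.call(rbind, pages)
}

cl <- makeCluster(2)
clusterExport(cl, "all_records")  # workers need the fake data
result <- fetch_all_parallel(cl)
stopCluster(cl)
```

The trade-off is that the final round may send up to `n_workers - 1` requests past the end of the result set, so the extra load on the API servers would need to be weighed against the speed-up.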

That said, it might be easier to do this for the build_sites() methods (for example), and we'll look into it, but I'd like to put this option beyond the scope of the current review for JOSS.
