Qurro crashes with ultra high dimensional readouts #329
Comments
Marcus Fedarko commented on Jun 23, 2023:

Thank you for raising this! Just to clarify, is the Python side of Qurro crashing, or is it able to successfully create a visualization (which then crashes in the browser)? I assume it's the second, but please let me know if it's the first.

The JavaScript code ultimately tries to store the entire BIOM table's worth of information in memory in the browser, so datasets with tens of thousands of features will start to cause problems when loading these visualizations. Out of curiosity, how sparse is your table? The current codebase uses a few optimizations when preparing the visualization (e.g. we only bother storing non-zero counts in memory, which should help for super-sparse tables), but those will become less effective if a table is not very sparse (I don't remember if metabolomics tables are quite as sparse as most 16S / shotgun tables).

There are additional optimizations that should be implementable in the future as well (see <https://github.com/biocore/qurro/labels/optimization>, although not all of these are very relevant to performance), but I don't have much time to actively develop the tool nowadays :( For the time being, I think the best way to handle this issue is using filtering, as you suggested: the -x / --extreme-feature-count parameter should be sufficient.

Yes, it crashes in the browser. Metabolomics data is not sparse, so it likely isn't able to leverage these optimizations. So perhaps filtering is the way to go in the immediate future.
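The non-zero-counts optimization mentioned above can be illustrated with a toy sketch (this is not Qurro's actual code, which lives on the JavaScript side; the `to_sparse` helper here is hypothetical): storing only the non-zero entries of a super-sparse table cuts memory dramatically, but for a dense metabolomics table the same trick saves nothing.

```python
# Toy illustration (not Qurro's actual code): keep only the non-zero
# entries of a feature table as {(row, col): count} pairs.

def to_sparse(table):
    """Return a dict of only the non-zero counts in a 2-D table."""
    return {
        (i, j): v
        for i, row in enumerate(table)
        for j, v in enumerate(row)
        if v != 0
    }

# A sparse 16S-style table: most counts are zero.
sparse_table = [
    [0, 0, 5, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
]

# A dense metabolomics-style table: almost nothing is zero.
dense_table = [
    [1.2, 0.4, 5.0, 2.2],
    [0.9, 3.1, 1.7, 4.4],
    [2.5, 1.1, 0.8, 1.3],
]

print(len(to_sparse(sparse_table)))  # 2 entries stored instead of 12
print(len(to_sparse(dense_table)))   # 12 entries: no savings at all
```

This is why a 40K-feature dense table hits memory limits in the browser even though a similarly sized sparse 16S table might load fine.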
I'm trying to generate reports on some metabolomics datasets with 40K dimensions, and I'm noticing that Qurro isn't able to load these massive datasets. Not sure if others have had experience with this, but I'm raising this issue to keep it in the back of our minds. Perhaps filtering is the recommended procedure for now.
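Conceptually, extreme-feature filtering keeps only the k highest- and k lowest-ranked features before visualization. A minimal sketch of that idea (the function name and `k` parameter here are hypothetical; the real behavior of Qurro's -x / --extreme-feature-count option may differ in details):

```python
def keep_extreme_features(ranks, k):
    """Keep only the k lowest- and k highest-ranked features.

    ``ranks`` maps feature IDs to their rank/differential values.
    Hypothetical sketch of what -x / --extreme-feature-count does;
    not Qurro's actual implementation.
    """
    ordered = sorted(ranks, key=ranks.get)
    if 2 * k >= len(ordered):
        return dict(ranks)  # table already small enough; keep everything
    kept = ordered[:k] + ordered[-k:]
    return {fid: ranks[fid] for fid in kept}

# Ten toy features with evenly spaced rank values.
ranks = {f"feat{i}": i * 0.1 for i in range(10)}
trimmed = keep_extreme_features(ranks, k=2)
print(sorted(trimmed))  # ['feat0', 'feat1', 'feat8', 'feat9']
```

For a 40K-feature dense table, trimming to a few hundred extreme features per ranking would shrink the data the browser has to hold by two orders of magnitude.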