Qurro crashes with ultra high dimensional readouts #329

Open
mortonjt opened this issue Jun 23, 2023 · 2 comments
Labels: optimization (Making code faster or cleaner), question (Further information is requested)

Comments

@mortonjt

I'm trying to generate reports on some metabolomics datasets with 40K dimensions, and I'm noticing that Qurro isn't able to load these massive datasets. Not sure if others have had experience with this, but I'm raising this issue so we keep it in the back of our minds. Perhaps filtering is the recommended procedure for now.

@mortonjt added the "question" (Further information is requested) label Jun 23, 2023
@fedarko added the "optimization" (Making code faster or cleaner) label Jun 23, 2023
@fedarko
Collaborator

fedarko commented Jun 24, 2023

Thank you for raising this! Just to clarify: is the Python side of Qurro crashing, or does it successfully create a visualization, which then crashes in the browser? I assume it's the latter, but please let me know if it's the former.

The JavaScript code ultimately tries to store the entire BIOM table's worth of information in memory in the browser, so datasets with tens of thousands of features will start to cause problems when loading these visualizations. Out of curiosity, how sparse is your table? The current codebase uses a few optimizations when preparing the visualization (e.g. we only bother storing non-zero counts in memory, which should help for super-sparse tables), but those will become less effective if a table is not very sparse (I don't remember if metabolomics tables are quite as sparse as most 16S / shotgun tables).
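As a quick sanity check, you can estimate how sparse a table is before handing it to Qurro. Here's a minimal sketch, assuming the biom-format Python package is installed; the filename `table.biom` is a placeholder:

```python
# Minimal sketch: estimate how sparse a BIOM table is before visualizing it.
# Assumes the biom-format package is installed; "table.biom" is a placeholder.
import biom

table = biom.load_table("table.biom")
n_samples = len(table.ids("sample"))
n_features = len(table.ids("observation"))
nonzero = table.nnz  # number of non-zero entries in the sparse matrix
density = nonzero / (n_samples * n_features)

print(f"{n_features} features x {n_samples} samples")
print(f"density: {density:.2%} ({nonzero} non-zero entries)")
```

The lower the density, the more the non-zero-only storage helps; a dense 40K-feature table is roughly the worst case for the current approach.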

There are additional optimizations that should be implementable in the future as well (see here, although not all of these are very relevant to performance), but I don't have much time to actively develop the tool nowadays :( For the time being, I think the best way to handle this issue is filtering, as you suggested: the -x / --extreme-feature-count parameter should be sufficient.
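For illustration, here's a rough sketch of the kind of extreme-feature filtering -x performs: for each ranking, keep only the k highest- and k lowest-ranked features. This is not Qurro's actual implementation, and `ranks.tsv` and `k` are placeholders:

```python
# Rough sketch of extreme-feature filtering, in the spirit of
# -x / --extreme-feature-count. Not Qurro's actual implementation;
# "ranks.tsv" and k are placeholders.
import pandas as pd

k = 500
ranks = pd.read_csv("ranks.tsv", sep="\t", index_col=0)  # features x rankings

keep = set()
for col in ranks.columns:
    ordered = ranks[col].sort_values()
    keep.update(ordered.index[:k])   # k lowest-ranked features
    keep.update(ordered.index[-k:])  # k highest-ranked features

filtered = ranks.loc[sorted(keep)]
filtered.to_csv("ranks_filtered.tsv", sep="\t")
print(f"kept {len(filtered)} of {len(ranks)} features")
```

With 40K features, keeping a few hundred extreme features per ranking should bring the table down to a size the browser-side code can handle comfortably.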

@mortonjt
Author

mortonjt commented Jun 25, 2023 via email
