Qurro crashes with ultra high dimensional readouts #329

Open
mortonjt opened this issue Jun 23, 2023 · 2 comments
Labels: optimization (Making code faster or cleaner), question (Further information is requested)

Comments

@mortonjt

I'm trying to generate reports on some metabolomics datasets with 40K dimensions, and I'm noticing that Qurro isn't able to load these massive datasets. Not sure if others have had experience with this, but I'm raising this issue so we keep it in the back of our minds. Perhaps filtering is the recommended procedure for now.

@mortonjt added the "question" (Further information is requested) label Jun 23, 2023
@fedarko added the "optimization" (Making code faster or cleaner) label Jun 23, 2023
@fedarko
Collaborator

fedarko commented Jun 24, 2023

Thank you for raising this! Just to clarify: is the Python side of Qurro crashing, or does it successfully create a visualization, which then crashes in the browser? I assume it's the latter, but please let me know if it's the former.

The JavaScript code ultimately tries to store the entire BIOM table's worth of information in memory in the browser, so datasets with tens of thousands of features will start to cause problems when loading these visualizations. Out of curiosity, how sparse is your table? The current codebase uses a few optimizations when preparing the visualization (e.g. we only bother storing non-zero counts in memory, which should help for super-sparse tables), but those will become less effective if a table is not very sparse (I don't remember if metabolomics tables are quite as sparse as most 16S / shotgun tables).
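As a quick sanity check, you can estimate how sparse a table is before handing it to Qurro. Here's a minimal sketch, assuming the biom-format Python package is installed; the filename `table.biom` is a placeholder:

```python
# Minimal sketch: estimate how sparse a BIOM table is before visualizing it.
# Assumes the biom-format package is installed; "table.biom" is a placeholder.
import biom

table = biom.load_table("table.biom")
n_samples = len(table.ids("sample"))
n_features = len(table.ids("observation"))
nonzero = table.nnz  # number of non-zero entries in the sparse matrix
density = nonzero / (n_samples * n_features)

print(f"{n_features} features x {n_samples} samples")
print(f"density: {density:.2%} ({nonzero} non-zero entries)")
```

The lower the density, the more the non-zero-only storage helps; a dense 40K-feature table is roughly the worst case for the current approach.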

There are additional optimizations that should be implementable in the future as well (see here, although not all of these are very relevant to performance), but I don't have much time to actively develop the tool nowadays :( For the time being, I think the best way to handle this issue is filtering, as you suggested: the -x / --extreme-feature-count parameter should be sufficient.
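For illustration, here's a rough sketch of the kind of extreme-feature filtering -x performs: for each ranking, keep only the k highest- and k lowest-ranked features. This is not Qurro's actual implementation, and `ranks.tsv` and `k` are placeholders:

```python
# Rough sketch of extreme-feature filtering, in the spirit of
# -x / --extreme-feature-count. Not Qurro's actual implementation;
# "ranks.tsv" and k are placeholders.
import pandas as pd

k = 500
ranks = pd.read_csv("ranks.tsv", sep="\t", index_col=0)  # features x rankings

keep = set()
for col in ranks.columns:
    ordered = ranks[col].sort_values()
    keep.update(ordered.index[:k])   # k lowest-ranked features
    keep.update(ordered.index[-k:])  # k highest-ranked features

filtered = ranks.loc[sorted(keep)]
filtered.to_csv("ranks_filtered.tsv", sep="\t")
print(f"kept {len(filtered)} of {len(ranks)} features")
```

With 40K features, keeping a few hundred extreme features per ranking should bring the table down to a size the browser-side code can handle comfortably.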

@mortonjt
Author

mortonjt commented Jun 25, 2023 via email
