Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove empty *features* from the visualization? #171

Closed
5 tasks done
fedarko opened this issue Jul 2, 2019 · 1 comment
Closed
5 tasks done

Remove empty *features* from the visualization? #171

fedarko opened this issue Jul 2, 2019 · 1 comment
Assignees
Labels
optimization Making code faster or cleaner
Milestone

Comments

@fedarko
Copy link
Collaborator

fedarko commented Jul 2, 2019

This will probably come up eventually?

I imagine this will probably happen infrequently, but may be worth addressing. Similarly to removing empty samples, this would be done with the understanding that if all (or in this case two, I guess?) features are empty an error would be thrown.

I think that once #58 is taken care of (and _df_utils.remove_empty_samples() operates on BIOM tables), it should be possible to do this by just uh removing the axis="sample" argument to biom.Table.remove_empty(). We'd need to add some checks to make sure at least two features remain in the table post-filtering (just an extension of the current logic but on the observation axis), and we'd also need to add some logic to notify the user about dropped features (as well as document this behavior somewhere).

  • Implement this in remove_empty_samples() (and rename that function to just remove_empty() or something).
  • Notify the user about number of dropped features (analogous to current notifications re: dropped samples)
  • Throw an error when all features are empty
  • Add unit tests that this works
  • Document this
@fedarko fedarko added the optimization Making code faster or cleaner label Jul 2, 2019
@fedarko fedarko self-assigned this Jul 2, 2019
@fedarko fedarko added this to the v0.2.0 milestone Jul 2, 2019
fedarko added a commit that referenced this issue Jul 3, 2019
This is gonna be kind of inconvient to apply to the feature
ranks, also. Think I'm going to go back and try an alternative approach
to all of this stuff (#172, #171, #58) on another branch.
fedarko added a commit that referenced this issue Jul 3, 2019
Turns out this is a ton more efficient. Added bonus of now relying
on pandas' implementation of this instead of ours.

Turns out transposing huge dataframes is a pretty significant
endeavor, so calling .T on the feature table for like the EMP dataset
was taking a super long time. Fortunately, we can finesse our way
around this by instead transposing the sample metadata and then
aligning on the columns.

I'm glad that we reached a solution for this that preserved
all of the matching-up-front niceness re: testing. Solid stuff.

Uh, next up are #171 and then #58? But we can def merge this back
into master now.
fedarko added a commit that referenced this issue Jul 6, 2019
This has apparently thrown off the Byrd test, since it looks like
there were a bunch of empty features in that dataset. Need to update
the testing utilities to allow for empty features to be removed.

also looks like the JSONs for the q2-moving-pictures and sleep apnea
integration tests are different -- figure out why, and if that's due
to a bug or just due to something else. once that's done this issue will
be done
@fedarko
Copy link
Collaborator Author

fedarko commented Jul 6, 2019

Just need to reconcile this with the test system (Byrd example is causing it to fail right now), then this will be done.

@fedarko fedarko closed this as completed in 9c1ea69 Jul 6, 2019
fedarko added a commit that referenced this issue Jul 7, 2019
fedarko added a commit that referenced this issue Jul 7, 2019
relates to #58, #171.

now it's pretty quick to run the EMP with -x 2000 into Qurro. going
to do some more basic benchmarking with this.
fedarko added a commit that referenced this issue Jul 7, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Making code faster or cleaner
Projects
None yet
Development

No branches or pull requests

1 participant