Show "removed" samples from the sample plot / Give some indication of why #92
Also, certain color fields can result in samples being dropped. IMO this is a very important consideration. Right now, all of the samples that are passed to the visualization are included in the "export data" output (EXCEPT for samples with an invalid log ratio, i.e. samples missing certain features in the log ratio). We should either make this consistent -- and include samples with an invalid log ratio alongside samples with invalid metadata fields -- OR we should only include "currently drawn" samples, I guess? In any case, this should be super explicit.
It'd be cool to show either a tally or a percentage of samples dropped (latter idea c/o Lisa).
This is a solid start. We'd want to also log the number of samples dropped due to having a non-numeric value on a quantitative scale (either for color or the x-axis). I *think* we could do that by running some similar code in updateSamplePlotField() / updateSamplePlotScale(), and maybe adding some logic to ensure that each sample is only counted once (i.e. a sample with both a null value and non-numeric metadata on a quantitative scale shouldn't be counted twice), but I'm not sure. Should be doable tho :)
These are also logged with the NaN-drop count. In the future, we should also keep track of how many samples were dropped on the Python side of things, so the user knows *exactly* what samples have gone where. Ideally, we should be as explicit as possible with this.
Plan for how the code for displaying this would work:

We should store lists of sample IDs that have been excluded by any of these three metrics, so that we can say in the fourth div how many samples are currently shown in the sample plot at all (similar to Emperor's top-left-corner box). We can get the total number of dropped samples by computing the union of the three "excluded samples" lists, which shouldn't take that much time; the number of shown samples is then just the total number of samples minus the size of that union (see the sketch below). Also, if any of the divs saying why a sample has been dropped correspond to 0 samples being dropped (e.g. all of the x-axis fields of all samples are quantitative), then we should hide those divs, I guess.
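To make the union computation concrete, here's a minimal sketch. The property names on `rrv` are placeholders made up for illustration, not the actual Qurro attributes:

```javascript
// Hypothetical sketch: merge the three "excluded samples" lists into one
// deduplicated list, so each sample is only counted as dropped once even
// if it was excluded for multiple reasons. Property names are placeholders.
function getTotalDroppedSampleIDs(rrv) {
    var union = new Set();
    [
        rrv.balanceDroppedSampleIDs, // invalid log ratio
        rrv.xAxisDroppedSampleIDs,   // invalid x-axis metadata
        rrv.colorDroppedSampleIDs    // invalid color metadata
    ].forEach(function (idList) {
        idList.forEach(function (id) {
            union.add(id);
        });
    });
    return Array.from(union);
}

// The number of shown samples is then just:
// totalSampleCount - getTotalDroppedSampleIDs(rrv).length
```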
Progress towards #92. I guess this is ~ 1/4 of the work there.
Should be pretty simple to use this in other contexts.
I've spent some time looking into whether it'd be possible to get Vega/Vega-Lite to do some of this work for us -- perhaps with the "valid" / "missing" aggregate transforms -- but I haven't been able to find much documentation on how those work. However, I think we can do this by patching the underlying Vega spec. Advantages of this:
Disadvantages:
Basically, the way we could do this is similar to how the [...]. This will necessitate some work in order to ensure that [...].
Here's a Vega spec that shows something like this in action on the sleep apnea dataset. I'm not super sure whether I prefer this or the manual (i.e. using custom HTML and JS) solution; will think about it. If we go with the Vega solution, make sure to attribute the aforementioned scatter plot null values example in the code.
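For the record, a rough sketch of what that spec patching could look like. This is just an illustration, not the actual implementation: the field name `qurro_field` and the assumption that the first entry in `spec.data` is the sample dataset are both made up.

```javascript
// Hypothetical sketch: patch a compiled Vega spec so it shows a count of
// samples with an invalid value in some field. In Vega expressions,
// isValid() is false for null / undefined / NaN values.
function addDroppedCountToSpec(spec) {
    // Derived dataset containing just the invalid samples
    spec.data.push({
        name: "dropped_samples",
        source: spec.data[0].name, // assumed to be the sample dataset
        transform: [{ type: "filter", expr: "!isValid(datum.qurro_field)" }]
    });
    // Text mark reporting the count, as in the Vega null-values example
    spec.marks.push({
        type: "text",
        encode: {
            update: {
                x: { value: 10 },
                y: { value: 10 },
                fill: { value: "#888" },
                text: {
                    signal:
                        "length(data('dropped_samples')) + ' samples with missing values'"
                }
            }
        }
    });
    return spec;
}
```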
This is tentative -- will add tests soon
Yeah, I think it's going to be easiest to do this manually (without messing around with the generated Vega spec). Added
"sty" --> "various small stylistic tweaks, e.g. in the updateSampleDroppedDiv() generated messages"
looks like we're approaching a solution to #92
The newly added tests focus on weird string stuff in nominal scale types. A TODO is making sure that there's parity between what getValidSamples() accepts and what Vega* accepts. From playing around in the Vega editor, I think that the categorical stuff we accept (incl. stuff like " ", "undefined", ...) is also accepted by Vega, but there are some things getValidSamples() doesn't accept that Vega does, including:
- "Infinity" and "-Infinity" on quantitative scales (this throws off everything)
- "" on nominal scales (this should be the one sample metadata value that we don't display, if anything, due to Q2 metadata standards)

We might be able to handle this sort of stuff by messing around in makeSamplePlot() to look at the scale type and filter values accordingly (see the sketch below). Might be worth making another issue for that, I'm not sure. So this is getting closer, at least!
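A hedged sketch of what that scale-type-aware filtering could look like; the function name and exact rules here are guesses distilled from this discussion, not the actual getValidSamples() / makeSamplePlot() code:

```javascript
// Hypothetical sketch: decide whether a metadata value should be kept for
// a given scale type ("quantitative" or "nominal"). Rules here are
// assumptions based on this thread, not the actual Qurro behavior.
function valueIsValidForScale(value, scaleType) {
    if (value === null || value === undefined) {
        return false;
    }
    var str = String(value).trim();
    if (str === "") {
        // Reject empty / whitespace-only values on any scale, per the
        // QIIME 2 metadata-consistency goal discussed later in this thread
        return false;
    }
    if (scaleType === "quantitative") {
        // Reject non-numeric strings, NaN, and +/-Infinity, all of which
        // would throw off a quantitative axis
        var num = Number(str);
        return !isNaN(num) && isFinite(num);
    }
    // Nominal scales: accept any other string, even weird ones like
    // "undefined" -- Vega just treats these as ordinary categories
    return true;
}
```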
i.e. lists of dropped sample IDs instead of just the count of dropped sample IDs. Will make finishing up #92 easier
Same basic logic, just reversed. This will make implementing #92 simpler. ...why did I do it this way at the start?
This closes #116 (the new metadata tests ensure this is solved). This also brings us like 90% of the way to being done with #101; what we should do is also filter out just-whitespace data, I guess, to be consistent with QIIME 2.

Now, we should add tests that the Vega* filtering reflects things properly (NaNs/nulls/Infinities are being filtered). Should be doable to explicitly set this in Altair. Once we've verified that the filtering matches this behavior, we'll be done with #92 and we can merge this branch back in to master.

Also, I just spent like five hours making pandas play nicely with NaNs, and now my head hurts. This commit changed a lot of stuff; the bulk of it was making the tests adapt to the new object-values stuff, as well as testing the new functionality. It would be worth testing that differentials' indices are parsed properly, I guess.
This sums up the programming work for #92, mostly. What I want to do now is add tests, etc. to make sure RRVDisplay.updateSamplePlotFilters() is working properly before I merge this back in. Also thinking about maybe working on #66 while I'm at it. Probably a good time to take care of #85 et al. also?
This is super exciting. Should be doable to make a new JS test that loads fancy JSONs with various weird metadata vals and asserts that the DOM is being updated correctly, although I guess it isn't *super* high priority since that side of things is pretty decently unit tested already.
Might as well, I guess! TODO--beef up test
Previously, only the main div was getting filled in -- now all of them have some text added and the "invisible" class added, so we know that the test is actually checking something.
In the immediate future, we can add more tests that actually check that the enforced Filters match up with the getInvalidSampleIDs() decisions. That should be good enough to take care of #92, which will let us merge this back in.
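That comparison could look something like the sketch below. Essentially everything here is an assumption based on this thread rather than the real test suite: the getInvalidSampleIDs() arguments, the `rrv.samplePlotView` attribute, the "marks" dataset name, and the "Sample ID" column are all guesses.

```javascript
// Hypothetical test sketch: assert that no sample flagged as invalid by
// getInvalidSampleIDs() survives the filters enforced on the sample plot.
it("enforced filters match getInvalidSampleIDs()", function () {
    var invalidIDs = rrv.getInvalidSampleIDs("Metadata1", "xAxis");
    // "marks" is a placeholder -- the real name of the post-filter dataset
    // inside the Vega view would need checking. View.data() is the Vega
    // View API method for pulling a named dataset out of a live view.
    var shownSamples = rrv.samplePlotView.data("marks");
    var shownIDs = shownSamples.map(function (d) {
        return d["Sample ID"];
    });
    invalidIDs.forEach(function (id) {
        chai.assert.notInclude(shownIDs, id);
    });
});
```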
Next up: fulfilling #163 so that we can move the resulting JSONs to a JS test file, and then ensure that stuff works as expected. NOTE that I also modified the integration test for q2-moving-pictures to actually register itself as a q2 integration test.
Related changes along the way:
- Added tests of the newly created function (try_to_replace_line_json)
- Added a json_prefix argument to replace_plot_json_definitions() and try_to_replace_line_json() that should let us set up a #92 integration test super soon

From there, all we'd need to do is actually write the JS side of the test, and... then we'd be good!
This'll let us set up an integration test for #92 and thereby let us actually FINALLY finish things up here
This should at least ensure that all the weird stuff in the SST sample metadata file makes it to JS ok. The one test I have added is essentially a copy of testing_utilities.validate_sample_stats_test_sample_plot_json(), but from the JS side of things. This is already pretty good, but we can do even better. IMO, the main thing we need to do is to test that filters created for this test's RRVDisplay object match the decisions of getInvalidSampleIDs(). Once that's done I guess we're ready to finally merge this stuff back in?
TODO -- add more (see commented-out stuff). For some reason, the state of rrv seems to persist between tests... I'll try to figure out a solution tomorrow.
- Now, get_jsons() has a json_prefix argument analogous to the one for replace_js_json_definitions(). This lets us "pick our battles" -- if we can't find any "var SST...JSON = {"-ish definitions in a file, we won't bother trying to replace any of them.
- This was causing problems earlier because the code was constantly replacing the #92 JS integration test file: it kept finding Nones for all of the JSONs in there (since it wasn't passing the SST prefix to get_jsons()). So that was a pain.
- Also, the code should be generally faster now. It's still kind of inefficient (really, the line numbers get_jsons() finds should be shared with replace_...() to reduce the number of times each file is gone over), but it *works*, so that's all I care about for this particular side of things.
- Also, the t[a : b] syntax apparently makes flake8 angry, so I disabled E203 in the Makefile's invocation of flake8. (I also disabled W503, which kept popping up once I disabled E203. From some cursory googling, it looks like both of these are safe to disable when using flake8 and black on the same codebase.)
Turns out the bug with "rrv" state (actually with the JSONs) was because I need to parse/stringify them before passing them. Whoops. (That's the sort of thing that #162 should address in the future.)

I imagine these filtering tests as being sort of like a truth table for a boolean operator: C/C, C/Q, Q/C, Q/Q. So far we've got C/C (both x-axis and color categorical) and, now, Q/C (x-axis quantitative and color categorical) tested; just need to add the other two soon. After that, we need to verify that getInvalidSampleIDs() matches up with these filters. Then... we're good!
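(For reference, the parse/stringify fix is just a deep copy of the JSON before each test uses it -- something like the following, where `SSTRankPlotJSON` is a placeholder name for one of the test JSONs:)

```javascript
// Deep-copy a plot JSON before handing it to an RRVDisplay, so that one
// test mutating the spec can't leak state into the next test.
var freshRankPlotJSON = JSON.parse(JSON.stringify(SSTRankPlotJSON));
```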
After this, I just want to add comparisons to the results of getInvalidSampleIDs(). Then... we're done!
A way of ensuring that our decisions match up with the Vega* filters we create. Close to #92 being taken care of -- just gotta do this for the other three filter tests.
(We still filter out samples that are empty after -x removes their only associated features.) I figure making this explicit is sort of needed to really have #92 addressed. Just want to add a couple of tests for this...
Still gotta actually clean up Qurro code somewhat.
JS is weird, but we're handling things properly. biocore#92 (This was more that I forgot how this part of the code worked and started freaking out over it being broken, until I realized it wasn't broken lol)
The solution with #91 provides us with a way to ensure that, e.g., only samples with a numerical metadata value are present in the scatterplot. However, it'd be a good idea to also inform the users of how many samples were excluded due to invalid-ish values.
A really cool example of this is this Vega example, where nulls are represented as gray points along the axes. (Also see a simpler version of the example for Vega-Lite.) However, we don't necessarily need fancy solutions like these for our purposes; I'd be fine having just a little text box or something next to the sample plot that contains the number of "dropped" samples. This could be as simple as the `"type": "text"` mark in the Vega example above -- just count the number of invalid values and say something like "{} samples with missing values".
In a sense, this functionality would work double duty—it'd inform the user both about which samples don't have the features contained in the log ratio and about which samples don't have valid metadata for the current x-axis.
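A minimal sketch of the "little text box" version of this idea (this isn't the actual updateSampleDroppedDiv() implementation from later in this thread; the div ID, message wording, and the dropped-count inputs are all guesses -- only the "invisible" class is mentioned above):

```javascript
// Hypothetical sketch: write a dropped-sample tally and percentage into a
// div next to the sample plot, hiding the div when nothing was dropped.
function updateSampleDroppedDiv(droppedSampleIDs, totalSampleCount) {
    var div = document.getElementById("sampleDroppedDiv");
    var pct = ((100 * droppedSampleIDs.length) / totalSampleCount).toFixed(2);
    div.textContent =
        droppedSampleIDs.length + " / " + totalSampleCount +
        " samples (" + pct + "%) can't be shown due to invalid values.";
    // Only show the div if at least one sample was actually dropped
    div.classList.toggle("invisible", droppedSampleIDs.length === 0);
}
```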