Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in any "inputs" to plot JSONs #66

Open
fedarko opened this issue Mar 1, 2019 · 6 comments
Labels: bug (Something isn't working); external (issues/bugs with other libraries, frameworks, etc.; might include reproducing an issue minimally)

Comments

@fedarko
Collaborator

fedarko commented Mar 1, 2019

Apparently Vega treats these characters specially. See this page for context.

This is causing a problem with the rank names in the Byrd data example -- trying to switch to a rank that isn't "Intercept" brings up an error.

I guess we have to apply this not only for each column but for every possible string that's passed to Vega: so every feature/sample ID, augmented feature ID, sample metadata, and probably more. sheesh.

ideally we should have tests that verify that our measures to protect against Vega interpreting things wrongly work (#2).

Note: you can escape these either with a ton of backslashes or by enclosing the field names in square brackets. The latter sounds easier.
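A minimal sketch of the two options in the note above, written in Python on the assumption that the escaping happens before the plot JSON is written. The helper names `escape_field_name` and `bracket_wrap` are hypothetical, and whether Vega accepts every escaped character in every context has not been verified here.

```python
import re

# Characters the issue title lists as problematic in field names:
# backslashes, periods, square brackets, and quotes.
_SPECIAL = re.compile(r"([\\.\[\]'\"])")


def escape_field_name(name: str) -> str:
    """Option 1: put a backslash in front of each special character."""
    return _SPECIAL.sub(r"\\\1", name)


def bracket_wrap(name: str) -> str:
    """Option 2: enclose the whole field name in square brackets.

    Assumption: the name itself contains no brackets or quotes; names
    that do would still need option 1 (or some other treatment).
    """
    return "[" + name + "]"


if __name__ == "__main__":
    print(escape_field_name("T.B"))
    print(bracket_wrap("T.B"))
```

Note that these strings are the values Vega should see; `json.dump()` adds its own layer of backslash escaping on top when the spec is serialized.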

Note: related to vega/vega-lite#4965

Note: should also ensure that field names (when passed into the plot JSONs, e.g. for things like setting an encoding field of the sample plot's color or setting the encoding field of the rank plot's y-axis) are escaped in JS via something like vega.stringValue().

fedarko added the bug and external labels Mar 1, 2019
fedarko self-assigned this Mar 1, 2019
@fedarko
Collaborator Author

fedarko commented Mar 7, 2019

So I think that due to our use of json.dump(), we shouldn't have to worry about most of these aside from the Vega-Lite-specific ones (periods and brackets). But again, it's still a good idea to be sure.
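A quick way to check this claim, as a sketch: round-trip a made-up ID through `json.dumps()` / `json.loads()`. Quotes and backslashes are escaped at the JSON level and come back unchanged, while periods and brackets pass through untouched, so anything Vega-Lite does with them happens after parsing. The ID below is invented for illustration.

```python
import json

# A made-up rank ID containing quotes, brackets, a period, and a backslash.
rank_id = 'C(Timepoint, Treatment("F"))[T.B]\\x'

encoded = json.dumps({"field": rank_id})
decoded = json.loads(encoded)

# JSON escaping is lossless: the parsed value equals the original string.
assert decoded["field"] == rank_id

# Quotes are backslash-escaped in the serialized text...
assert '\\"' in encoded
# ...but periods and brackets are written as-is.
assert "[T.B]" in encoded

print(encoded)
```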

@fedarko
Collaborator Author

fedarko commented Mar 7, 2019

If we want to be 100% safe, we'll need to escape all of the following:

  • Rank IDs
  • Feature IDs
  • Sample IDs
  • Sample Metadata IDs
  • Feature Metadata IDs

In practice, I'm not sure that this is necessary for feature metadata IDs, feature IDs, or sample IDs (since I've used periods in these IDs before without issue). I think json.dump takes care of those -- the main issue seems to be with fields that end up being set as an axis/encoding/etc. in Vega/Vega-Lite (e.g. ranks).

still worth adding lots of test cases that verify that this all works as intended.

@fedarko
Collaborator Author

fedarko commented Mar 7, 2019

ahsdfiusdoifjsdfioj

so it looks like even if you escape a rank ID properly for the axis stuff, you still need to use the non-escaped ID in the underlying dataset???? bluhg
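To illustrate the mismatch, here is a rough sketch of a Vega-Lite-style spec built as a Python dict. The rank ID, feature IDs, and values are made up, and this is not the project's actual spec. The point is only that the data rows keep the raw ID as their key, while the encoding's `field` uses the escaped form.

```python
import json

# Hypothetical rank ID with a period and brackets.
raw_rank_id = "C(Treatment)[T.B]"

# Backslash-escape the characters Vega-Lite treats as path syntax.
escaped_rank_id = (
    raw_rank_id.replace(".", "\\.").replace("[", "\\[").replace("]", "\\]")
)

spec = {
    "data": {
        "values": [
            # The dataset uses the raw, non-escaped ID as the key.
            {"Feature ID": "f1", raw_rank_id: 1.5},
            {"Feature ID": "f2", raw_rank_id: -0.3},
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "Feature ID", "type": "nominal"},
        # The encoding refers to the same column via the escaped ID.
        "y": {"field": escaped_rank_id, "type": "quantitative"},
    },
}

print(json.dumps(spec, indent=2))
```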

@fedarko
Collaborator Author

fedarko commented Mar 7, 2019

@mortonjt small question: is preserving the patsy formulas in rank IDs (e.g. C(Timepoint, Treatment('F'))[T.B] in the Byrd data) helpful when looking at the ranks? It looks like periods, brackets, and quotes all cause problems when you pass them into Vega-Lite as field IDs.

I've implemented a basic solution that converts periods to colons and square brackets to parentheses (along with filtering out quotes and backslashes). This takes care of the problem for now, but if you think it's worth it I can come back to this later (probably after exams are over) and add back in support for some of these weird characters.
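For reference, a minimal sketch of the conversion described above (periods to colons, square brackets to parentheses, quotes and backslashes dropped). The function name `sanitize_id` is hypothetical and this is not necessarily how the actual implementation is written.

```python
# Translation table: None means "delete this character".
_REPLACEMENTS = str.maketrans(
    {
        ".": ":",
        "[": "(",
        "]": ")",
        "'": None,
        '"': None,
        "\\": None,
    }
)


def sanitize_id(raw_id: str) -> str:
    """Replace or remove characters that cause problems as Vega-Lite field IDs."""
    return raw_id.translate(_REPLACEMENTS)


if __name__ == "__main__":
    # The rank ID from the Byrd data mentioned above.
    print(sanitize_id("C(Timepoint, Treatment('F'))[T.B]"))
    # prints: C(Timepoint, Treatment(F))(T:B)
```

The conversion is lossy: different inputs can map to the same output, so the result is not guaranteed to be unique.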

@fedarko
Collaborator Author

fedarko commented Mar 7, 2019

note to self: if we go with the solution of filtering out/converting certain special characters in IDs, ensure that they're still unique afterwards.
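One way this check could look, as a sketch: group the original IDs by their converted form and report any group with more than one member. `find_collisions` is a hypothetical helper, and the `transform` in the example only handles periods to keep things short.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_collisions(
    ids: Iterable[str], transform: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each converted ID to the original IDs that collapse onto it.

    Only converted IDs produced by two or more originals are returned,
    so an empty dict means the conversion kept the IDs unique.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for original in ids:
        groups[transform(original)].append(original)
    return {new: olds for new, olds in groups.items() if len(olds) > 1}


if __name__ == "__main__":
    # "T.B" and "T:B" both become "T:B" once periods turn into colons.
    collisions = find_collisions(
        ["T.B", "T:B", "T.C"], lambda s: s.replace(".", ":")
    )
    print(collisions)  # {'T:B': ['T.B', 'T:B']}
```

Raising an error when the returned dict is non-empty would be one option; appending a suffix to disambiguate would be another.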

@mortonjt

mortonjt commented Mar 7, 2019 via email

fedarko changed the title from "Escape brackets/periods/backslashes/quotes in input data" to "Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata" Mar 9, 2019
fedarko added a commit that referenced this issue Mar 11, 2019
Eventually I need to make the code filter out special characters
from sample metadata IDs and/or make it escape them properly, but
that's going to take some work

[ci skip]
fedarko changed the title from "Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata" to "Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in inputs to plot JSONs" Jun 12, 2019
fedarko changed the title from "Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in inputs to plot JSONs" to "Escape brackets/periods/backslashes/quotes in input rank IDs and sample metadata fields, and in any "inputs" to plot JSONs" Jun 12, 2019
fedarko added a commit that referenced this issue Jun 12, 2019
This sums up the programming work for #92, mostly. What I want to
do now is add tests, etc. to make sure
RRVDisplay.updateSamplePlotFilters() is working properly before I
merge this back in. Also thinking about maybe working on #66 while I'm
at it.

Probably a good time to take care of #85 et al. also?
fedarko added a commit that referenced this issue Jul 7, 2019
2 participants