Parsing DOT string with formatting in it #505

Open
s4m0r4m4 opened this issue Mar 6, 2020 · 10 comments
Labels
bug Something isn't working enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@s4m0r4m4

s4m0r4m4 commented Mar 6, 2020

Hello, I am using vis.parseDOTNetwork(sTemp) to parse this DOT string:

digraph Tree {
node [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;
edge [fontname=helvetica] ;
0 [label=<Intensity &lt; 120 | Height (px) &le; 2.5<br/>gini = 0.158<br/>samples = 8946<br/>value = [8175, 771]<br/>class = No Defect>, fillcolor="#e78d4c"] ;
1 [label=<gini = 0.0<br/>samples = 7508<br/>value = [7508, 0]<br/>class = No Defect>, fillcolor="#e58139"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label=<Intensity &lt; 120 | Width (px) &le; 20.5<br/>gini = 0.497<br/>samples = 1438<br/>value = [667, 771]<br/>class = Defect>, fillcolor="#e4f2fb"] ;
0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
3 [label=<Intensity &lt; 120 | Area (px) &le; 22.5<br/>gini = 0.431<br/>samples = 875<br/>value = [275, 600]<br/>class = Defect>, fillcolor="#94caf1"] ;
2 -> 3 ;
4 [label=<gini = 0.371<br/>samples = 191<br/>value = [144, 47]<br/>class = No Defect>, fillcolor="#edaa7a"] ;
3 -> 4 ;
5 [label=<gini = 0.31<br/>samples = 684<br/>value = [131, 553]<br/>class = Defect>, fillcolor="#68b4eb"] ;
3 -> 5 ;
6 [label=<Intensity &lt; 120 | Height (px) &le; 4.5<br/>gini = 0.423<br/>samples = 563<br/>value = [392, 171]<br/>class = No Defect>, fillcolor="#f0b88f"] ;
2 -> 6 ;
7 [label=<gini = 0.491<br/>samples = 306<br/>value = [174, 132]<br/>class = No Defect>, fillcolor="#f9e1cf"] ;
6 -> 7 ;
8 [label=<gini = 0.257<br/>samples = 257<br/>value = [218, 39]<br/>class = No Defect>, fillcolor="#ea985c"] ;
6 -> 8 ;
}

The DOT string was created using scikit-learn's sklearn.tree.export_graphviz(...) (see https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz). I have verified that this is a valid DOT string using GraphViz Online.

When I call parseDOTNetwork, it throws the following error (from line 7283 of vis-network.js): "Syntax error in part "<Intensity &lt; 120 | Heigh...""

It seems like vis-network is not set up to parse more complex labels with HTML formatting, but I could not find anything in the docs describing the limitations or workarounds. What do I need to do to be able to parse this DOT string?

Here are my versions from package-lock.json:

"vis-data": {
  "version": "6.2.1",
  "resolved": "https://registry.npmjs.org/vis-data/-/vis-data-6.2.1.tgz",
  "integrity": "sha512-y80h+jJ6gQXfeWqDdts9UTrLubggx99Q0dJionQF/XxHIbcxNZ+2cuwv4y+bIapVF9nW0DyE08EcMMlicAlSWg==",
  "requires": {
    "vis-util": "^1.1.0"
  }
},
"vis-network": {
  "version": "7.3.5",
  "resolved": "https://registry.npmjs.org/vis-network/-/vis-network-7.3.5.tgz",
  "integrity": "sha512-i9Ow9qqYGhXwcTEetbKCHBTzYvjsag5xM+Zl009ajjZyAvMLqu3Gljk/VpcAn2WqQHZAz0g/Cg3+xfALxYwLGA=="
},
"vis-util": {
  "version": "1.1.8",
  "resolved": "https://registry.npmjs.org/vis-util/-/vis-util-1.1.8.tgz",
  "integrity": "sha512-gX461BUrYnHfDpMalvQdQ26SG/DplIvsK7u5KUCWEfPpcnTXm22FtQH5a+GIbDBcePe1sm578mpbWwTbCgRVww==",
  "requires": {
    "moment": "^2.24.0",
    "vis-uuid": "^1.1.3"
  }
},
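One possible workaround for the HTML-like labels in the first report, sketched under stated assumptions: rewrite every label=<...> into an ordinary quoted label before calling vis.parseDOTNetwork. The helper names htmlLabelsToQuoted and htmlToText are hypothetical (not part of vis-network). The closing > is found by counting nested angle brackets, <br/> becomes the DOT \n escape, other tags are dropped, and only a handful of entities are decoded. All HTML formatting is lost, a label=< sequence inside a quoted string would be misread, and whether vis-network renders the \n escape as a line break is an assumption worth checking.

```javascript
// Hypothetical pre-processing helpers (not part of vis-network).

// Turns the body of an HTML-like label into text for a quoted DOT string.
function htmlToText(html) {
  const entities = { lt: "<", gt: ">", le: "\u2264", ge: "\u2265", amp: "&", quot: '"' };
  return html
    .replace(/<br\b[^>]*>/gi, "\\n") // <br/> becomes the two-character DOT escape \n
    .replace(/<[^>]*>/g, "") // drop any remaining tags (<b>, <font ...>, ...)
    .replace(/&(lt|gt|le|ge|amp|quot);/g, (_, name) => entities[name]) // decode a few entities
    .replace(/"/g, '\\"'); // escape double quotes for the quoted string
}

// Rewrites label=<...> (and xlabel/headlabel/taillabel) into label="...".
function htmlLabelsToQuoted(dot) {
  const opener = /\b((?:x|head|tail)?label\s*=\s*)</g;
  let out = "";
  let copiedUpTo = 0;
  let match;
  while ((match = opener.exec(dot)) !== null) {
    // Find the matching ">" by counting nested angle brackets.
    let depth = 1;
    let j = opener.lastIndex;
    while (j < dot.length && depth > 0) {
      if (dot[j] === "<") depth++;
      else if (dot[j] === ">") depth--;
      j++;
    }
    if (depth !== 0) break; // unbalanced brackets: leave the rest untouched
    const body = dot.slice(opener.lastIndex, j - 1);
    out += dot.slice(copiedUpTo, match.index) + match[1] + '"' + htmlToText(body) + '"';
    copiedUpTo = j;
    opener.lastIndex = j;
  }
  return out + dot.slice(copiedUpTo);
}
```

Usage would then be vis.parseDOTNetwork(htmlLabelsToQuoted(sTemp)). Labels that are already quoted, and attributes such as labelangle or labeldistance, are left untouched.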
@s4m0r4m4
Author

Hello, I'm just following up on this: is there any restriction on which version of DOT I can use, or are certain DOT features unsupported? I don't necessarily need the code updated; I just need to know what it supports so that I can work around the features that aren't.

@s4m0r4m4
Author

It also looks like it does not support subgraphs:

digraph Tree { fontsize=25;
subgraph cluster_1 {
        node [color=black fontname=helvetica shape=box style="filled, rounded"]
        edge [fontname=helvetica labelangle=45 labeldistance=2.5]
        V1_0 [label="Intensity Threshold: < 170
Defect (#): 256
Okay (#): 2294
Split Gain: 0.081" fillcolor="#199750"]
        V1_0 -> V1_1 [xlabel="Area >= 967"]
        V1_1 [label="Intensity Threshold: > 50
Defect (#): 149
Okay (#): 30
Split Gain: 0.068" fillcolor="#ea5739"]
label = "View 1"
}
subgraph cluster_2 {
        node [color=black fontname=helvetica shape=box style="filled, rounded"]
        edge [fontname=helvetica labelangle=45 labeldistance=2.5]
        V2_0 [label="Intensity Threshold: > 60
Defect (#): 1049
Okay (#): 2067
Split Gain: 0.111" fillcolor="#b9e176"]
        V2_0 -> V2_1 [xlabel="Std >= 22.2"]
        V2_1 [label="Intensity Threshold: > 160
Defect (#): 818
Okay (#): 532
Split Gain: 0.048" fillcolor="#fedc88"]
label = "View 2"
}
}

@s4m0r4m4
Author

s4m0r4m4 commented Jun 5, 2020

Another gap: it also does not render xlabels on either edges or nodes.

Looking at the old vis.js (which I believe this demo runs on), it seems to render these without a problem. Maybe there is some old code there that you could reuse?

@s4m0r4m4
Author

Ping - just checking in on this

@Thomaash Thomaash added bug Something isn't working enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Jul 28, 2020
@Thomaash
Member

Hi @s4m0r4m4,

the DOT parser is old, definitely incomplete, not exactly spec-compliant even where implemented, and, yeah, I can't rule out that it is indeed broken.

Would you (@s4m0r4m4 or whoever is reading this) be willing to work on this? All the logic is neatly confined (not a single import/require statement) within https://github.com/visjs/vis-network/blob/master/lib/network/dotparser.js, so not much insight into the library is needed: the input is a DOT string, and the output is passed directly to network.setData({ nodes, edges }).

@s4m0r4m4
Copy link
Author

Hi Thomaash, thanks for the message. I'll have to see when this rises to the top of my priority queue, but it does sound nicely compartmentalized. Are you open to replacing most of that functionality with a separate lib (e.g. https://github.com/anvaka/dotparser/ or something similar)?

@Thomaash
Member

Hi @s4m0r4m4,

I'm a bit concerned about the maintenance status of Dotparser. The last commit is from March, when the author noticed it's 2020 and updated the year in the license; the last commit that actually changed something important is from January 2019 (maybe it was perfect back then and the format didn't change at all?). The bigger issue is that it's based on PEG.js, which is dead (though, in case of issues, we could fork it and maintain it as yet another Vis project, as there doesn't seem to be a decent alternative).

I'm not opposed to using a parser or even a dedicated DOT library for this (if there's a good one, it would actually be ideal); however, it has to be done in a tree-shakable way, as many people don't use DOT at all and this would just be bloat for them. Luckily, all we need is a DOT string as input and an object that can be passed to network.setData(…) as output. A function can do that, and an exported function can easily be tree-shaken if unused. It would be a breaking change, but if the parsing is going to be noticeably better, I'm all for it.

PS: The tree-shakeability note is mostly for myself; if you keep everything contained within dotparser.ts and export a function from it, I'll rework the rest.

PPS: Here's how I imagine the API:

import { Network, parseDOTNetwork } from "vis-network/standalone";
// The function parseDOTNetwork is available to be used.
new Network({}, parseDOTNetwork(myDOT));

import { Network } from "vis-network/standalone";
// The function parseDOTNetwork was not imported, so Rollup/Webpack/whatnot can
// just throw it away with all of its dependencies as if none of it ever even
// existed.
new Network(
  {},
  {
    nodes: [
      /* … */
    ],
    edges: [
      /* … */
    ],
  }
);

The following will be removed (it's already deprecated).

import { Network } from "vis-network/standalone";
// The function parseDOTNetwork is used internally and can't be tree-shaken.
new Network({}, { dot: myDOT });

@HighVoltageCoder

HighVoltageCoder commented Jun 6, 2022

Hi,
I've also noticed a problem with parsing double quotes. Example:
27 [IconPath="\"C:\Program Files (x86)\Microsoft Office\Root\Office16\SDXHelper.exe\" -Embedding"].
I looked into the source code, and it does not seem to handle escaped double quotes inside strings: the parser takes the first (") as the start of the string, treats the escaped one (\") as its end, and then chokes on the rest of the value.
Do you know of any workaround? :)
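For reference, the Graphviz DOT language specification says that inside a quoted string the only escaped character is the double quote (\" becomes "); every other character, including a lone backslash as in C:\Program Files, is kept unchanged, and a backslash immediately before a newline continues the string on the next line. A reader for that one token, following those rules, could look like the sketch below. readQuotedString is a hypothetical name, not vis-network's code, and it covers only the quoted-string token, not the rest of the grammar.

```javascript
// Hypothetical helper (not vis-network's code): reads one DOT quoted string
// starting at src[start] === '"', following the Graphviz language spec:
//   \"            -> a literal double quote
//   \<newline>    -> line continuation (both characters dropped)
//   anything else -> kept as-is, so C:\Program Files stays intact
// Returns the unescaped value and the index just past the closing quote.
function readQuotedString(src, start) {
  if (src[start] !== '"') {
    throw new Error(`expected '"' at index ${start}`);
  }
  let value = "";
  let i = start + 1;
  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];
    if (c === "\\" && next === '"') {
      value += '"'; // escaped quote: part of the value, not the end
      i += 2;
    } else if (c === "\\" && next === "\n") {
      i += 2; // line continuation
    } else if (c === '"') {
      return { value, end: i + 1 }; // unescaped quote closes the string
    } else {
      value += c;
      i += 1;
    }
  }
  throw new Error(`unterminated string starting at index ${start}`);
}
```

Applied to the IconPath example, this yields the value "C:\Program Files (x86)\...\SDXHelper.exe" -Embedding (with the inner quotes kept) and stops at the final quote before ].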

@Huyston

Huyston commented Oct 31, 2023

It also doesn't handle comments; if I remove the comments (//), it works.
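Until the parser handles comments, a pre-processing pass can remove them. The DOT language specification allows C++-style comments (/* ... */ and //) and discards any line that begins with '#' (treated as C preprocessor output). The sketch below (stripDotComments is a hypothetical name, not a vis-network function) copies quoted strings verbatim so that values like "http://x" survive, replaces a block comment with a single space so it still separates tokens, and assumes that HTML-like labels contain no // or /*.

```javascript
// Hypothetical pre-processing step (not part of vis-network): removes DOT
// comments while copying quoted strings verbatim.
// Assumption: HTML-like labels (<...>) contain no "//" or "/*".
function stripDotComments(dot) {
  let out = "";
  let i = 0;
  let atLineStart = true;
  while (i < dot.length) {
    const c = dot[i];
    const next = dot[i + 1];
    if (atLineStart && c === "#") {
      // A line starting with '#' is C-preprocessor output: drop it, keep the newline.
      while (i < dot.length && dot[i] !== "\n") i++;
    } else if (c === '"') {
      // Copy the quoted string as-is; only \" is an escape inside it.
      let j = i + 1;
      while (j < dot.length && dot[j] !== '"') {
        j += dot[j] === "\\" && dot[j + 1] === '"' ? 2 : 1;
      }
      out += dot.slice(i, j + 1);
      i = j + 1;
      atLineStart = false;
    } else if (c === "/" && next === "/") {
      // Line comment: skip to (but not past) the newline.
      while (i < dot.length && dot[i] !== "\n") i++;
    } else if (c === "/" && next === "*") {
      // Block comment: replace with one space so it still separates tokens.
      const close = dot.indexOf("*/", i + 2);
      i = close === -1 ? dot.length : close + 2;
      out += " ";
      atLineStart = false;
    } else {
      out += c;
      atLineStart = c === "\n";
      i++;
    }
  }
  return out;
}
```

The result can then be handed to vis.parseDOTNetwork as before; line structure is preserved, so any line numbers in later error messages still roughly match the original input.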

@vorburger

I've run into this as well while trying to render Graphviz output from (my) Enola.dev RDF graph generation with Vis.js. (In my specific case, after finding and reading this issue, I'm going to skip this DOT parser and directly turn my Things data structure into what Vis.js natively needs.)

PS: If any maintainers are reading this, I would close #1957 as a duplicate of this issue.
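The route described in the last comment, building the data directly instead of going through DOT, could look like the minimal sketch below. It assumes a hypothetical tree shape ({ id, label, edgeLabel?, children? }) and produces the { nodes, edges } object that network.setData(…) accepts; shape: "box" and arrows: "to" are standard vis-network node and edge options, and a newline ("\n") inside a label string is rendered by vis-network as a line break.

```javascript
// Hypothetical input: { id, label, edgeLabel?, children? }, where edgeLabel
// labels the edge coming from the parent (e.g. "True"/"False" in a decision tree).
// Output: the { nodes, edges } shape that vis-network's network.setData() accepts.
function treeToVisData(root) {
  const nodes = [];
  const edges = [];
  const stack = [root]; // iterative depth-first walk, so deep trees don't overflow the call stack
  while (stack.length > 0) {
    const node = stack.pop();
    nodes.push({ id: node.id, label: node.label, shape: "box" });
    for (const child of node.children ?? []) {
      const edge = { from: node.id, to: child.id, arrows: "to" };
      if (child.edgeLabel !== undefined) edge.label = child.edgeLabel;
      edges.push(edge);
      stack.push(child);
    }
  }
  return { nodes, edges };
}
```

With vis-network loaded, the result can be passed as new vis.Network(container, treeToVisData(tree), {}) or to network.setData(…). For bold or italic text inside labels, vis-network's own font.multi node option is the native alternative to Graphviz HTML-like labels.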

Development

No branches or pull requests

5 participants