HTML encoded characters in markup-tag #585
Comments
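This happens because the tag-name pattern in Prism's markup grammar only accepts \w, : and -, and \w does not match umlauts. Change pattern: /^<\/?[\w:-]+/i, to pattern: /^<\/?[\w\u00e4:-]+/i, and Prism will correctly highlight the above snippet. I am not sure what a reasonable, more general fix would look like, though. Would something like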
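To make the "more general fix" question concrete, here is a small sketch that runs the two patterns from this comment against two sample tags. The sample tag names and the broader variant (a range from Latin-1 Supplement through Latin Extended-B) are assumptions made for illustration and are not part of Prism.

```javascript
// Tag-name patterns quoted in the comment above.
const original = /^<\/?[\w:-]+/i;
const withAUmlaut = /^<\/?[\w\u00e4:-]+/i;

// Hypothetical broader variant (an assumption, not Prism's grammar):
// U+00C0..U+024F covers most accented Latin letters. Note that the range
// also contains the symbols U+00D7 and U+00F7.
const broader = /^<\/?[\w\u00c0-\u024f:-]+/i;

// Sample inputs, written with escapes so the file encoding does not matter.
const laeufer = '<L\u00e4ufer>';      // <Läufer>
const groesse = '<Gr\u00f6\u00dfe>';  // <Größe>

// Return the part of `tag` that a pattern treats as "<" plus the tag name.
function tagNamePart(pattern, tag) {
  const match = pattern.exec(tag);
  return match ? match[0] : null;
}

console.log(tagNamePart(original, laeufer));    // "<L"
console.log(tagNamePart(withAUmlaut, laeufer)); // "<Läufer"
console.log(tagNamePart(withAUmlaut, groesse)); // "<Gr"
console.log(tagNamePart(broader, groesse));     // "<Größe"
```

Adding \u00e4 alone fixes the reported tag but stops at the first ö, ü or ß, which is why a character range or a negated class may be easier to maintain than listing letters one by one.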
HTML and XML don't share the same grammar: XML allows additional characters in tag names, while HTML only allows alphanumeric ASCII characters. We could either create a separate XML language that relaxes some of the rules of HTML or, in the spirit of "highlighter, not a linter", allow some invalid HTML.
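As a sketch of how different the two name grammars are, the snippet below builds a name check from the NameStartChar and NameChar productions of XML 1.0 and compares it with an ASCII-alphanumeric check for HTML. The ranges are transcribed from the XML specification as I recall it and leave out the astral range U+10000..U+EFFFF, so please verify them against the spec before relying on them. The variable names are made up for this example.

```javascript
// XML 1.0 NameStartChar, limited to the Basic Multilingual Plane.
const xmlNameStartChar =
  ':A-Z_a-z\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D' +
  '\\u037F-\\u1FFF\\u200C\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF' +
  '\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD';

// XML 1.0 NameChar = NameStartChar plus "-", ".", digits and a few extras.
const xmlNameChar =
  xmlNameStartChar + '\\-.0-9\\u00B7\\u0300-\\u036F\\u203F\\u2040';

const xmlName = new RegExp('^[' + xmlNameStartChar + '][' + xmlNameChar + ']*$');

// HTML tag names as described above: ASCII alphanumerics only.
const htmlName = /^[a-z0-9]+$/i;

const sample = 'L\u00e4ufer'; // Läufer

console.log(xmlName.test(sample));      // true
console.log(htmlName.test(sample));     // false
console.log(xmlName.test('svg:rect'));  // true
console.log(xmlName.test('1abc'));      // false
console.log(htmlName.test('h1'));       // true
```

A class this long is one argument for the "highlighter, not a linter" route: a highlighter can accept a superset of both grammars without tracking every allowed range.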
Thank you for your comments. If I change line 10 as suggested by @uranusjr, nothing changes. If I add the characters to line 7, the syntax gets highlighted, but the highlighter thinks that the attr-name starts after the umlaut (see the picture below). Can I address that? Line 7 now looks like this:
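The symptom described in this comment can be reproduced outside Prism with plain regular expressions. The sketch below assumes that "line 7" is the outer tag pattern and "line 10" is the nested tag-name pattern, and that the outer pattern originally used [\w:-]+ for the tag name. Both are guesses based on the patterns quoted elsewhere in this thread. The helper is a simplified model and not how Prism tokenizes internally.

```javascript
// Assumed original outer pattern ("line 7"), with [\w:-]+ as the tag name.
const outerOriginal = /<\/?[\w:-]+\s*(?:\s+[\w:-]+(?:=(?:("|')(\\?[\w\W])*?\1|[^\s'">=]+))?\s*)*\/?>/i;
// The same pattern with only the tag-name class extended by an a-umlaut.
const outerWithUmlaut = /<\/?[\w\u00e4:-]+\s*(?:\s+[\w:-]+(?:=(?:("|')(\\?[\w\W])*?\1|[^\s'">=]+))?\s*)*\/?>/i;

// Nested tag-name pattern ("line 10"), before and after the suggested change.
const innerOriginal = /^<\/?[\w:-]+/i;
const innerWithUmlaut = /^<\/?[\w\u00e4:-]+/i;

// attr-name pattern from the markup grammar quoted in this thread.
const attrName = /[\w:-]+/;

const source = '<L\u00e4ufer>'; // <Läufer>

// Simplified model: match the whole tag, take the tag name from its start,
// then look for an attr-name in whatever is left over.
function split(outer, inner, text) {
  const whole = outer.exec(text);
  if (!whole) return null;
  const name = inner.exec(whole[0])[0];
  const attr = attrName.exec(whole[0].slice(name.length));
  return { name: name, attr: attr ? attr[0] : null };
}

// Changing only the inner pattern: the outer one never matches.
console.log(split(outerOriginal, innerWithUmlaut, source));   // null
// Changing only the outer pattern: the name stops at the umlaut.
console.log(split(outerWithUmlaut, innerOriginal, source));   // { name: '<L', attr: 'ufer' }
// Changing both patterns.
console.log(split(outerWithUmlaut, innerWithUmlaut, source)); // { name: '<Läufer', attr: null }
```

Under these assumptions, the character class has to be changed in both places, which would explain why each single change appeared not to work.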
You could try something like this:

'tag': {
    //            ↓-------↓ here
    pattern: /<\/?[^\s>\/]+\s*(?:\s+[\w:-]+(?:=(?:("|')(\\?[\w\W])*?\1|[^\s'">=]+))?\s*)*\/?>/i,
    inside: {
        'tag': {
            //             ↓-------↓ and here
            pattern: /^<\/?[^\s>\/]+/i,
            inside: {
                'punctuation': /^<\/?/,
                'namespace': /^[\w-]+?:/
            }
        },
        // ...
    }
}

Which translates to "a tag name is anything that is not a whitespace character, a closing bracket, or a slash".
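A quick way to check the relaxed patterns is to run them directly against a few tags. The sketch below copies the two patterns from this comment. The helper name and the sample inputs are made up for illustration.

```javascript
// Relaxed patterns copied from the suggestion above.
const tagPattern = /<\/?[^\s>\/]+\s*(?:\s+[\w:-]+(?:=(?:("|')(\\?[\w\W])*?\1|[^\s'">=]+))?\s*)*\/?>/i;
const tagNamePattern = /^<\/?[^\s>\/]+/i;

// Find the first tag in `source` and the "<name" part at its start.
function describeTag(source) {
  const whole = tagPattern.exec(source);
  if (!whole) return null;
  const name = tagNamePattern.exec(whole[0]);
  return { tag: whole[0], name: name[0] };
}

console.log(describeTag('<L\u00e4ufer id="1">text')); // tag: <Läufer id="1">, name: <Läufer
console.log(describeTag('</L\u00e4ufer>'));           // tag: </Läufer>, name: </Läufer
console.log(describeTag('<div class="a">'));          // tag: <div class="a">, name: <div
console.log(describeTag('no tags here'));             // null
```

Plain HTML tags still match as before, while tag names with non-ASCII letters are now accepted as a whole.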
@apfelbox is right. We should relax the markup grammar a bit, as long as it doesn’t result in incorrect highlighting of HTML examples.
So, I used @apfelbox's solution and came up with this (with < masked):

Prism.languages.markup = {
    'comment': /<!--[\w\W]*?-->/g,
    'prolog': /<\?.+?\?>/,
    'doctype': /<!DOCTYPE.+?>/,
    'cdata': /<!\[CDATA\[[\w\W]*?]]>/i,
    'tag': {
        pattern: /<\/?[^\s>\/]+\s*(?:\s+[\w:-]+(?:=(?:("|')(\\?[\w\W])*?\1|[^\s'">=]+))?\s*)*\/?>/i,
        inside: {
            'tag': {
                pattern: /^<\/?[^\s&>\/]+/i,
                inside: {
                    'punctuation': /^<\/?/,
                    'namespace': /^[\w-]+?:/
                }
            },
            'attr-value': {
                pattern: /=(?:('|")[\w\W]*?(\1)|[^\s>]+)/i,
                inside: {
                    'punctuation': /=|>|"/
                }
            },
            'punctuation': /\/?>/,
            'attr-name': {
                pattern: /[\w:-]+/,
                inside: {
                    'namespace': /^[\w-]+?:/
                }
            }
        }
    },
    'entity': /&#?[\da-z]{1,8};/i
};
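One visible difference from the earlier suggestion is the & added to the nested tag-name character class. The sketch below checks what that means for a literal umlaut versus an HTML entity inside a tag name. The two sample inputs are assumptions, since the snippet from the original report is not reproduced in this thread, and the helper is hypothetical.

```javascript
// Outer and nested tag patterns copied from the grammar above.
const tagPattern = /<\/?[^\s>\/]+\s*(?:\s+[\w:-]+(?:=(?:("|')(\\?[\w\W])*?\1|[^\s'">=]+))?\s*)*\/?>/i;
const tagNamePattern = /^<\/?[^\s&>\/]+/i;

// Return the "<name" part of the first tag found in `source`, or null.
function tagName(source) {
  const whole = tagPattern.exec(source);
  if (!whole) return null;
  return tagNamePattern.exec(whole[0])[0];
}

console.log(tagName('<L\u00e4ufer>'));  // "<Läufer"
console.log(tagName('<L&auml;ufer>'));  // "<L"
```

With a literal umlaut the whole name is captured. With an entity, the outer pattern still matches the whole tag, but the nested tag name ends at the ampersand, so how the entity part gets highlighted depends on the remaining rules inside the tag.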
I have some XML tags with HTML-encoded characters:
Prism doesn't recognize the second tags (Läufer) correctly and doesn't highlight them. Is there a way to add all German umlauts to the markup plugin?