Support decoding and encoding of LaTeX characters #161

Closed
koppor opened this issue Sep 11, 2015 · 13 comments

Comments

@koppor
Member

koppor commented Sep 11, 2015

There is latex2utf8, whose source is at https://github.com/fc7/LaTeX-Decode. The opposite direction is covered by LaTeX::Encode.

We should think about including this in JabRef, possibly in the cleanup functionality.

See also #160.

@koppor
Member Author

koppor commented Sep 12, 2015

This is related to sf bug #721. It seems that this functionality runs during import and not at "Cleanup entries". The existing functionality should be compared with that of latex2utf8. Then one of them should be chosen and integrated into "Cleanup entries".

@oscargus
Contributor

I've browsed the code and (having written most of the current JabRef converter, in HTMLConverter), I'd say that the current one supports more characters (although there may be some missing which are worthwhile adding). Also, the current implementation supports converting from HTML. I would assume that it should be possible to use the same table to do the reverse conversion.
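To illustrate the idea of reusing one table for both directions, here is a minimal sketch. The class name `LatexUnicodeTable` and its three sample entries are hypothetical and are not taken from JabRef's `HTMLConverter`; the point is only that the reverse map can be derived from the forward one instead of being maintained separately.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatexUnicodeTable {
    // Hypothetical sample entries; a real table would be far larger.
    private static final Map<String, String> UNICODE_TO_LATEX = new LinkedHashMap<>();
    private static final Map<String, String> LATEX_TO_UNICODE = new LinkedHashMap<>();

    static {
        UNICODE_TO_LATEX.put("\u00E4", "{\\\"{a}}"); // a with diaeresis
        UNICODE_TO_LATEX.put("\u00E9", "{\\'{e}}");  // e with acute
        UNICODE_TO_LATEX.put("\u00DF", "{\\ss}");    // sharp s
        // Derive the reverse direction from the same entries.
        // putIfAbsent keeps the first mapping if two characters share one command.
        for (Map.Entry<String, String> e : UNICODE_TO_LATEX.entrySet()) {
            LATEX_TO_UNICODE.putIfAbsent(e.getValue(), e.getKey());
        }
    }

    // Unicode to LaTeX: replace every known character with its command.
    public static String encode(String text) {
        return apply(UNICODE_TO_LATEX, text);
    }

    // LaTeX to Unicode: replace every known command with its character.
    public static String decode(String text) {
        return apply(LATEX_TO_UNICODE, text);
    }

    private static String apply(Map<String, String> table, String text) {
        String result = text;
        for (Map.Entry<String, String> e : table.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

The reverse lookup here only recognizes the exact spelling stored in the table, so any other brace variant of the same command would have to be normalized first.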

@oscargus
Contributor

I started merging the missing characters that were present in latex2utf8 and will provide a PR in a few days.

@koppor koppor changed the title from "Support encoding and ecoding of LaTeX characters" to "Support encoding and encoding of LaTeX characters" Sep 24, 2015
@oscargus
Contributor

oscargus commented Oct 4, 2015

There's a huge list at http://www.w3.org/Math/characters/unicode.xml

@koppor koppor changed the title from "Support encoding and encoding of LaTeX characters" to "Support decoding and encoding of LaTeX characters" Dec 5, 2015
@koppor
Member Author

koppor commented Dec 5, 2015

The Unicode converter converts from Unicode to LaTeX and not vice versa, doesn't it? On my first try, it did not treat the Author field, but on a second try it did. I need to investigate what could have gone wrong.

@oscargus
Contributor

oscargus commented Dec 5, 2015

Correct. It should be possible to do it the other way around as well similar to the export formatters XMLChars, RTFChars, and HTMLChars.

Especially, one would like to use the huge list in HTMLConverter for HTMLChars (and maybe XMLChars) as well. I think one major issue here is how to deal with {\"{a}} vs \"{a} vs \"a vs {\"a}, but looking at e.g. HtmlCharsMap, it seems like there is a solution for that in HTMLChars, so probably only a matter of converting the LaTeX commands in HTMLConverter to the same format as in HtmlCharsMap.
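One way to handle the four spellings is to rewrite all of them into a single canonical form before any table lookup. The sketch below does that with a regular expression. The class name `AccentNormalizer`, the choice of `{\"{a}}` as the canonical form, and the restriction to five accent commands and ASCII letters are assumptions made for illustration; this is not how `HtmlCharsMap` is implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccentNormalizer {
    // Regex fragments: a backslash followed by one accent character, and one ASCII letter.
    private static final String ACCENT = "\\\\([\"'`^~])";
    private static final String LETTER = "([A-Za-z])";

    // Alternatives are ordered from most to fewest braces so the longest form is tried first.
    private static final Pattern VARIANTS = Pattern.compile(
            "\\{" + ACCENT + "\\{" + LETTER + "\\}\\}"   // {\"{a}}
            + "|" + ACCENT + "\\{" + LETTER + "\\}"      // \"{a}
            + "|\\{" + ACCENT + LETTER + "\\}"           // {\"a}
            + "|" + ACCENT + LETTER);                    // \"a

    // Rewrites every recognized accent command into the form {\"{a}}.
    public static String normalize(String latex) {
        Matcher m = VARIANTS.matcher(latex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only one alternative matched; its two groups are the non-null pair.
            for (int g = 1; g <= 7; g += 2) {
                if (m.group(g) != null) {
                    String canonical = "{\\" + m.group(g) + "{" + m.group(g + 1) + "}}";
                    m.appendReplacement(out, Matcher.quoteReplacement(canonical));
                    break;
                }
            }
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the input normalized this way, a conversion table only needs one entry per character instead of four.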

This is something that I have been thinking about, but so far I have not found the time/motivation to do it.

@oscargus
Contributor

oscargus commented Dec 5, 2015

There is also a class FormatChars that converts LaTeX to Unicode, which could be extended to cover everything in HTMLConverter.
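As a rough illustration of LaTeX-to-Unicode decoding that does not need one table entry per accented letter, the sketch below appends the matching Unicode combining mark to the base letter and lets `java.text.Normalizer` compose the result. The class name `AccentDecoder` is hypothetical, the input is assumed to use the form `{\"{a}}` throughout, and nothing here is taken from how `FormatChars` actually works.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccentDecoder {
    // LaTeX accent character mapped to the corresponding Unicode combining mark.
    private static final Map<Character, Character> COMBINING = new HashMap<>();

    static {
        COMBINING.put('"', (char) 0x0308);  // combining diaeresis
        COMBINING.put('\'', (char) 0x0301); // combining acute accent
        COMBINING.put('`', (char) 0x0300);  // combining grave accent
        COMBINING.put('^', (char) 0x0302);  // combining circumflex accent
        COMBINING.put('~', (char) 0x0303);  // combining tilde
    }

    // Matches only the assumed canonical form: brace, backslash, accent, braced letter, brace.
    private static final Pattern CANONICAL =
            Pattern.compile("\\{\\\\([\"'`^~])\\{([A-Za-z])\\}\\}");

    public static String decode(String latex) {
        Matcher m = CANONICAL.matcher(latex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char mark = COMBINING.get(m.group(1).charAt(0));
            // NFC composes base letter plus mark into one code point where one exists.
            String composed = Normalizer.normalize(m.group(2) + mark, Normalizer.Form.NFC);
            m.appendReplacement(out, Matcher.quoteReplacement(composed));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Commands without a decomposable equivalent, such as `\ss`, would still need explicit table entries.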

@oscargus
Contributor

With #841 there's a huge step towards having quite good conversion in both directions.

@koppor
Member Author

koppor commented Feb 21, 2016

Refs #160

@koppor
Member Author

koppor commented Mar 25, 2016

Refs #1013.

@tobiasdiez
Member

What is the status of this issue? It seems like both conversion directions are present as cleanup operations.

@oscargus
Contributor

oscargus commented Apr 9, 2016

Agreed. Of course, it can always be improved, but I believe it is one of the better conversions (apart from LaTeX).

@lenhard
Member

lenhard commented Apr 13, 2016

This issue can be closed thanks to the cleanup operations.

@lenhard lenhard closed this as completed Apr 13, 2016
koppor pushed a commit that referenced this issue Apr 1, 2023
a7c6f63e25 correct license to match the SPDX license identifier. (#281)
d704bf80af Update locales-nl-NL.xml (#229)
5ffb73b05a Bump nokogiri from 1.13.9 to 1.13.10 (#280)
04be62eda6 Update locales-pt-BR.xml (#251)
b4db583787 Update locales-pt-BR.xml (#265)
b656b1b6f9 Fix date format for Basque (#274)
e7ec9bff94 Bump nokogiri from 1.13.4 to 1.13.9 (#272)
9125705f62 Update locales-nl-NL.xml (#279)
87445b0b65 Add composer.json (#161)
2919a84bff Fix page label in NO locales

git-subtree-dir: buildres/csl/csl-locales
git-subtree-split: a7c6f63e25323ac2f375943417d7f778f875f11c