Add cleanup action: "Make LaTeX ready: Escape $" #8673

Closed
ThiloteE opened this issue Apr 13, 2022 · 18 comments · Fixed by #8698
Labels
bib(la)tex, cleanup-ops, good first issue, unicode

Comments

@ThiloteE
Member

Problem:

  1. I want to prepare entries for use with LaTeX.
  2. I do not want to start math mode; instead, I simply want the $ character to be rendered in my bibliography.

Describe the solution you'd like

  • Add cleanup action that adds a backslash in front of $
  • Name this cleanup action: Make LaTeX ready: Escape $
  • Add to the documentation: Adds a backslash to $ characters. \$ will be rendered by LaTeX as $ instead of starting math mode. Do not use this cleanup action if you have entries that require LaTeX math mode.

Additional context

  • Why not add this function to Make LaTeX ready: Cleanup or Make LaTeX ready: Unicode to LaTeX? Both allow the usage of math-mode, which is fine. This seems to be a zero-sum game and there is no heuristic that can find out if the user wants to use math-mode or simply render the $.

Related to #8490 (comment) and #8650

ThiloteE added the bib(la)tex, unicode and cleanup-ops labels on Apr 13, 2022
Siedlerchr added the good first issue label on Apr 15, 2022
@fly-ing-fish
Contributor

Hello! I am a junior undergraduate CS student and would like to give this a try. I hope you can give me the chance, thank you!

@ThiloteE
Member Author

Hey @fly-ing-fish of course :-) Feel free to work on this.

As general advice: check out https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md for a start. Also, https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace is worth a look. Don't hesitate to ask if you have any questions, either here on GitHub or in JabRef's Gitter chat.

Try to open a (draft) pull request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

@ThiloteE
Member Author

ThiloteE commented Apr 16, 2022

I just did a test to check how LaTeX/Biber behaves concretely:

Test bibliography:

@Article{A20220416mwm,
  author  = {A},
  date    = {2022-04-16},
  title   = {Math with Mathmode: $2*3/3$},
  comment = {- No error during Biber compilation.
- Dollar Signs vanish. Only 2*3/3 remains.
- Rendering of 2*3/3 is different as compared to when not using mathmode.},
}

@Article{B20220416mwm,
  author  = {B},
  date    = {2022-04-16},
  title   = {Math without Mathmode: 2*3/3},
  comment = {- No error during Biber compilation.
- Dollar Signs vanish. Only 2*3/3 remains},
}

@Article{C20220416tia,
  author  = {C},
  date    = {2022-04-16},
  title   = {This is a normal entry with a backslashed \$ (dollar) sign. Think of being an economist and having to write about currencies},
  comment = {- biber finishes the compilation normally},
}

@Article{D20220416tia,
  author  = {D},
  date    = {2022-04-16},
  title   = {This is a normal entry with a $ (dollar) sign. Think of being an economist and having to write about currencies},
  comment = {- biber finishes the compilation but with errors.
- biber complains about a missing dollar sign
- the dollar sign STARTS mathmode, but since there is a dollar sign missing to CLOSE mathmode, there is an error},
}

@Comment{jabref-meta: databaseType:biblatex;}

The .tex file used for the test:

\documentclass[12pt,a4paper]{article}

	\usepackage[noibid,eprint=false,authordate,backend=biber]{biblatex-chicago}
  
%\usepackage{hyperref}


\usepackage[ngerman,english, bidi=basic]{babel}
	\usepackage{helvet}
			\renewcommand{\familydefault}{\sfdefault}
			\renewcommand{\familydefault}{\ttdefault}
	\usepackage[autostyle,german=quotes]{csquotes}

\addbibresource{bibliography.bib} 

\begin{document}
	
Start of the document.\\

\parencite{A20220416mwm,B20220416mwm,C20220416tia,D20220416tia}.\\

End of the document.\\


\printbibliography
	
\end{document}

Results:

[Image: screenshot of the compiled test document]
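The difference between entries C and D can be reproduced by counting unescaped dollar signs: LaTeX treats each unescaped $ as a math-mode toggle, so an odd count leaves math mode open and produces the "missing $" error. A minimal Java sketch of that check, assuming field values are plain strings; the class and method names are hypothetical and not part of JabRef:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not JabRef code: detects titles that would leave math mode open.
public class DollarBalanceCheck {

    // A $ that is not directly preceded by a backslash, i.e. not written as \$.
    private static final Pattern UNESCAPED_DOLLAR = Pattern.compile("(?<!\\\\)\\$");

    /** Counts the $ characters that LaTeX would treat as math-mode toggles. */
    public static int countUnescapedDollars(String value) {
        Matcher matcher = UNESCAPED_DOLLAR.matcher(value);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** An odd count means math mode is opened but never closed (entry D). */
    public static boolean hasUnbalancedMathMode(String value) {
        return countUnescapedDollars(value) % 2 != 0;
    }

    public static void main(String[] args) {
        String[] titles = {
            "Math with Mathmode: $2*3/3$",
            "Math without Mathmode: 2*3/3",
            "This is a normal entry with a backslashed \\$ (dollar) sign.",
            "This is a normal entry with a $ (dollar) sign."
        };
        for (String title : titles) {
            System.out.println(hasUnbalancedMathMode(title) + "  " + title);
        }
    }
}
```

This is a heuristic, not a TeX parser: `$$` counts as two dollar signs, and `\\$` (a line break followed by $) is wrongly treated as escaped.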

@ThiloteE
Member Author

The solution may resemble the cleanup actions "Escape underscores" or "Escape ampersands"! Maybe part of the code can be copied from there :-)

@ThiloteE
Member Author

Check out pages 15 and 16 of the Comprehensive LaTeX Symbol List! (https://www.ctan.org/tex-archive/info/symbols/comprehensive)

[Images: the symbol tables from pages 15 and 16]

I am not sure if all these other symbols in the above tables create problems when compiling with LaTeX nowadays (I would assume yes 😑). I am also not sure how many of these symbols the Unicode to LaTeX or the LaTeX cleanup actions are able to convert. Testing this would be nice. I can help with that.

Since you are touching this code anyway, implementing a few more cleanup-actions (or merging a few of them together into ONE cleanup action) might be possible, no? If you can find out how to do one, the others will be only a few clicks and "copy paste" away, I assume 😁
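Merging several escape actions into one mostly means widening the character set. A minimal Java sketch, assuming the set %, $, _, # and & and a hypothetical class name (this is not JabRef code). Like the proposed $ action, it would also break intentional math mode or comments:

```java
import java.util.regex.Pattern;

// Hypothetical combined cleanup, not JabRef code.
public class EscapeLatexSpecialCharacters {

    // One of % $ _ # & that is not already preceded by a backslash.
    private static final Pattern UNESCAPED_SPECIAL = Pattern.compile("(?<!\\\\)([%$_#&])");

    /** Prefixes every unescaped special character with a backslash; already escaped ones are left alone. */
    public static String escape(String value) {
        // In the replacement string, "\\\\" is a literal backslash and "$1" is the matched character.
        return UNESCAPED_SPECIAL.matcher(value).replaceAll("\\\\$1");
    }

    public static void main(String[] args) {
        String once = escape("50% of $5 & more_stuff #1");
        System.out.println(once);
        // Running the cleanup a second time must not add more backslashes.
        System.out.println(escape(once).equals(once));
    }
}
```

The negative lookbehind only inspects the previous character without consuming it, so adjacent specials such as `$$` are both escaped and repeated runs change nothing.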

@fly-ing-fish
Contributor

Thank you for the reply and support with details! I am working on it.

@fly-ing-fish
Contributor

I followed the instructions in "Set up a local workspace" in the JabRef development documentation, but ran into a problem at the step "Build and run JabRef by double-clicking JabRef | Tasks | application | run.":

Task :generateBstGrammarSource FAILED
error(1): cannot write file : java.io.FileNotFoundException: src-gen\main\java\org\jabref\logic\bst\G:\BstParser.java (The filename, directory name, or volume label syntax is incorrect)
java.base/java.io.FileOutputStream.open0(Native Method)
java.base/java.io.FileOutputStream.open(FileOutputStream.java:293)
java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:235)
java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:184)
java.base/java.io.FileWriter.<init>(FileWriter.java:96)
org.antlr.Tool.getOutputFile(Tool.java:889)
org.antlr.codegen.CodeGenerator.write(CodeGenerator.java:1292)
org.antlr.codegen.Target.genRecognizerFile(Target.java:98)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:459)
org.antlr.Tool.generateRecognizer(Tool.java:674)
org.antlr.Tool.process(Tool.java:487)
org.antlr.Tool.main(Tool.java:98)
error(1): cannot write file : java.io.FileNotFoundException: src-gen\main\java\org\jabref\logic\bst\G:\Bst__.g (The filename, directory name, or volume label syntax is incorrect)
java.base/java.io.FileOutputStream.open0(Native Method)
java.base/java.io.FileOutputStream.open(FileOutputStream.java:293)
java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:235)
java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:184)
java.base/java.io.FileWriter.<init>(FileWriter.java:96)
org.antlr.Tool.getOutputFile(Tool.java:889)
org.antlr.Tool.process(Tool.java:513)
org.antlr.Tool.main(Tool.java:98)

Execution failed for task ':generateBstGrammarSource'.

Process 'command 'C:\Program Files\Java\jdk-17.0.2\bin\java.exe'' finished with non-zero exit value 1

Wrong paths are generated automatically, such as src-gen\main\java\org\jabref\logic\bst\G:\BstParser.java and src-gen\main\java\org\jabref\logic\bst\G:\Bst__.g.

I have no idea how to solve this right now. Could you give me some advice, please?

@Siedlerchr
Member

@fly-ing-fish Is JabRef located directly under G:\Jabref?
Try moving it down to a folder like G:\workspace\jabref.

@fly-ing-fish
Contributor

It works, thanks for your help! 😁 @Siedlerchr

@ThiloteE
Member Author

Hey, I just noticed: Obviously, JabRef can't know if the user wants to use special characters with or without backslash, but JabRef could NOTIFY the user that there are special characters present in an entry via "Integrity check". (JabRef's current integrity check does not check for this)

When I tried to use a workaround and use "search" to search for a $ sign within the entry, it was not found. 😅 Yay. Another issue that lets me look forward to JabRef 6.x with Lucene search xD
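Such an integrity check would only warn and never modify the entry. A Java sketch, assuming an entry is a plain map from field name to value; the class name, the message text and the set of skipped verbatim fields (url, doi, file) are assumptions, not JabRef's integrity-check API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical integrity check, not JabRef code.
public class SpecialCharacterCheck {

    // A LaTeX special character (%, $, _, #, &) that is not preceded by a backslash.
    private static final Pattern UNESCAPED_SPECIAL = Pattern.compile("(?<!\\\\)[%$_#&]");

    // Assumption: verbatim fields that may legitimately contain these characters.
    private static final Set<String> SKIPPED_FIELDS = Set.of("url", "doi", "file");

    /** Reports fields with unescaped special characters; the entry itself is not modified. */
    public static List<String> check(Map<String, String> fields) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = field.getKey().toLowerCase(Locale.ROOT);
            if (SKIPPED_FIELDS.contains(name)) {
                continue;
            }
            if (UNESCAPED_SPECIAL.matcher(field.getValue()).find()) {
                warnings.add("Field '" + name + "' contains an unescaped LaTeX special character");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("author", "D");
        entry.put("title", "This is a normal entry with a $ (dollar) sign");
        entry.put("url", "https://example.org/page_1#top");
        check(entry).forEach(System.out::println);
    }
}
```

Leaving the decision to the user avoids the unsolvable question of whether a $ was meant to start math mode.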

@Siedlerchr
Member

@fly-ing-fish I would appreciate it if you could add a hint to the devdocs https://github.com/JabRef/jabref/blob/main/docs/getting-into-the-code/guidelines-for-setting-up-a-local-workspace.md

@fly-ing-fish
Contributor

fly-ing-fish commented Apr 18, 2022

  • I agree that there is no heuristic that can find out whether the user wants to use math mode or simply render the $. 😳 Maybe users need to remember to add a \ before $ if they really want to show a literal $ in an entry that also uses math mode? As you said, an integrity check is a good idea.

[Image: screenshot of the symbol tables]

  • As for the other symbols in this picture, I actually did not find an issue similar to the one with $. Cleanup seems to handle them correctly.

  • According to the description, I will add a class named EscapeCurrencySymbol in org.jabref.logic.formatter.bibtexfields, just like EscapeUnderscoresFormatter in the same package, and update Formatters.getOthers() in org.jabref.logic.formatter to add EscapeCurrencySymbol() to the list. The logic should be simple: a regular expression that replaces every "$" with "\$". Corresponding instructions will be added to the documentation.

  • I will add a short hint to the docs reminding people not to put the JabRef folder directly in the root directory of a drive:
    https://github.com/JabRef/jabref/blob/main/docs/getting-into-the-code/guidelines-for-setting-up-a-local-workspace.md

  • If my thinking is right, I am going to write the code and open a PR. Please let me know if there are mistakes or more things I need to do. 😃
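The formatter outlined above might look roughly like this. The `SimpleFormatter` interface is a hypothetical stand-in for JabRef's formatter base class (EscapeUnderscoresFormatter shows the real contract), and the name and key strings are placeholders:

```java
import java.util.regex.Pattern;

/** Hypothetical, minimal stand-in for JabRef's formatter contract; the real base class may differ. */
interface SimpleFormatter {
    String getName();
    String getKey();
    String format(String value);
}

// Sketch of the proposed cleanup action; name and key are placeholders.
public class EscapeCurrencySymbol implements SimpleFormatter {

    // A $ that is not already written as \$.
    private static final Pattern UNESCAPED_DOLLAR = Pattern.compile("(?<!\\\\)\\$");

    @Override
    public String getName() {
        return "Escape dollar sign";
    }

    @Override
    public String getKey() {
        return "escapeDollarSign";
    }

    @Override
    public String format(String value) {
        // The Java literal "\\\\\\$" is the replacement string \\\$,
        // which Matcher interprets as a literal backslash followed by a literal $.
        return UNESCAPED_DOLLAR.matcher(value).replaceAll("\\\\\\$");
    }

    public static void main(String[] args) {
        SimpleFormatter formatter = new EscapeCurrencySymbol();
        System.out.println(formatter.format("Prices in $ and \\$"));
    }
}
```

Skipping dollars that already carry a backslash keeps the action safe to run more than once.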

@ThiloteE
Member Author

I finished my tests. Here is my test library: test-8673-encoding of special characters.bib.txt

I wrote some comments into this file.

Most interesting results:

  • The entries with citation keys test1ca and test1cb show that %, $, _, # and & are not backslashed by the Unicode to LaTeX cleanup action. We know that $ is left alone because of math mode; I wonder why the others are not converted...
  • ... does not get converted from Unicode to LaTeX.
  • _ seems to be a special case: \_ is also not converted back by the LaTeX to Unicode cleanup action.

I will try to compile the rest of the symbols with LaTeX one of these days and see what happens.

@ThiloteE
Member Author

Most of the characters are being converted nicely, so that's fine :))
@fly-ing-fish Thank you!

@ThiloteE
Member Author

Btw, during these tests I encountered #8687, so testing this took more time than I intended to spend on it :/

@fly-ing-fish
Contributor

fly-ing-fish commented Apr 19, 2022

As for the problem with %, _, # and & mentioned above, the cause seems to be on the Java side: something appears to go wrong while generating UNICODE_LATEX_CONVERSION_MAP entries from entries like {"35", "num", "\\#"} in CONVERSION_LIST. Manually adding the corresponding mapping to UNICODE_LATEX_CONVERSION_MAP, e.g. UNICODE_LATEX_CONVERSION_MAP.put(String.valueOf(Character.toChars(Integer.decode("35"))), "\\#"), would be a crude fix. But if we did that, cleanup would turn \\# into \\\\# and \\\\# into \\\\\\#, and so on, each time users run it again. So an additional check would be needed to make sure it works correctly when users run cleanup twice or more. I don't think this is the best solution.

@ThiloteE
Member Author

For %, $, _, # and & see also their uses in LaTeX, but this webpage is from 1995, so I probably still should do a compiling test xD

  • I know for a fact that % starts a comment in LaTeX. So if it is not backslashed, everything after it on that line is omitted from compilation. Ah yes, and we have another cleanup action that backslashes it: latex-cleanup. I think backslashing % should be removed from latex-cleanup and put into a standalone cleanup action, or merged with a cleanup action that adds a backslash to special LaTeX characters.

@Siedlerchr do you know where the Unicode to LaTeX mapping comes from? Is it self-made by JabRef, or is it a dependency? Would we ever want to touch the Unicode to LaTeX converter?

@Siedlerchr
Member

@fly-ing-fish No need to adapt the latex <> unicode conversion map. See the existing LatexCleanupFormatter with the regex pattern for ESCAPE_PERCENT_SIGN_ONCE that also works when the value is already escaped.

@ThiloteE For unicode <-> latex it's kind of both. We use https://github.com/plurimath/unicode2latex
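Regarding the "escape once" approach: the difference between escaping once and escaping repeatedly can be shown with three variants. This is an illustrative Java sketch with hypothetical names, modelled on the idea described above rather than copied from LatexCleanupFormatter:

```java
import java.util.regex.Pattern;

// Hypothetical demo, not JabRef code: three ways to escape $.
public class EscapeOnceDemo {

    // Naive: every $ gets a backslash, even one that is already escaped.
    public static String naive(String value) {
        return value.replace("$", "\\$");
    }

    // Consumes the character before $, so of two adjacent $ only the first can match.
    private static final Pattern CONSUMING = Pattern.compile("(^|[^\\\\])\\$");

    public static String consuming(String value) {
        // Replacement: group 1, then a literal backslash, then a literal $.
        return CONSUMING.matcher(value).replaceAll("$1\\\\\\$");
    }

    // Lookbehind: checks the previous character without consuming it.
    private static final Pattern LOOKBEHIND = Pattern.compile("(?<!\\\\)\\$");

    public static String lookbehind(String value) {
        return LOOKBEHIND.matcher(value).replaceAll("\\\\\\$");
    }

    public static void main(String[] args) {
        String value = "\\$ and $$";
        System.out.println(naive(naive(value)));
        System.out.println(consuming(value));
        System.out.println(lookbehind(lookbehind(value)));
    }
}
```

The naive variant adds another backslash on every run (the problem described for the conversion map), the consuming variant misses the second of two adjacent $ signs, and the lookbehind variant is stable when applied twice.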


3 participants