Markdown: Awkward soft break after abbreviation between ( and newline #4635

Closed
fiapps opened this issue May 8, 2018 · 3 comments



fiapps commented May 8, 2018

Test case:

echo '(cf.
Foo)' | pandoc -f markdown -t markdown

Output: `( cf. Foo)`

A space has been added after the opening parenthesis. More precisely, with `-t native` the output shows that it is a SoftBreak: `[Para [Str "(",SoftBreak,Str "cf.\160Foo)"]]`.

This case is rare enough that it occurred only once in a 350-page document.

jgm changed the title "Markdown parse error when newline follows abbreviation inside parentheses" to "Markdown: Awkward soft break after abbreviation between ( and newline" on May 8, 2018.

msprev commented Jun 19, 2018

This is actually a pretty common bug if you use hard line wrapping in your source document. It occurs any time a line ends in an abbreviation that directly follows an opening parenthesis:

Lorem (e.g.
ipsum)

produces the output

Lorem ( e.g. ipsum)

I hard wrap at 78 characters in my source documents. On average, this produces about 3 errors per 8,000 words for me, and of course it affects all output formats.

mb21 (Collaborator) commented Jun 19, 2018

A possible workaround is to use `--abbreviations=/dev/null` (or another empty file).

jgm (Owner) commented Oct 14, 2018

Here's the relevant code (in `str` in the Markdown reader):

      abbrevs <- getOption readerAbbreviations
      if not (null result) && last result == '.' && result `Set.member` abbrevs
         then try (do ils <- whitespace <|> endline
                      lookAhead alphaNum
                      return $ do
                        ils' <- ils
                        if ils' == B.space
                           then return (B.str result <> B.str "\160")
                           else -- linebreak or softbreak
                                return (ils' <> B.str result <> B.str "\160"))
                <|> return (return (B.str result))
         else return (return (B.str result)))

The logic is this: when an abbreviation is followed by a space, we replace the space with a nonbreaking space. When it is followed by a line break (soft or hard), we replace the break with a nonbreaking space and move the line break before the abbreviation. That gives bad results when the abbreviation isn't itself preceded by a space: in `(cf.` the moved break lands between `(` and `cf.`, which renders as `( cf.`.
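To make the asymmetry concrete, here is a minimal Haskell sketch of the rule just described. It is a toy model, not pandoc's reader: the `Tok` type, the `abbrevs` set, and the `oldRule`, `normalize`, and `render` helpers are all hypothetical, and it assumes `(` has already been split off as its own `Str` token, since it is not alphanumeric.

```haskell
import qualified Data.Set as Set
import Data.Char (isAlphaNum)

-- Toy stand-in for pandoc's Inline type (hypothetical; not pandoc's AST).
data Tok = Str String | Space | SoftBreak
  deriving (Eq, Show)

-- Hypothetical abbreviation list used only for this example.
abbrevs :: Set.Set String
abbrevs = Set.fromList ["cf.", "e.g.", "i.e."]

-- Mirrors the `lookAhead alphaNum` check: the next word must start
-- with an alphanumeric character.
startsAlnum :: String -> Bool
startsAlnum (c : _) = isAlphaNum c
startsAlnum []      = False

-- The old rule as described: abbreviation + space becomes
-- abbreviation + NBSP; abbreviation + soft break gets an NBSP and the
-- soft break is moved in front of the abbreviation.
oldRule :: [Tok] -> [Tok]
oldRule (Str a : Space : Str b : rest)
  | a `Set.member` abbrevs && startsAlnum b =
      Str (a ++ "\160") : oldRule (Str b : rest)
oldRule (Str a : SoftBreak : Str b : rest)
  | a `Set.member` abbrevs && startsAlnum b =
      SoftBreak : Str (a ++ "\160") : oldRule (Str b : rest)
oldRule (t : rest) = t : oldRule rest
oldRule []         = []

-- Merge adjacent Str tokens and let a SoftBreak absorb a preceding
-- Space (an assumed simplification of inline normalization).
normalize :: [Tok] -> [Tok]
normalize (Str a : Str b : rest)     = normalize (Str (a ++ b) : rest)
normalize (Space : SoftBreak : rest) = normalize (SoftBreak : rest)
normalize (t : rest)                 = t : normalize rest
normalize []                         = []

-- Render with soft breaks reflowed to spaces, as when wrapping rejoins lines.
render :: [Tok] -> String
render = concatMap tok
  where
    tok (Str s)   = s
    tok Space     = " "
    tok SoftBreak = " "

main :: IO ()
main = do
  -- "(cf.\nFoo)": the break ends up between "(" and "cf."
  let parens = [Str "(", Str "cf.", SoftBreak, Str "Foo)"]
  print (normalize (oldRule parens))
  -- prints [Str "(",SoftBreak,Str "cf.\160Foo)"]

  -- "Lorem e.g.\nipsum": the abbreviation follows a space, so moving
  -- the break is harmless.
  let spaced = [Str "Lorem", Space, Str "e.g.", SoftBreak, Str "ipsum"]
  print (normalize (oldRule spaced))
  -- prints [Str "Lorem",SoftBreak,Str "e.g.\160ipsum"]
```

The first result has the same shape as the native output in the original report, while the space-preceded case comes out fine; only the position of the moved break differs.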

jgm closed this as completed in cf82240 on Oct 14, 2018.
jgm added a commit that referenced this issue Oct 3, 2021
Note that with this new implementation, you can defeat the
abbreviationization by using two spaces after the period.

This commit removes support for moving abbreviations
after soft breaks (#4635), so the abbreviation support won't
work for abbreviations occurring at the end of a line.
It seems better not to mess with the user's soft breaks,
especially now that we have `--wrap=preserve`.
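The behavior described in this commit message can be modeled in the same spirit. The following is a hedged sketch, not pandoc's implementation: it works on raw source text, and `abbrevs`, `runs`, and `newRule` are hypothetical names. It inserts a nonbreaking space only when an abbreviation is followed by exactly one space and then an alphanumeric character, so a line ending (the case in this issue) or two spaces leave the text untouched.

```haskell
import qualified Data.Set as Set
import Data.Char (isAlphaNum, isSpace)

-- Hypothetical abbreviation list used only for this example.
abbrevs :: Set.Set String
abbrevs = Set.fromList ["cf.", "e.g.", "i.e."]

-- Split source text into alternating runs of whitespace and non-whitespace.
runs :: String -> [String]
runs [] = []
runs s@(c : _) = r : runs rest
  where
    (r, rest) = span (\x -> isSpace x == isSpace c) s

-- Toy model of the rule in the commit message: only an abbreviation
-- followed by exactly one space (then an alphanumeric) gets a
-- nonbreaking space. Line ends and double spaces are left alone.
newRule :: String -> String
newRule = concat . go . runs
  where
    -- Leading punctuation such as "(" is not part of the abbreviation.
    isAbbrev w = dropWhile (not . isAlphaNum) w `Set.member` abbrevs
    go (w : " " : v@(c : _) : rest)
      | isAbbrev w && isAlphaNum c = (w ++ "\160") : go (v : rest)
    go (r : rest) = r : go rest
    go [] = []

main :: IO ()
main = mapM_ (print . newRule) ["(cf. Foo)", "(cf.\nFoo)", "(cf.  Foo)"]
-- prints:
-- "(cf.\160Foo)"
-- "(cf.\nFoo)"
-- "(cf.  Foo)"
```

Because the soft break is never touched, the `( cf.` artifact cannot arise; the trade-off, as the commit notes, is that an abbreviation at the end of a line no longer gets a nonbreaking space.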