Improve Quoted-Printable encoding #292
The current `quoted_printable` encoder has some kind of bug where it strips some trailing(?) spaces and tabs. Do you think the new implementation fixes that (so the `Trim` calls could be removed)?
`gen_smtp/test/prop_mimemail.erl`, line 121 in 1a4f389: `?assertEqual(Trim(Body), Trim(mimemail:decode_quoted_printable(QPEncoded))),`
Yes, indeed 😄 The problem with the current encoder is that it does not fully satisfy rule (3) of RFC 2045, Section 6.7: while it does encode WSP characters that are followed by CRLF, it does not encode them when they are the last character of the body. The new implementation handles that properly.
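To make the rule concrete, here is a minimal Erlang sketch that applies only rule (3): a SPACE or TAB is quoted when it is followed by CRLF or is the last octet of the body, and copied otherwise. The module and function names (`qp_wsp_sketch`, `encode_wsp/1`) are hypothetical; this is not the gen_smtp code, and it deliberately ignores the escaping of other octets and soft line breaks.

```erlang
%% Hypothetical sketch of RFC 2045, Section 6.7, rule (3) only: SPACE/TAB
%% may stay literal unless it would end an encoded line, i.e. unless it is
%% followed by CRLF or is the last octet of the body. All other octets are
%% copied unchanged; escaping of unsafe octets and soft line breaks are
%% intentionally left out. Not the gen_smtp implementation.
-module(qp_wsp_sketch).
-export([encode_wsp/1]).

encode_wsp(Bin) when is_binary(Bin) ->
    encode_wsp(Bin, <<>>).

%% WSP directly followed by CRLF: quote it and keep the hard line break.
encode_wsp(<<C, "\r\n", Rest/binary>>, Acc) when C =:= $\s; C =:= $\t ->
    encode_wsp(Rest, <<Acc/binary, (escape_wsp(C))/binary, "\r\n">>);
%% WSP as the very last octet of the body: quote it as well.
encode_wsp(<<C>>, Acc) when C =:= $\s; C =:= $\t ->
    <<Acc/binary, (escape_wsp(C))/binary>>;
%% Anything else (including WSP followed by another octet) is copied.
encode_wsp(<<C, Rest/binary>>, Acc) ->
    encode_wsp(Rest, <<Acc/binary, C>>);
encode_wsp(<<>>, Acc) ->
    Acc.

%% Quoted forms of SPACE (32 = 16#20) and TAB (9 = 16#09).
escape_wsp($\s) -> <<"=20">>;
escape_wsp($\t) -> <<"=09">>.
```

With this sketch, `<<"a ">>` becomes `<<"a=20">>`, while `<<"x y">>` is left unchanged because the space is followed by a printable character. A trailing space before CRLF is quoted, so a decoder that reverses `=20` gets the original body back without any trimming.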
Nice work! @seriyps, do you want to address the `Trim` in that one test, or shall we accept this as-is?
@mworrell I'm working on an update for the PR, almost done. I'll remove the `Trim` along with it; it's not needed any more, since encoding and decoding will match up 😄
Branch updated from 6264766 to cde436f.
The last commit fixes the line-length bug from my first implementation, adds tests for a few edge cases, extends another test to ensure that whitespace is encoded when it is the last character, and removes the trimming from the property test.
Branch updated from cde436f to 73c27ca.
One last micro-optimization before I leave this in your hands, in
Great, thanks!
Branch updated from 73c27ca to c84712e.
I am very happy with this. @seriyps, shall we merge?
Yes, looks good!
Merged, thank you @Maria-12648430!
Impressive 😮
The current implementation of `encode_quoted_printable/3` is a bit crude (no offense, really 😜), especially when it comes to lines reaching the limit of 76 characters: it resorts to `string` functions to retrieve the last line and find whitespace in there.

The improved version, `encode_quoted_printable/6`, which I'm proposing here, works by remembering the occurrences of linear whitespace (including HTABs); otherwise it only looks ahead in the original data to see whether a CRLF follows the current character. I also refactored the encoding of the characters themselves to work via integer operations only, thereby getting rid of all the `io:format` and `lists:flatten` calls.

I admit that my implementation isn't exactly easy on the eye, but that is because QP encoding itself is tricky. And IMO, the current implementation isn't that easy to figure out, either 😉
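Integer-only escaping could look roughly like the following sketch, which turns an octet into `=XY` with `bsr`/`band` and a digit lookup instead of formatting calls plus `lists:flatten`. The module and function names (`qp_hex_sketch`, `encode_octet/1`) are hypothetical and not taken from the PR; unlike a full encoder, this sketch quotes SPACE and TAB unconditionally, leaving rule (3) and line-length handling aside.

```erlang
%% Hypothetical sketch: quote a single octet as "=XY" using integer
%% arithmetic only (no io_lib/io formatting, no lists:flatten).
%% Not the gen_smtp implementation.
-module(qp_hex_sketch).
-export([encode_octet/1]).

%% Octets 33..60 and 62..126 may be represented literally (rule (2) of
%% RFC 2045, Section 6.7). Everything else, including "=" (61), SPACE and
%% TAB, is quoted here.
encode_octet(C) when C >= 33, C =< 60; C >= 62, C =< 126 ->
    <<C>>;
encode_octet(C) when C >= 0, C =< 255 ->
    %% High nibble via shift, low nibble via mask.
    <<$=, (hex_digit(C bsr 4)), (hex_digit(C band 16#0F))>>.

%% Map 0..15 to "0".."9" and "A".."F" (RFC 2045 uses uppercase hex).
hex_digit(N) when N < 10 -> $0 + N;
hex_digit(N) -> $A + N - 10.
```

For instance, `encode_octet($=)` yields `<<"=3D">>` and `encode_octet(16#E9)` yields `<<"=E9">>`. Shifting and masking produce the two hex digits directly as integers, so no intermediate string has to be built and flattened.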
I left the current tests untouched in what they test and only replaced the calls to the helper function `encode_quoted_printable/3` (whose point I don't see) with calls to the API function `encode_quoted_printable/1`. I also removed one test, `"newline craziness"`, whose point I also fail to see 🤔

Finally, I put in a micro-optimization in `choose_transformation/1` which avoids division and rounding and only uses multiplication to figure out whether the percentage of printable characters in the sample is above 80%.

[EDIT]: Just noticed an edge case in which a line could exceed the length limit 😆 Will fix and update the PR tomorrow.
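The division-free comparison in `choose_transformation/1` rests on a simple rewrite: for a positive sample size T and P printable octets, P / T > 80 / 100 is equivalent to 5P > 4T. A hypothetical Erlang sketch of that check follows; the module name, `mostly_printable/1`, and the exact set of octets counted as printable are assumptions, not the gen_smtp definitions.

```erlang
%% Hypothetical sketch of a division-free "more than 80% printable" test.
%% Printable / Total > 0.8 is rewritten as Printable * 5 > Total * 4, which
%% stays in integer arithmetic (no division, no rounding).
%% Not gen_smtp's choose_transformation/1.
-module(qp_ratio_sketch).
-export([mostly_printable/1]).

mostly_printable(Sample) when is_binary(Sample) ->
    Total = byte_size(Sample),
    Printable = count_printable(Sample, 0),
    %% An empty sample gives 0 > 0, i.e. false.
    Printable * 5 > Total * 4.

%% Assumed definition: printable US-ASCII (32..126) plus TAB, CR and LF.
count_printable(<<C, Rest/binary>>, N)
  when C >= 32, C =< 126; C =:= $\t; C =:= $\r; C =:= $\n ->
    count_printable(Rest, N + 1);
count_printable(<<_, Rest/binary>>, N) ->
    count_printable(Rest, N);
count_printable(<<>>, N) ->
    N.
```

With this sketch, a sample that is exactly 80% printable, such as `<<"abcd", 0>>`, does not pass (20 > 20 is false), while a 90% sample such as `<<"abcdefghi", 0>>` does (45 > 40).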