Add the form data variant of percent_encode/decode #2930

Draft: wants to merge 2 commits into master

divanburger
Contributor

There is a variant of percent encoding, used in the body of HTTP requests, that differs from normal percent encoding: spaces are converted to + instead of %20.
https://en.wikipedia.org/wiki/Percent-encoding#The_application/x-www-form-urlencoded_type

This issue pops up when decoding values from a web form, as +'s are not decoded to spaces.

form_data_decode can actually decode both variants: a literal + is always encoded as its percent-encoded form %2B, so a bare + must always be a space. Regardless, I made it a separate decode procedure, as the form data variant is more complicated to decode.

An index_byte_any method was added to make decoding easier.
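
For readers unfamiliar with the idea, here is a minimal Python sketch of what a helper like index_byte_any might do: find the first position of any byte from a given set, so a decoder can copy the untouched run before it in one go. This is not the Odin code from the PR; the signature and the -1 "not found" convention are assumptions made for illustration.

```python
def index_byte_any(data: bytes, needles: bytes) -> int:
    """Return the index of the first byte in `data` that also occurs in
    `needles`, or -1 if none does (hypothetical signature)."""
    for i, b in enumerate(data):
        if b in needles:  # `b` is an int; membership in bytes checks byte values
            return i
    return -1


# A form-data decoder only needs to stop at '+' or '%'.
first_special = index_byte_any(b"a+b%20c", b"+%")
no_special = index_byte_any(b"abc", b"+%")
```

Everything before the returned index can be appended to the output unchanged, which is presumably why such a helper simplifies the decoder.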

@Kelimion
Member

Kelimion commented Nov 9, 2023

I'm not sure we can properly call this "// For data with content type application/x-www-form-urlencoded" if it doesn't take in k,v pairs that are then separated by either & or ; (; is a holdover from HTML 2).

The WhatWG spec also talks about tuples.

Further, as per the spec, a literal % is allowed: "Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output."

For example, your decoder fails to decode c=(m^e)%n to c=(m^e)%n, because it returns early when a % is not followed by two hex digits.
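
To illustrate the rule quoted above, here is a small Python sketch of a decoder that passes a % through unchanged when it is not followed by two hex digits. It is an illustration only, not the PR's Odin implementation; the name percent_decode and the space_as_plus flag are borrowed from this thread, and the exact signature is an assumption.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def percent_decode(data: bytes, space_as_plus: bool = False) -> bytes:
    """Decode %XX escapes; a '%' without two following hex digits is kept
    literally instead of being treated as an error."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        has_two_hex = (
            i + 2 < len(data)
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        )
        if b == 0x25 and has_two_hex:
            out.append(int(data[i + 1 : i + 3].decode("ascii"), 16))
            i += 3
        elif space_as_plus and b == 0x2B:
            out.append(0x20)  # form-data variant: '+' means space
            i += 1
        else:
            out.append(b)  # ordinary byte, or a literal '%'
            i += 1
    return bytes(out)


literal_percent = percent_decode(b"c=(m^e)%n")
form_value = percent_decode(b"a+b%2Bc", space_as_plus=True)
```

With this approach the decoder never needs an error return for malformed escapes; whether that is desirable for the Odin API is a separate design question.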

As for + instead of %20, the WhatWG spec says this about urlencode (as called by the k,v serializer):

For each byte of encodeOutput converted to a byte sequence:

If spaceAsPlus is true and byte is 0x20 (SP), then append U+002B (+) to output and [continue](https://infra.spec.whatwg.org/#iteration-continue).

The more appropriate action, then, may be to update percent_encode and percent_decode (and rename them urlencode_*) to take a param, space_as_plus := false.
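
As a rough illustration of that suggestion, an encoder with an optional space_as_plus flag could look like the Python sketch below. The set of bytes left unescaped here is the RFC 3986 "unreserved" set, which is an assumption made for the example; the set the Odin procedures should use is not settled in this thread, and the function name is a placeholder.

```python
# Assumption: leave only RFC 3986 "unreserved" characters unescaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def urlencode(text: str, space_as_plus: bool = False) -> str:
    """Percent-encode the UTF-8 bytes of `text`; optionally emit '+' for spaces."""
    out = []
    for byte in text.encode("utf-8"):
        if space_as_plus and byte == 0x20:
            out.append("+")
        elif byte in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


plain = urlencode("a b")
form = urlencode("a b", space_as_plus=True)
plus_sign = urlencode("1+1", space_as_plus=True)
```

Because a literal + is always escaped as %2B, output produced with space_as_plus set can be decoded unambiguously.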

@Kelimion
Member

Kelimion commented Nov 9, 2023

The thing to do, I think, is to have urlencode and urldecode with the optional space_as_plus param.
Then have a form_parse and form_serialize or something, which take in the map[string]string, pass the key and value to urlencode separately, and then join them with a =, as per the spec.

join_url and split_url can then also use this mechanism for the query string.
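
The shape of that proposal can be sketched in Python, using the standard library's quote_plus and unquote_plus as stand-ins for urlencode and urldecode with space_as_plus enabled. The names form_serialize and form_parse come from the comment above; their signatures, the use of & as the only pair separator, and the handling of a missing = are assumptions.

```python
from urllib.parse import quote_plus, unquote_plus


def form_serialize(fields: dict[str, str]) -> str:
    """Encode each key and value separately, join each pair with '=',
    and join the pairs with '&'."""
    return "&".join(
        f"{quote_plus(key)}={quote_plus(value)}" for key, value in fields.items()
    )


def form_parse(body: str) -> dict[str, str]:
    """Split on '&', then on the first '=', and decode each half separately."""
    fields = {}
    for pair in body.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing '&'
        key, _, value = pair.partition("=")
        fields[unquote_plus(key)] = unquote_plus(value)
    return fields


body = form_serialize({"name": "a b", "expr": "1+1"})
parsed = form_parse(body)
```

Encoding keys and values before joining is what keeps a literal = or & inside a value from being mistaken for a separator.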

Kelimion marked this pull request as draft on November 24, 2023, 13:08