[Bug 6742] pre-encoded form values should be restorable as submitted

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6742


Ian 'Hixie' Hickson <ian@hixie.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID




--- Comment #13 from Ian 'Hixie' Hickson <ian@hixie.ch>  2009-06-28 08:33:39 ---
> A communication as it should work follows:
> -- Sending human types: I said "this & that."
> -- Sending UA encodes to: I said %22this %26 that.%22
> -- Receiving UA will decode from: I said %22this %26 that.%22
> -- Receiving human reads: I said "this & that."

If by "Receiving UA" you mean the server, then this is correct.


> But suppose the sender, who we'll say is a programmer, types a percent-code
> into the original message, while typing quote marks as usual:
> -- Sending human types: You inserted %26 on line 7. Like Pat said, "you
> shouldn't have."
> -- Sending UA encodes to: You inserted %26 on line 7. Like Pat said, %22you
> shouldn't have.%22

This is incorrect. When sending the text, the "%" entered by the user must be
encoded as %25, so the sending UA encodes to: You inserted %2526 on line 7.
Like Pat said, %22you shouldn't have.%22
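
As a rough illustration of the difference, the sketch below uses Python's
standard urllib.parse module as a stand-in for the sending and receiving
sides. It is not the HTML5 form-encoding algorithm itself; safe=" " is an
assumption made here only so that spaces stay literal, as they do in the
example above.

```python
from urllib.parse import quote, unquote

typed = "You inserted %26 on line 7."

# Correct: the literal "%" is itself escaped as %25 before sending.
sent = quote(typed, safe=" ")
print(sent)            # You inserted %2526 on line 7.
print(unquote(sent))   # You inserted %26 on line 7.

# Incorrect: sending the text as typed lets the receiver decode %26 to "&".
print(unquote(typed))  # You inserted & on line 7.
```

With the "%" escaped, one decoding pass restores exactly what the user typed.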


> This responds to sec. 4.10.16.4, step 6, substep 2, subsubstep 1, and sec. 8.2
> of <http://www.w3.org/TR/html5/single-page/>, Working Draft 23 April 2009, as
> accessed 6-28-09. Sec. 8.2.4 appears relevant except that I couldn't find a
> subsection thereof that specifically governed percent-decoding, or I missed it;
> perhaps something should be added on the assumption that UA makers infer its
> existence anyway.

Specifically, in section 4.10.16.4 (URL-encoded form data), notice that in
step 6.2.1 the "%" character is not in the list of characters that are left
unencoded -- that means it must itself be encoded:

"If the character isn't in the range U+0020, U+002A, U+002D, U+002E, U+0030 ..
U+0039, U+0041 .. U+005A, U+005F, U+0061 .. U+007A then replace the character
with a string formed as follows: Start with the empty string, and then, taking
each byte of the character when expressed in the selected character encoding in
turn, append to the string a U+0025 PERCENT SIGN character (%) followed by two
characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) and
U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z representing the
hexadecimal value of the byte (zero-padded if necessary)."
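
A minimal sketch of the quoted step, assuming only what the quotation says:
characters in the listed ranges are left alone, and every other character is
replaced by %XX for each of its bytes in the selected encoding. The names
is_left_alone and escape_step are hypothetical, and any other steps of the
algorithm are outside this sketch.

```python
def is_left_alone(ch: str) -> bool:
    # Code points the quoted step lists as not needing replacement.
    cp = ord(ch)
    return (
        cp in (0x20, 0x2A, 0x2D, 0x2E, 0x5F)
        or 0x30 <= cp <= 0x39   # digits 0-9
        or 0x41 <= cp <= 0x5A   # A-Z
        or 0x61 <= cp <= 0x7A   # a-z
    )


def escape_step(text: str, encoding: str = "utf-8") -> str:
    out = []
    for ch in text:
        if is_left_alone(ch):
            out.append(ch)
        else:
            # One %XX (uppercase hex, zero-padded) per byte of the character.
            out.append("".join("%{:02X}".format(b) for b in ch.encode(encoding)))
    return "".join(out)


print(escape_step('I said "this & that."'))  # I said %22this %26 that.%22
print(escape_step("%26"))                    # %2526
```

Because U+0025 is not in the list, a typed "%26" comes out as "%2526".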


> UTF-8 is recommended but not mandatory, thus a UA not using UTF-8 might not be
> a violation.

The particular case you are mentioning is unaffected by the encoding used: the
escaping of "%" happens in all encodings.
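
A quick check of this point, as a sketch: the encodings below are
illustrative choices rather than an exhaustive list, and escape_percent is a
hypothetical helper that writes each byte of "%" as %XX.

```python
def escape_percent(encoding: str) -> str:
    # Encode "%" in the given encoding, then write each byte as %XX.
    return "".join("%{:02X}".format(b) for b in "%".encode(encoding))


for enc in ("utf-8", "iso-8859-1", "windows-1252", "shift_jis"):
    print(enc, escape_percent(enc))
```

Each of these encodings maps "%" to the single byte 0x25, so the escaped form
is %25 in every case.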



Received on Sunday, 28 June 2009 08:33:51 UTC