- From: <bugzilla@wiggum.w3.org>
- Date: Sun, 28 Jun 2009 08:33:39 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6742

Ian 'Hixie' Hickson <ian@hixie.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |INVALID

--- Comment #13 from Ian 'Hixie' Hickson <ian@hixie.ch>  2009-06-28 08:33:39 ---
> A communication as it should work follows:
> -- Sending human types: I said "this & that."
> -- Sending UA encodes to: I said %22this %26 that.%22
> -- Receiving UA will decode from: I said %22this %26 that.%22
> -- Receiving human reads: I said "this & that."

If by "Receiving UA" you mean the server, then this is correct.

> But suppose the sender, who we'll say is a programmer, types a percent-code
> into the original message, while typing quote marks as usual:
> -- Sending human types: You inserted %26 on line 7. Like Pat said, "you
> shouldn't have."
> -- Sending UA encodes to: You inserted %26 on line 7. Like Pat said, %22you
> shouldn't have.%22

This is incorrect. When sending the text, the "%" entered by the user must
be encoded as %25, so the sending UA encodes to:

  You inserted %2526 on line 7. Like Pat said, %22you shouldn't have.%22

> This responds to sec. 4.10.16.4, step 6, substep 2, subsubstep 1, and sec. 8.2
> of <http://www.w3.org/TR/html5/single-page/>, Working Draft 23 April 2009, as
> accessed 6-28-09. Sec. 8.2.4 appears relevant except that I couldn't find a
> subsection thereof that specifically governed percent-decoding, or I missed it;
> perhaps something should be added on the assumption that UA makers infer its
> existence anyway.

Specifically, in section 4.10.16.4 URL-encoded form data, notice that in
step 6.2.1 the "%" character is not included in the list of characters that
are not encoded. That means it must itself be encoded:

  "If the character isn't in the range U+0020, U+002A, U+002D, U+002E,
  U+0030 .. U+0039, U+0041 .. U+005A, U+005F, U+0061 .. U+007A then replace
  the character with a string formed as follows: Start with the empty
  string, and then, taking each byte of the character when expressed in the
  selected character encoding in turn, append to the string a U+0025
  PERCENT SIGN character (%) followed by two characters in the ranges U+0030
  DIGIT ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL LETTER A
  to U+005A LATIN CAPITAL LETTER Z representing the hexadecimal value of the
  byte (zero-padded if necessary)."

> UTF-8 is recommended but not mandatory, thus a UA not using UTF-8 might not be
> a violation.

The particular case you are mentioning is unaffected by the encoding used;
the escaping of "%" happens in all encodings.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
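The escaping rule quoted from section 4.10.16.4, step 6.2.1, can be sketched in Python to show why the programmer's literal "%26" survives the round trip. This is a minimal sketch under stated assumptions: `encode_form_chars` and `decode_percent` are hypothetical helper names, only the quoted substep is modeled (any other substeps of the form-encoding algorithm are not), and decoding is delegated to the standard library's `urllib.parse.unquote`.

```python
import string
from urllib.parse import unquote

# Characters the quoted substep leaves alone (assumption: exactly the listed set):
# U+0020, U+002A, U+002D, U+002E, U+0030..U+0039, U+0041..U+005A, U+005F, U+0061..U+007A
SAFE = frozenset(" *-._" + string.digits + string.ascii_uppercase + string.ascii_lowercase)


def encode_form_chars(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: apply only the quoted substep 6.2.1 to `text`."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
        else:
            # Every byte of the character in the selected encoding becomes
            # "%" plus two uppercase hex digits. "%" itself is not in SAFE,
            # so a user-typed "%" is sent as "%25".
            out.extend(f"%{b:02X}" for b in ch.encode(encoding))
    return "".join(out)


def decode_percent(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn each %XX back into a byte, then decode."""
    return unquote(text, encoding=encoding)


if __name__ == "__main__":
    sent = encode_form_chars("You inserted %26 on line 7.")
    print(sent)                  # You inserted %2526 on line 7.
    print(decode_percent(sent))  # You inserted %26 on line 7.
    # Without escaping "%", the receiver would see an ampersand instead:
    print(decode_percent("You inserted %26 on line 7."))  # You inserted & on line 7.
```

The third print shows the failure the bug report worried about, which only occurs if the sender skips the escaping that step 6.2.1 requires. For ASCII-compatible encodings such as ISO-8859-1 or windows-1252, "%" is byte 0x25, so it is still sent as "%25".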
Received on Sunday, 28 June 2009 08:33:51 UTC