- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 06 Sep 2006 03:47:10 +0200
- To: public-appformats@w3.org
Dear Web Application Formats Working Group,

http://www.w3.org/TR/2006/WD-web-forms-2-20060821/ section 5.3 item 4 is:

  Control names and values are escaped. Space characters are replaced
  by "+" (U+002B), and other non-alphanumeric characters are encoded
  in the submission character encoding and each resulting byte is
  replaced by "%HH", a percent sign (U+0025) and two uppercase
  hexadecimal digits representing the value of the byte.

This text is rather unclear and incorrect: it does not define what
non-alphanumeric characters are (and whatever it means, it's incorrect);
the character encoding is applied to the whole string, not just to
non-alphanumeric characters; and %hh encoding is applied based on what
the bytes are, not what the characters were. Consider the following cases:

  * encoding is UTF-8 and the value is "_": implementations should
    not apply %hh encoding to it even though it's not alphanumeric

  * encoding is UTF-7 and the value is "ö": the byte sequence would
    be +APY- and implementations should apply %hh escaping only to
    the +, not to the whole thing or nothing (depending on whether
    "ö" is considered alphanumeric)

Please change the draft in a way that properly reflects the above and
current implementations. I don't know the exact set of bytes that need
to have %hh encoding applied, but I suspect the set is similar to that
of characters considered reserved in the query string as per RFC 3986.

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
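
The two cases in the message can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not what the draft or any browser specifies: the function name `escape_form_value` is hypothetical, and the set of bytes left unescaped is assumed here to be the RFC 3986 "unreserved" set, since the message explicitly leaves the exact set open. Replacing the byte 0x20 with "+" after encoding is likewise an assumption about where the space rule applies.

```python
# Assumption: bytes left unescaped are the RFC 3986 "unreserved" set
# (ALPHA / DIGIT / "-" / "." / "_" / "~"). The message does not define
# this set; it is chosen here purely for illustration.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)


def escape_form_value(value: str, encoding: str) -> str:
    """Encode the whole string first, then decide escaping per byte."""
    out = []
    # Step 1: the submission character encoding applies to the whole string.
    for byte in value.encode(encoding):
        if byte == 0x20:
            # Assumption: the space rule is applied to the encoded byte.
            out.append("+")
        elif byte in UNRESERVED:
            # Step 2a: safe bytes pass through unchanged.
            out.append(chr(byte))
        else:
            # Step 2b: every other byte becomes %HH (uppercase hex).
            out.append(f"%{byte:02X}")
    return "".join(out)


if __name__ == "__main__":
    # "_" is not alphanumeric, yet its UTF-8 byte is left as-is.
    print(escape_form_value("_", "utf-8"))
    # In UTF-7 only the leading "+" byte of the encoded form is escaped.
    print(escape_form_value("ö", "utf-7"))
    # In UTF-8 both bytes of "ö" fall outside the safe set.
    print(escape_form_value("ö", "utf-8"))
```

The point of the sketch is the ordering: the decision to escape is made on bytes after encoding, so the same character can be escaped fully, partly, or not at all depending on the submission encoding.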
Received on Wednesday, 6 September 2006 01:54:07 UTC