- From: NARUSE, Yui <naruse@airemix.jp>
- Date: Wed, 09 Sep 2009 04:40:22 +0900
Hi,

I have some comments and questions about urlencode and Web addresses.

The first is about 4.10.16.4 URL-encoded form data.
http://www.whatwg.org/specs/web-apps/current-work/#application/x-www-form-urlencoded-encoding-algorithm

In this algorithm, at step 6.2.1, "SP, *, -, ., 0 .. 9, A .. Z, _, a .. z" are not escaped. But many other specs that use application/x-www-form-urlencoded refer to the URI unreserved set, which RFC 3986 defines as:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"

Why is ~ escaped while * is not?

The second is also about URL-encoded form data, step 6.2.1. This says:

> the string a U+0025 PERCENT SIGN character (%) followed by two
> characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE
> (9) and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z

But hexadecimal digits are 0-9 and A-F, so the upper bound should be "U+0046 LATIN CAPITAL LETTER F".

The third is about Web addresses in HTML 5. (Is this mailing list also the right place for that spec?)
http://www.w3.org/html/wg/href/draft

In "2 Parsing Web addresses", step 2, "Percent-encode all non-URI characters in w", percent-encodes many characters, including U+0025 PERCENT SIGN. But under this spec, if a Web address w is already an escaped URL, this step double-escapes those characters.

For example, if w is http://www.example.org/D%C3%BCrst, then after step 2 w becomes http://www.example.org/D%25C3%25BCrst, and at step 5 w is broken.

Regards.

-- 
NARUSE, Yui <naruse at airemix.jp>
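
The first and third points can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: `HTML5_FORM_SAFE` is transcribed from the character list quoted from step 6.2.1, and `naive_percent_encode` is a hypothetical stand-in for step 2 of the Web address draft, built on `urllib.parse.quote` rather than on the draft's exact character classes.

```python
import string
from urllib.parse import quote, unquote

ALNUM = string.ascii_letters + string.digits

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
RFC3986_UNRESERVED = set(ALNUM + "-._~")

# Characters left unescaped by step 6.2.1 as quoted in this message
# (assumption: transcribed from the quoted list, not from the spec text itself).
HTML5_FORM_SAFE = set(ALNUM + " *-._")

# Where the two sets disagree.
only_in_rfc = RFC3986_UNRESERVED - HTML5_FORM_SAFE    # "~"
only_in_html5 = HTML5_FORM_SAFE - RFC3986_UNRESERVED  # space and "*"


def naive_percent_encode(address: str) -> str:
    """Hypothetical step 2: escape everything outside a small safe set,
    including "%" itself, without checking for existing escapes."""
    return quote(address, safe=":/")


w = "http://www.example.org/D%C3%BCrst"
double_escaped = naive_percent_encode(w)

print(sorted(only_in_rfc), sorted(only_in_html5))
print(double_escaped)
print(unquote(double_escaped))  # one round of decoding only undoes the extra layer
```

Decoding the double-escaped form once yields the original escaped string rather than the intended path, which is the breakage described for step 5.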
Received on Tuesday, 8 September 2009 12:40:22 UTC