[whatwg] Web Address and its escape from NARUSE, Yui on 2009-09-08 (public-whatwg-archive@w3.org from September 2009)

From: NARUSE, Yui <naruse@airemix.jp>
Date: Wed, 09 Sep 2009 04:40:22 +0900
Message-ID: <4AA6B326.6040304@airemix.jp>

Hi,
I have some comments and questions about URL encoding and Web addresses.


The first point concerns section 4.10.16.4, URL-encoded form data:
http://www.whatwg.org/specs/web-apps/current-work/#application/x-www-form-urlencoded-encoding-algorithm

In step 6.2.1 of this algorithm, the characters
"SP, *, -, ., 0 .. 9, A .. Z, _, a .. z" are left unescaped.
But many other specs that use application/x-www-form-urlencoded refer
to the URI "unreserved" set, which RFC 3986 defines as:
   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
Why is ~ escaped while * is not?
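To make the mismatch concrete, here is a minimal sketch that builds both character sets exactly as quoted above and compares them. The set names and the helper compare_sets are illustrative only and do not come from either spec.

```python
import string

# Characters that step 6.2.1 of the form-encoding algorithm leaves
# unescaped, as quoted above: SP, *, -, ., 0-9, A-Z, _, a-z.
FORM_UNESCAPED = set(" *-." + string.digits + string.ascii_uppercase
                     + "_" + string.ascii_lowercase)

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
RFC3986_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def compare_sets():
    """Return (left alone only by the form algorithm, unreserved only in RFC 3986)."""
    return (sorted(FORM_UNESCAPED - RFC3986_UNRESERVED),
            sorted(RFC3986_UNRESERVED - FORM_UNESCAPED))


if __name__ == "__main__":
    only_form, only_rfc = compare_sets()
    print("left alone only by the form algorithm:", only_form)
    print("unreserved only in RFC 3986:", only_rfc)
```

The two sets differ only in SP and * on one side and ~ on the other. For comparison, Python's urllib.parse.quote (3.7 and later) follows the RFC 3986 unreserved set, so quote("*~") yields "%2A~".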


The second point also concerns step 6.2.1 of URL-encoded form data.
It says:
> the string a U+0025 PERCENT SIGN character (%) followed by two
> characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE
> (9) and U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z
But hexadecimal digits are only 0-9 and A-F,
so the range should end at "U+0046 LATIN CAPITAL LETTER F".
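A small sketch of why the range matters: every percent-encoded byte uses only uppercase hex digits, while the range as worded would also accept strings such as "%ZZ" that do not encode any byte. The pattern names and percent_encode_byte are hypothetical helpers for illustration.

```python
import re

# "%" followed by two characters in 0-9 or A-Z, as the draft currently words it.
AS_WORDED = re.compile(r"%[0-9A-Z]{2}")
# "%" followed by two uppercase hexadecimal digits, as proposed above.
HEX_ONLY = re.compile(r"%[0-9A-F]{2}")


def percent_encode_byte(b: int) -> str:
    """Encode one byte (0-255) as '%' plus two uppercase hex digits."""
    return "%{:02X}".format(b)


if __name__ == "__main__":
    # Every real percent-encoded byte fits the hex-only pattern.
    print(all(HEX_ONLY.fullmatch(percent_encode_byte(b)) for b in range(256)))
    # "%ZZ" fits the wording with A-Z but is not a valid escape.
    print(bool(AS_WORDED.fullmatch("%ZZ")), bool(HEX_ONLY.fullmatch("%ZZ")))
```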


The third point concerns the "Web addresses in HTML 5" draft (is this spec also discussed on this list?):
http://www.w3.org/html/wg/href/draft

In section 2, "Parsing Web addresses", step 2 ("Percent-encode all non-URI
characters in w") percent-encodes many characters, including U+0025 PERCENT SIGN.
As a result, if a Web address w is already a percent-escaped URL,
this step escapes its existing escapes a second time.

For example, if w is http://www.example.org/D%C3%BCrst,
step 2 turns it into http://www.example.org/D%25C3%25BCrst,
and by step 5 w is broken.
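The double escaping can be reproduced with Python's urllib.parse.quote. This is only a sketch: the safe set ":/" is an assumption that stands in for the draft's "non-URI characters" rule (it would also escape characters such as "?" and "#"), and escape_preserving is a hypothetical alternative, not something either spec defines. It keeps existing %HH triplets and encodes everything else, including a lone "%".

```python
import re
from urllib.parse import quote

W = "http://www.example.org/D%C3%BCrst"

# Assumed safe set; a simplification, not the draft's exact character class.
SAFE = ":/"

# An existing escape: "%" followed by two hex digits.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def naive_escape(w: str) -> str:
    """Encode "%" along with other characters, like step 2 as written."""
    return quote(w, safe=SAFE)


def escape_preserving(w: str) -> str:
    """Hypothetical variant: leave %HH triplets alone, encode the rest."""
    out = []
    pos = 0
    for m in _ESCAPE.finditer(w):
        out.append(quote(w[pos:m.start()], safe=SAFE))  # text before the escape
        out.append(m.group(0))                           # keep the escape as-is
        pos = m.end()
    out.append(quote(w[pos:], safe=SAFE))                # trailing text
    return "".join(out)


if __name__ == "__main__":
    print(naive_escape(W))
    print(escape_preserving(W))
    print(escape_preserving("http://www.example.org/D\u00fcrst"))
```

With the naive version the already-escaped address becomes http://www.example.org/D%25C3%25BCrst, while the preserving version returns it unchanged and still encodes the raw form "Dürst" to the same result, so applying it twice is harmless.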

Regards.

-- 
NARUSE, Yui  <naruse at airemix.jp>

Received on Tuesday, 8 September 2009 12:40:22 UTC