[whatwg] Web Address and its escape

Anne van Kesteren wrote:
> On Tue, 08 Sep 2009 21:40:22 +0200, NARUSE, Yui <naruse at airemix.jp> wrote:
>> First is about 4.10.16.4 URL-encoded form data.
>> http://www.whatwg.org/specs/web-apps/current-work/#application/x-www-form-urlencoded-encoding-algorithm
>>
>>
>> In this algorithm at 6.2.1,
>> "SP, *, -, ., 0 .. 9, A .. Z, _, a .. z" is not escaped.
>> But many other specs which use application/x-www-form-urlencoded refer to
> 
> Which other specifications?

The following specifications (sorry, some of them cite earlier RFCs):

XForms 1.0
  http://www.w3.org/TR/xforms/#serialize-urlencode
  "then non-ASCII and reserved characters (as defined by [RFC 2396] as
  amended by subsequent documents in the IETF track) are escaped"
  -> so RFC3986

HTML 4
  http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
  "reserved characters are escaped as described in [RFC1738]"
  RFC1738 http://www.faqs.org/rfcs/rfc1738.html
    unreserved     = alpha | digit | safe | extra
    safe           = "$" | "-" | "_" | "." | "+"
    extra          = "!" | "*" | "'" | "(" | ")" | ","

TAG Finding
  "refer to section 2.1 of [RFC2396]."
  http://www.w3.org/2001/tag/doc/whenToUseGet.html#i18n
  RFC2396 http://www.faqs.org/rfcs/rfc2396.html
  unreserved  = alphanum | mark
  mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

WSDL 2.0
  http://www.w3.org/TR/wsdl20-bindings/#_http_x-www-form-urlencoded
  "Replacement values falling outside the range (ALPHA and DIGIT below are defined
  as per [IETF RFC 4234]): ALPHA | DIGIT | "-" | "." | "_" | "~" | "!" |
  "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | ":" | "@",
  MUST be percent-encoded."
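
To make the differences easier to see, here is a rough Python sketch that compares the punctuation each definition quoted above leaves unescaped with the set in the HTML5 algorithm. The dictionary name and the helper diff_against_html5 are only for illustration; the sets are transcribed by hand from the quotes, and the RFC 3986 entry is the unreserved set from my earlier mail. Alphanumerics are left out because every definition allows them.

```python
# Punctuation left unescaped by each definition (alphanumerics omitted).
# Transcribed by hand from the quoted texts; treat as illustrative.
UNESCAPED_PUNCT = {
    "HTML5 draft": set("*-._"),          # SP is also listed, but is sent as "+"
    "RFC 1738": set("$-_.+" "!*'(),"),   # safe + extra
    "RFC 2396": set("-_.!~*'()"),        # mark
    "RFC 3986": set("-._~"),             # unreserved
    "WSDL 2.0": set("-._~!$&'()*+,;=:@"),
}


def diff_against_html5(name):
    """Return (only in HTML5 draft, only in the named definition)."""
    html5 = UNESCAPED_PUNCT["HTML5 draft"]
    other = UNESCAPED_PUNCT[name]
    only_html5 = "".join(sorted(html5 - other))
    only_other = "".join(sorted(other - html5))
    return only_html5, only_other


for name in UNESCAPED_PUNCT:
    if name != "HTML5 draft":
        print(name, diff_against_html5(name))
```

Against RFC 3986 the only differences are * (HTML5 only) and ~ (RFC 3986 only).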

>> URI's unreserved set. In RFC3986 that is
>>    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
>> Why is ~ escaped while * is not?
> 
> What do browsers do?

IE8
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: *-.@_

Firefox 3.5
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: *-._

Chrome2
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: *-._

Opera9
QUERY_STRING: t=+%21%5C%22%5C%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
not escaped: -._

Hmm, Firefox and Chrome follow the algorithm exactly; IE additionally leaves @
unescaped, and Opera additionally escapes *.
If this spec wants to be on the safe side, * could also be escaped.
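
As a cross-check of the "not escaped" lines, the following Python sketch derives them from the observed QUERY_STRING values. The function name unescaped_punctuation and the OBSERVED table are my own; it assumes the leading "+" is the encoded SP and not a literal plus sign. Chrome2 sent the same value as Firefox 3.5, so it is not repeated.

```python
import re
import string

ALNUM = set(string.ascii_letters + string.digits)

# Values of "t" as received by the server (copied from the results above).
OBSERVED = {
    "IE8": "+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E",
    "Firefox 3.5": "+%21%5C%22%5C%23%24%25%26%27%28%29*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E",
    "Opera9": "+%21%5C%22%5C%23%24%25%26%27%28%29%2A%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E",
}


def unescaped_punctuation(value):
    """Return the punctuation that appears literally in an encoded value."""
    # Drop every %XX escape so only literally transmitted characters remain.
    literal = re.sub(r"%[0-9A-Fa-f]{2}", "", value)
    # Skip alphanumerics and "+", which here stands for the encoded SP.
    return "".join(sorted({c for c in literal if c not in ALNUM and c != "+"}))


for browser, value in OBSERVED.items():
    print(browser, unescaped_punctuation(value))
```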

>> The third is about Web addresses in HTML 5. (Is this spec also handled on this ML?)
>> http://www.w3.org/html/wg/href/draft
> 
> You want public-iri at w3.org or public-html at w3.org for that draft.

Thanks, I'll send it there.

-- 
NARUSE, Yui  <naruse at airemix.jp>

Received on Wednesday, 9 September 2009 07:33:39 UTC