- From: Erik van der Poel <erikv@google.com>
- Date: Fri, 25 Sep 2009 11:23:14 -0700
- To: Larry Masinter <masinter@adobe.com>
- Cc: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>, Martin Dürst <duerst@it.aoyama.ac.jp>
On Fri, Sep 25, 2009 at 10:42 AM, Larry Masinter <masinter@adobe.com> wrote:
> Some mail didn't get sent to public-IRI which should have been:
>
> On 2009/09/03 7:33, Larry Masinter wrote:
>> Sorry to be rehashing what I think are old topics, but the
>> discussion of these things seems to be scattered around on a
>> zillion mailing lists:
>>
>>
>> * I'm not sure why http://example.com/%<http://example.com/%25>
>> should be illegal as an IRI. I remember some discussion of this,
>> but not the resolution. Why not update IRI to allow it, since it
>> seems to work in most systems?

I think this got garbled along the way, but I assume you're talking
about a percent sign (%) in the path part that is not followed by two
hex digits. This does not "work in most systems". Our automated tests
show that IE8 will not send the HTTP request, Safari4 escapes % as %25,
while Firefox, Chrome and Opera leave the % as is.

> Martin:
>
> It's illegal in URIs, too. The URI and IRI syntaxes should be as
> parallel as possible. In terms of implementations, it may be easy for
> consumers, but for producers, it's not. It's much easier to just escape
> than to go and check whether (one or) two hex digits are following
> (which would change the meaning totally).

Surely that depends on the type of producer. For HTML form submissions,
% should be escaped as %25, but for HTML hrefs, the producer is also a
consumer and should first check whether two hex digits follow.

The big question is what to do about a % sign that is not followed by
two hex digits. The major browsers currently handle this differently,
so producers would be wise to avoid this, but it is not clear to me
what advice should be given to consumer/producer implementers. Is it
better to be conservative like IE and reject it? Or is it better to be
forgiving like Firefox and just send out the lone % sign? (Note: this
particular case is interesting, because IE is usually the forgiving
one, while Firefox is the conservative one.)
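To make the "lone %" case concrete, here is a minimal Python sketch of the check an href-style producer would have to make. The names has_lone_percent and escape_lone_percent are hypothetical, not taken from any spec or browser. The escaping function only mirrors the Safari4 behaviour reported above; it is not a recommendation for what implementations should do.

```python
import re

# A "%" that is NOT followed by two hex digits (the "lone %" case).
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_lone_percent(path: str) -> bool:
    """Return True if the path contains a % not followed by two hex digits."""
    return LONE_PERCENT.search(path) is not None


def escape_lone_percent(path: str) -> str:
    """Escape only the lone % signs as %25, leaving valid %XX escapes alone.

    This corresponds to the Safari4 behaviour described in the message;
    other browsers reject the URL or send the % unchanged.
    """
    return LONE_PERCENT.sub("%25", path)


if __name__ == "__main__":
    for p in ["/%", "/%25", "/100%/%41", "/a%2"]:
        print(p, has_lone_percent(p), escape_lone_percent(p))
```

A form-submission producer, by contrast, would escape every % as %25 without looking at what follows, which is the simpler case Martin describes.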
> Larry:
>> * I'm not sure why U+0023 NUMBER SIGN should be disallowed in the
>> characters allowed in the <fragment> production. Again, same
>> question...
>
> Martin:
>
> I seem to remember something to the effect that some implementations
> parsed URIs from the back to chop off the fragment part.

Ugh. Just say No to such implementations. Always parse forward.

> Larry:
>> * I don't understand how current processors handle [] square
>> brackets. I'm reading
>
> Martin:
> Where did you read that?
>
> Larry:
>> If w begins with either of:
>> a string matching the <scheme> production, followed by "://"
>> the string "//"
>>
>> then percent-encode any left or right square brackets (U+005B,
>> U+005D, "[" and "]") following the first occurrence of "/", "?",
>> or "#" which follows the first occurrence of "//".
>>
>>
>> What the heck is this about?
>
> Martin:
>
> I think the purpose is to %-encode '[' and ']' except for the authority
> part, where they are needed for IPv6. The encoding is done because '['
> and ']' are not allowed elsewhere than in IP-literal.

I don't see why [ and ] should be disallowed in the path and query
parts, but the major browsers currently handle those characters
differently in the path/query. (Some browsers %-encode, others don't.)

Erik
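For reference, the square-bracket rule quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted wording, not the behaviour of any particular browser. The function name encode_brackets is hypothetical, and the scheme pattern is an assumption taken from the RFC 3986 grammar (a letter followed by letters, digits, "+", "-" or ".").

```python
import re

# Assumed <scheme> production (RFC 3986), followed by "://".
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def encode_brackets(w: str) -> str:
    """Percent-encode [ and ] outside the authority part, per the quoted rule."""
    # The rule only applies if w starts with "<scheme>://" or with "//".
    if not (w.startswith("//") or SCHEME_PREFIX.match(w)):
        return w
    # Position just past the first "//".
    start = w.index("//") + 2
    # First "/", "?" or "#" after that marks the end of the authority.
    m = re.search(r"[/?#]", w[start:])
    if m is None:
        return w  # authority only, e.g. an IPv6 literal; nothing to encode
    cut = start + m.start()
    head, tail = w[:cut], w[cut:]
    return head + tail.replace("[", "%5B").replace("]", "%5D")


if __name__ == "__main__":
    print(encode_brackets("http://[::1]/a[1]?b[]=2"))
```

Note that the brackets of the IPv6 literal in the authority are left alone, which matches Martin's reading of the rule's purpose. The function also scans forward from the start of the string, never from the end.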
Received on Friday, 25 September 2009 18:23:55 UTC