- From: Larry Masinter <masinter@adobe.com>
- Date: Sun, 3 Apr 2011 15:03:56 -0700
- To: Adam Barth <ietf@adambarth.com>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>
- CC: Larry Masinter <masinter@adobe.com>, Noah Mendelsohn <nrm@arcanedomain.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Ted Hardie <ted.ietf@gmail.com>, Tony Hansen <tony@att.com>, "public-iri@w3.org" <public-iri@w3.org>
I should have been more precise, since I only meant for characters not otherwise allowed in URIs. This covers:

a) for characters outside 7-bit ASCII range: scheme definitions MUST NOT distinguish between %-hex-encoded-UTF8 and unicode character
b) for (ASCII) characters disallowed in URIs: ... MUST NOT distinguish ...

For characters allowed in URIs:

c) for (ASCII) unreserved characters allowed in URIs: ... SHOULD NOT distinguish ...
d) for reserved characters not syntactically significant for the scheme: ... MAY distinguish ...
e) for reserved characters when syntactically significant as reserved characters: ... MUST distinguish ...

-----Original Message-----
From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On Behalf Of Adam Barth
Sent: Sunday, April 03, 2011 1:28 PM
To: Julian Reschke
Cc: Larry Masinter; Noah Mendelsohn; Martin J. Dürst; Ted Hardie; Tony Hansen; public-iri@w3.org
Subject: Re: scheme-specific length limits (issue 48)

On Sun, Apr 3, 2011 at 1:05 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 03.04.2011 20:06, Adam Barth wrote:
>> On Sun, Apr 3, 2011 at 5:48 AM, Larry Masinter<masinter@adobe.com> wrote:
>>> A scheme registration defines the syntax for URIs (IRIs) that are valid
>>> for the scheme. A syntax definition can include limits -- that some strings
>>> are valid for the scheme and other strings are not. Those limits can be
>>> complicated, limit the repertoire of characters, be expressed in BNF, and
>>> can include length limits.
>>>
>>> Syntactic restrictions should be justified, usually by the limits of the
>>> resolution mechanism or protocol associated with a string. And we should
>>> disallow any limits (or any other syntactic restrictions) that treat %-hex
>>> encoded UTF8 characters differently than their unicode character
>>> equivalents.
>>
>> That doesn't seem correct. For example, the http scheme treats %-hex
>> encoded UTF8 characters differently than their unicode character
>> equivalents in some cases. Consider:
>>
>> http://example.com/foo?bar
>> http://example.com/foo%3Fbar
>>
>>> document.body.innerHTML = "<a
>>> href='http://example.com/foo%3Fbar'>boo</a>"
>>> document.body.firstChild.pathname
>>
>> "/foo%3Fbar"
>>
>>> document.body.innerHTML = "<a href='http://example.com/foo?bar'>boo</a>"
>>> document.body.firstChild.pathname
>>
>> "/foo"
>> ...
>
> No news. "?" is special in URI parsing, thus it needs to be escaped when
> it's not meant to start a query component.

Yeah, I'm not saying that behavior is surprising. I'm saying that
Larry's requirement is violated even for very commonly used schemes.

Adam
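
For readers who want to try the a) through e) distinction from the top of this message, here is a rough sketch, not anything proposed in the thread. The function name requirementFor and the `significant` set are made up for illustration. The character classes are assumed to be the "reserved" and "unreserved" sets of RFC 3986, and "%" itself gets no special treatment. The second half repeats Adam's "?" versus "%3F" comparison using the standard URL class instead of the DOM.

```javascript
// Character classes assumed from RFC 3986 (the thread does not cite them).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = /^[:\/?#\[\]@!$&'()*+,;=]$/;

// Hypothetical helper: which of the a)-e) requirements applies to one
// character. `significant` is an assumed, scheme-supplied Set of reserved
// characters that act as delimiters in that scheme.
function requirementFor(ch, significant) {
  if (ch.codePointAt(0) > 0x7f) return "MUST NOT distinguish"; // a)
  if (UNRESERVED.test(ch)) return "SHOULD NOT distinguish";    // c)
  if (RESERVED.test(ch)) {
    return significant.has(ch)
      ? "MUST distinguish"                                     // e)
      : "MAY distinguish";                                     // d)
  }
  // b) remaining ASCII, e.g. space, "<", ">". Note that "%" also lands
  // here, because this sketch does not treat the escape character itself.
  return "MUST NOT distinguish";
}

// Assumption for illustration only: "?" and "#" are significant for http.
const httpSignificant = new Set(["?", "#"]);

// Adam's example: "?" is a delimiter, so the two forms must differ.
const escaped = new URL("http://example.com/foo%3Fbar");
const literal = new URL("http://example.com/foo?bar");
console.log(escaped.pathname);                     // "/foo%3Fbar"
console.log(literal.pathname, literal.search);     // "/foo" "?bar"
console.log(requirementFor("?", httpSignificant)); // "MUST distinguish"

// Case a): a non-ASCII character and its %-hex-encoded UTF-8 form parse
// to the same path.
const raw = new URL("http://example.com/\u00e9");
const hex = new URL("http://example.com/%C3%A9");
console.log(raw.pathname === hex.pathname);        // true
```

Under this reading, Adam's example falls under e) and not a), which is the distinction the a) through e) list is drawing.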
Received on Sunday, 3 April 2011 22:05:24 UTC