Re: scheme-specific length limits (issue 48)

From: Adam Barth <ietf@adambarth.com>
Date: Sun, 3 Apr 2011 13:27:43 -0700
Message-ID: <AANLkTi=-aE=9iQWMpQMB9_-7XRR1j47rh53aPoODKssw@mail.gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Larry Masinter <masinter@adobe.com>, Noah Mendelsohn <nrm@arcanedomain.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Ted Hardie <ted.ietf@gmail.com>, Tony Hansen <tony@att.com>, "public-iri@w3.org" <public-iri@w3.org>

On Sun, Apr 3, 2011 at 1:05 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 03.04.2011 20:06, Adam Barth wrote:
>> On Sun, Apr 3, 2011 at 5:48 AM, Larry Masinter <masinter@adobe.com> wrote:
>>> A scheme registration defines the syntax for URIs (IRIs) that are valid
>>> for the scheme.  A syntax definition can include limits -- that some strings
>>> are valid for the scheme and other strings are not. Those limits can be
>>> complicated, limit the repertoire of characters, be expressed in BNF, and
>>> can include length limits.
>>>
>>> Syntactic restrictions should be justified, usually by the limits of the
>>> resolution mechanism or protocol associated with a string. And we should
>>> disallow any limits (or any other syntactic restrictions) that treat %-hex
>>> encoded UTF-8 characters differently than their Unicode character
>>> equivalents.
>>
>> That doesn't seem correct.  For example, the http scheme treats %-hex
>> encoded UTF-8 characters differently than their Unicode character
>> equivalents in some cases.  Consider:
>>
>> http://example.com/foo?bar
>> http://example.com/foo%3Fbar
>>
>>> document.body.innerHTML = "<a href='http://example.com/foo%3Fbar'>boo</a>"
>>> document.body.firstChild.pathname
>>
>> "/foo%3Fbar"
>>
>>> document.body.innerHTML = "<a href='http://example.com/foo?bar'>boo</a>"
>>> document.body.firstChild.pathname
>>
>> "/foo"
>> ...
>
> No news. "?" is special in URI parsing, thus it needs to be escaped when
> it's not meant to start a query component.

Yeah, I'm not saying that behavior is surprising.  I'm saying that
Larry's requirement is violated even for very commonly used schemes.
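
For anyone who wants to reproduce the comparison outside a browser page, here
is a minimal sketch using the standard URL class (available in current
browsers and in Node.js). It assumes the URL class splits path and query the
same way the anchor element does for these two inputs; the components()
helper is a hypothetical name introduced only for this illustration.

```javascript
// Hypothetical helper: return the path and query the parser assigns to a URL.
function components(href) {
  const u = new URL(href);
  return { pathname: u.pathname, search: u.search };
}

// "%3F" is the percent-encoded form of "?", yet the two URLs parse differently.
const escaped = components("http://example.com/foo%3Fbar");
const literal = components("http://example.com/foo?bar");

console.log(escaped.pathname, JSON.stringify(escaped.search));
console.log(literal.pathname, JSON.stringify(literal.search));

// Decoding the escape yields the same character that acts as a delimiter
// when it appears unescaped.
console.log(decodeURIComponent("%3F") === "?");
```

The escaped form keeps "%3Fbar" inside the path, while the literal "?" ends
the path and starts the query, so the two spellings are not interchangeable
for the http scheme.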

Adam
Received on Sunday, 3 April 2011 20:28:52 GMT
