Re: scheme-specific length limits (issue 48)

On Sun, Apr 3, 2011 at 1:05 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 03.04.2011 20:06, Adam Barth wrote:
>> On Sun, Apr 3, 2011 at 5:48 AM, Larry Masinter <masinter@adobe.com> wrote:
>>> A scheme registration defines the syntax for URIs (IRIs) that are valid
>>> for the scheme.  A syntax definition can include limits -- that some strings
>>> are valid for the scheme and other strings are not. Those limits can be
>>> complicated, limit the repertoire of characters, be expressed in BNF, and
>>> can include length limits.
>>>
>>> Syntactic restrictions should be justified, usually by the limits of the
>>> resolution mechanism or protocol associated with a string. And we should
>>> disallow any limits (or any other syntactic restrictions) that treat %-hex
>>> encoded UTF-8 characters differently than their Unicode character
>>> equivalents.
>>
>> That doesn't seem correct.  For example, the http scheme treats %-hex
>> encoded UTF-8 characters differently than their Unicode character
>> equivalents in some cases.  Consider:
>>
>> http://example.com/foo?bar
>> http://example.com/foo%3Fbar
>>
>>> document.body.innerHTML = "<a href='http://example.com/foo%3Fbar'>boo</a>"
>>> document.body.firstChild.pathname
>>
>> "/foo%3Fbar"
>>
>>> document.body.innerHTML = "<a href='http://example.com/foo?bar'>boo</a>"
>>> document.body.firstChild.pathname
>>
>> "/foo"
>> ...
>
> No news. "?" is special in URI parsing, thus it needs to be escaped when
> it's not meant to start a query component.

Yeah, I'm not saying that behavior is surprising.  I'm saying that
Larry's requirement is violated even for very commonly used schemes.
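
For anyone who wants to reproduce the comparison outside a browser
console, here is a minimal sketch. It assumes a JavaScript runtime that
provides the WHATWG URL class (an assumption; it is not what the console
session above used), and the helper name `components` is made up for
illustration.

```javascript
// Hypothetical helper (not from the thread): split a URL into its path
// and query using the WHATWG URL class, assumed to be available.
function components(href) {
  const u = new URL(href);
  return { pathname: u.pathname, search: u.search };
}

// "%3F" is the percent-encoded form of "?".
const decoded = decodeURIComponent("%3F");

// The same character, once escaped and once literal, in the same position.
const escaped = components("http://example.com/foo%3Fbar");
const literal = components("http://example.com/foo?bar");

console.log(decoded);                            // "?"
console.log(escaped.pathname, escaped.search);   // path keeps "%3Fbar", no query
console.log(literal.pathname, literal.search);   // path stops at "?", query is "?bar"
```

The escaped form stays inside the path, while the literal "?" ends the
path and starts the query, so the two strings are not interchangeable
under the http scheme.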

Adam

Received on Sunday, 3 April 2011 20:28:52 UTC