Re: parsing URI (references) according to RFC 3986 from Adam Barth on 2011-06-20 (public-iri@w3.org from June 2011)

From: Adam Barth <ietf@adambarth.com>
Date: Mon, 20 Jun 2011 01:47:53 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Chris Weber <chris@lookout.net>, Boris Zbarsky <bzbarsky@mit.edu>, public-iri@w3.org
Message-ID: <BANLkTikZn+-+HofohD6=tHfO+AovE0ng6g@mail.gmail.com>

On Mon, Jun 20, 2011 at 1:13 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2011-06-20 10:03, Adam Barth wrote:
>> Even just trivial things need to be cleaned up, like:
>>
>> http://ExAmple.CoM/
>
> What needs to be cleaned up here?

* FF canonicalize('http://GoOgLe.CoM/') is 'http://google.com/'
* IE canonicalize('http://GoOgLe.CoM/') is 'http://google.com/'
* KR canonicalize('http://GoOgLe.CoM/') is 'http://google.com/'
* SA canonicalize('http://GoOgLe.CoM/') should be http://google.com/.
Was http://GoOgLe.CoM/.

In the results above, FF is Firefox, KR is Chrome, and SA is Safari.
IE, Firefox, and Chrome convert host names to lower case.  Safari does not.
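
For illustration only, the majority behavior can be sketched in a few
lines of Python. This is not any browser's actual code; the function
name lowercase_host is made up for this example, and it assumes the
input is an absolute URL with an authority component.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host(url: str) -> str:
    """Lower-case the host of an absolute URL, leaving the rest alone.

    Hypothetical helper: a sketch of the behavior observed in three of
    the four browsers above, not a full canonicalizer.
    """
    parts = urlsplit(url)
    # netloc may carry userinfo ("user:pass@"); only the host[:port]
    # part after the last "@" is case-insensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    # The port is numeric, so lower-casing host and port together is safe.
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(parts._replace(netloc=netloc))


print(lowercase_host("http://GoOgLe.CoM/"))
```

Only the host is touched here; the path, query, and fragment keep
their original case.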

>> http://www.example.com/##asdf
>
> Either reject the reference as invalid, or treat this as a fragment with
> value "#asdf".
>
> *How* to handle fragments depends on media types, not URI parsing, so I'm
> not sure we should try to answer this here...

* FF canonicalize('http://www.example.com/##asdf') is
'http://www.example.com/##asdf'
* IE canonicalize('http://www.example.com/##asdf') is
'http://www.example.com/##asdf'
* KR canonicalize('http://www.example.com/##asdf') is
'http://www.example.com/##asdf'
* SA canonicalize('http://www.example.com/##asdf') should be
http://www.example.com/##asdf. Was http://www.example.com/#%23asdf.

The question is whether a # occurring inside the fragment should be
coerced to its %-escaped form (%23).  My reading of the evidence here
says "no": three of the four browsers leave it as is.
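
To make the two candidate behaviors concrete, here is a small Python
sketch. The names split_fragment and canonicalize_fragment, and the
escape_hash flag, are invented for this example; the only assumption
is that the fragment starts after the first "#" in the string.

```python
from typing import Optional, Tuple


def split_fragment(url: str) -> Tuple[str, Optional[str]]:
    """Split at the first '#'; everything after it is the fragment."""
    base, sep, fragment = url.partition("#")
    return base, (fragment if sep else None)


def canonicalize_fragment(url: str, escape_hash: bool = False) -> str:
    """Reassemble the URL, optionally escaping '#' inside the fragment.

    escape_hash=False leaves the fragment untouched (what three of the
    four browsers above produce); escape_hash=True rewrites an inner
    '#' as '%23' (what Safari produced in the test above).
    """
    base, fragment = split_fragment(url)
    if fragment is None:
        return url  # no fragment at all
    if escape_hash:
        fragment = fragment.replace("#", "%23")
    return base + "#" + fragment


url = "http://www.example.com/##asdf"
print(canonicalize_fragment(url))
print(canonicalize_fragment(url, escape_hash=True))
```

Either way the fragment value itself is "#asdf"; the two modes differ
only in how that value is serialized back out.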

On Mon, Jun 20, 2011 at 1:34 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2011-06-20 10:24, Chris Weber wrote:
>> 6) Handling percent-encoded values in various components
>
> Is there a *problem* related to this?
>
> I can see that the exposed DOM properties vary on how things are
> canonicalized, but that's a DOM issue, not a URI/IRI issue.

You can play games about who needs to spec this stuff, but it needs to
be specced.  In implementations, this work is done by the URL
processing code, not by the DOM processing code.

Adam

Received on Monday, 20 June 2011 08:48:52 UTC