Re: parsing URI (references) according to RFC 3986 from Adam Barth on 2011-06-20 (public-iri@w3.org from June 2011)

From: Adam Barth <ietf@adambarth.com>
Date: Sun, 19 Jun 2011 23:44:10 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Chris Weber <chris@lookout.net>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <BANLkTik1G4CiqBASRt345Am2ZRxH8W4sEw@mail.gmail.com>

On Sun, Jun 19, 2011 at 11:39 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 2011-06-20 02:10, Adam Barth wrote:
>>
>> On Sun, Jun 19, 2011 at 4:18 PM, Chris Weber <chris@lookout.net> wrote:
>>>
>>> On 6/18/2011 6:09 AM, Adam Barth wrote:
>>>>
>>>> How does your implementation compare to existing browsers on this test
>>>> suite:
>>>>
>>>> http://trac.webkit.org/browser/trunk/LayoutTests/fast/url/
>>>>
>>>> In particular, it would be helpful to add entries for your
>>>> implementation to the following table so that we can see whether it
>>>> makes desirable trade-offs in situations where browsers differ in
>>>> behavior:
>>>>
>>>>
>>>> https://raw.github.com/abarth/url-spec/master/tests/gurl-results/by-browser.txt
>>>
>>> The WebKit test suite seems very valuable for its portability and
>>> black-box testing capability.  It does have some limitations, though:
>>> it considers only the DOM, and sometimes only certain properties of it.
>>>
>>> I still have a ways to go with my own test suite, but wanted to expand on
>>> some of the test results.  I've used some of your test cases where I can.
>>>
>>> IE canonicalize('http://example.com\\foo\\bar') is 'http://example.com/foo/bar'
>>> KR canonicalize('http://example.com\\foo\\bar') is 'http://example.com/foo/bar'
>>> SA canonicalize('http://example.com\\foo\\bar') is 'http://example.com/foo/bar'
>>> FF canonicalize('http://example.com\\foo\\bar') should be http://example.com/foo/bar. Was http://example.com\foo\bar/.
>>>
>>> In the above test results, you're comparing against the .href property of
>>> the DOM element, which is fine and may be all you want.  It may be worth
>>> noting some more detail here, though.
>>>
>>> FF's hostname property for this test is "example.com\foo\bar".  Because
>>> that's an invalid hostname, FF doesn't initiate an HTTP request for this
>>> URI and doesn't even attempt a DNS lookup (good).
>>>
>>> In a similar test case, "http://example.com/foo\bar", the DOM path property
>>> in both FF and Opera percent-encodes the "\" as "/foo%5Cbar", and the
>>> corresponding HTTP request matches: "GET /foo%5Cbar HTTP/1.1".  IE, Chrome,
>>> and Safari all instead convert the "\" to a "/".  Their DOM path property
>>> shows "/foo/bar", and the HTTP request matches: "GET /foo/bar HTTP/1.1".
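The two path behaviors Chris reports for "http://example.com/foo\bar" can be sketched as simple string transforms. This is a minimal illustration only, assuming the path has already been split out of the URL; it handles nothing but the backslash, and the function names are hypothetical, not taken from any browser's source.

```python
def path_percent_encode_backslash(path: str) -> str:
    """FF/Opera behavior as reported: keep '\\' as a literal path
    character and percent-encode it as %5C."""
    return path.replace("\\", "%5C")


def path_backslash_as_slash(path: str) -> str:
    """IE/Chrome/Safari behavior as reported: treat '\\' as '/'."""
    return path.replace("\\", "/")


def request_line(path: str) -> str:
    """Build the HTTP/1.1 request line a client would send for this path."""
    return f"GET {path} HTTP/1.1"


raw_path = "/foo\\bar"  # path of http://example.com/foo\bar
print(request_line(path_percent_encode_backslash(raw_path)))  # GET /foo%5Cbar HTTP/1.1
print(request_line(path_backslash_as_slash(raw_path)))        # GET /foo/bar HTTP/1.1
```

In both cases the DOM path property and the request line agree, which is what the observations show: the divergence lies in the parse itself, not in how the request is serialized.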
>>
>> Indeed.  The point is that IE, Chrome, and Safari treat \ as if it
>> were / in parsing URLs whereas Firefox does not.  I suspect we'll want
>> the spec to say that \ should be treated like / when parsing URLs.
>
> ...breaking
>
>  data:text/plain,foo\bar
>
> ?

Please read the whole thread before responding.  We're talking about
hierarchical URLs, and data URLs are not hierarchical.
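A minimal sketch of the proposed rule, limited to hierarchical URLs so that Julian's data: example is left alone. The scheme list HIERARCHICAL_SCHEMES and the choice to stop rewriting at the first "?" or "#" are assumptions made for illustration; the thread specifies neither, and normalize_backslashes is a hypothetical name.

```python
# Hypothetical scheme list: the thread says "hierarchical URLs" but does
# not enumerate which schemes count.
HIERARCHICAL_SCHEMES = {"http", "https", "ftp", "file"}


def normalize_backslashes(url: str) -> str:
    """Treat each backslash as a slash, but only for hierarchical schemes
    and only before any query ('?') or fragment ('#'). Other URLs, such
    as data: URLs, are returned unchanged."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() not in HIERARCHICAL_SCHEMES:
        return url  # no scheme, or a non-hierarchical one like data:

    # Rewrite only up to the first '?' or '#' (an assumption of this sketch).
    cut = len(rest)
    for delim in ("?", "#"):
        idx = rest.find(delim)
        if idx != -1:
            cut = min(cut, idx)

    head = rest[:cut].replace("\\", "/")
    return scheme + sep + head + rest[cut:]


print(normalize_backslashes("http://example.com\\foo\\bar"))  # http://example.com/foo/bar
print(normalize_backslashes("data:text/plain,foo\\bar"))      # data:text/plain,foo\bar
```

Under this sketch, "http://example.com\foo\bar" comes out as "http://example.com/foo/bar", matching the IE, Chrome, and Safari results quoted above, while "data:text/plain,foo\bar" passes through untouched.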

Adam

Received on Monday, 20 June 2011 06:45:36 UTC