Re: Non-hierarchical base URLs (was Re: draft-abarth-url-01 uploaded) from Adam Barth on 2011-05-03 (public-iri@w3.org from May 2011)

From: Adam Barth <ietf@adambarth.com>
Date: Mon, 2 May 2011 18:08:49 -0700
To: Maciej Stachowiak <mjs@apple.com>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, public-iri@w3.org
Message-ID: <BANLkTikk9b9RAAMKTMten21FVpJ3iLDbSg@mail.gmail.com>

On Mon, May 2, 2011 at 5:57 PM, Maciej Stachowiak <mjs@apple.com> wrote:
> On May 2, 2011, at 5:42 PM, Adam Barth wrote:
>> On Mon, May 2, 2011 at 4:33 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>> Authors have been using plain old ASCII references to URIs for
>>> longer than the Web has been documented.  We expect them to
>>> still work.  Likewise for references that are in the document
>>> encoding but only use the subset of characters that are found
>>> in ASCII.  URIs are defined in terms of characters, not octets,
>>> so the transcoding I am referring to is the removal of whitespace,
>>> pct-encoding of non-unreserved characters, etc.  A reference that
>>> is already in URI form does not need to be transcoded.
>>
>> You're missing the constraint that browser vendors aren't going to
>> change their implementations to align with this dream.  Our choice is
>> between having the specification reflect that reality or having the
>> spec tell a lie.
>
> Are there specific cases where browser URL resolution for an all-ASCII string that matches the valid URI grammar does not match what the RFC says? (There may be some, but I don't specifically know of any).

Yes.  One example is included at the beginning of the thread:

<base href="data://foo/bar?baz#qux">
<a href="taco.html">hello</a>
<script>
alert(document.getElementsByTagName('a')[0].href)
</script>
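
For comparison, here is a minimal sketch of what the reference-resolution
algorithm in RFC 3986 (sections 5.2.2 to 5.3) yields for that input. The
helper names (split, remove_dot_segments, resolve) are made up for this
illustration, the algorithm is transcribed by hand, and it assumes a
well-formed base URI. It shows only the RFC side of the comparison, not
what any particular browser returns.

```python
import re

# Generic URI-splitting regex from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split(uri):
    """Return (scheme, authority, path, query, fragment); None means undefined."""
    m = URI_RE.match(uri)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4, transcribed step by step."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../" becomes "/"
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            i = path.find("/", 1)
            if i == -1:
                i = len(path)
            out.append(path[:i])
            path = path[i:]
    return "".join(out)


def resolve(base, ref):
    """Resolve ref against base per RFC 3986 sections 5.2.2, 5.2.3 and 5.3."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    if r_scheme is not None:
        scheme, auth, query = r_scheme, r_auth, r_query
        path = remove_dot_segments(r_path)
    elif r_auth is not None:
        scheme, auth, query = b_scheme, r_auth, r_query
        path = remove_dot_segments(r_path)
    elif r_path == "":
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        if r_path.startswith("/"):
            path = remove_dot_segments(r_path)
        else:
            # Merge step (section 5.2.3).
            if b_auth is not None and b_path == "":
                merged = "/" + r_path
            else:
                merged = b_path[: b_path.rfind("/") + 1] + r_path
            path = remove_dot_segments(merged)
        scheme, auth, query = b_scheme, b_auth, r_query
    # Recomposition (section 5.3).
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if r_frag is not None:
        result += "#" + r_frag
    return result


print(resolve("data://foo/bar?baz#qux", "taco.html"))  # data://foo/taco.html
```

Under the RFC's scheme-independent rules the anchor's href would be
data://foo/taco.html, which is the baseline to compare against what the
alert() above shows in a given browser.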

There are also examples related to the different classes of URLs that
Boris mentioned.  One of the reasons that browsers have different
classes of URLs is that parsing and resolving relative references
isn't uniform across schemes.  Another problematic area is the
treatment of fragments: the RFCs claim that fragments behave uniformly
across URL schemes, which isn't true either.

I'd prefer to live in a world where schemes did behave uniformly,
because the lack of uniformity causes problems when folks introduce
new URL schemes, and I suspect that folks who don't feel as
constrained as browser vendors could decide to live in that world
without feeling too much pain.  Unfortunately, I believe this
constraint is real for some folks.

Adam

Received on Tuesday, 3 May 2011 01:09:47 UTC