Re: Non-hierarchical base URLs (was Re: draft-abarth-url-01 uploaded)

On May 2, 2011, at 6:08 PM, Adam Barth wrote:
> On Mon, May 2, 2011 at 5:57 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>> On May 2, 2011, at 5:42 PM, Adam Barth wrote:
>>> On Mon, May 2, 2011 at 4:33 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>>> Authors have been using plain old ASCII references to URIs for
>>>> longer than the Web has been documented.  We expect them to
>>>> still work.  Likewise for references that are in the document
>>>> encoding but only use the subset of characters that are found
>>>> in ASCII.  URIs are defined in terms of characters, not octets,
>>>> so the transcoding I am referring to is the removal of whitespace,
>>>> pct-encoding of non-unreserved characters, etc.  A reference that
>>>> is already in URI form does not need to be transcoded.
>>> 
>>> You're missing the constraint that browser vendors aren't going to
>>> change their implementations to align with this dream.  Our choice is
>>> between having the specification reflect that reality or having the
>>> spec tell a lie.
>> 
>> Are there specific cases where browser URL resolution for an all-ASCII string that matches the valid URI grammar does not match what the RFC says? (There may be some, but I don't specifically know of any).
> 
> Yes.  One example is included at the beginning of the thread:
> 
> <base href="data://foo/bar?baz#qux">
> <a href="taco.html">hello</a>
> <script>
> alert(document.getElementsByTagName('a')[0].href)
> </script>

Which is obviously not a valid use case, since data scheme URIs
are not used as the base URI for anchor references and thus may
cause the parsing algorithm of HTML to override any such resolution
described by RFC 3986.  See

  http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.1

Nor is it reasonable to assume that the results of JavaScript
extraction of an element value via the DOM reflect how the
input was parsed, since a nonsensical base URI will result in
security blocks that are outside the scope of RFC 3986.

> There are also examples related to the different classes of URLs that
> Boris mentioned.  One of the reasons that browsers have different
> classes of URLs is because parsing and resolving relative references
> isn't uniform across different schemes.  Another problematic area is
> the treatment of fragments because the RFCs claim that fragments
> behave uniformly across URL schemes, which isn't true either.

Please list them all using tests, like I did long ago at

   http://labs.apache.org/webarch/uri/test/
   http://labs.apache.org/webarch/uri/test/rel_examples1.html
   http://labs.apache.org/webarch/uri/test/rel_examples2.html
   http://labs.apache.org/webarch/uri/test/rel_examples3.html
   http://labs.apache.org/webarch/uri/test/rel_examples4.html
   http://labs.apache.org/webarch/uri/test/rel_examples5.html

I just checked and Firefox 4 passes all of the tests, much
better than prior Mozilla-based browsers, except for the ones
involving unrecognized schemes (for which the parsing may be
correct but the browser does not render them as a link,
which is fine).  Safari 5.0.5 does even better.
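
For checking individual cases by hand, the generic resolution
algorithm of RFC 3986 section 5.2 can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the
function names (split, merge, remove_dot_segments, recompose,
resolve) are made up for this sketch, it implements the strict
variant of 5.2.2, and it says nothing about what any particular
browser or HTML parser actually does with a given base.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


def split(ref):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    return _URI_RE.match(ref).groups()


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4: interpret and remove "." and ".." segments."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            i = path.find("/", 1)
            if i == -1:
                i = len(path)
            out.append(path[:i])
            path = path[i:]
    return "".join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3: merge a relative path with the base path."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def recompose(scheme, authority, path, query, fragment):
    """RFC 3986 section 5.3: reassemble the components into a string."""
    out = ""
    if scheme is not None:
        out += scheme + ":"
    if authority is not None:
        out += "//" + authority
    out += path
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return out


def resolve(base, ref):
    """RFC 3986 section 5.2.2 (strict): resolve ref against an absolute base."""
    b_scheme, b_auth, b_path, b_query, _ = split(base)
    r_scheme, r_auth, r_path, r_query, r_frag = split(ref)
    if r_scheme is not None:
        t = (r_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_auth is not None:
        t = (b_scheme, r_auth, remove_dot_segments(r_path), r_query)
    elif r_path == "":
        t = (b_scheme, b_auth, b_path, b_query if r_query is None else r_query)
    elif r_path.startswith("/"):
        t = (b_scheme, b_auth, remove_dot_segments(r_path), r_query)
    else:
        merged = merge(b_auth, b_path, r_path)
        t = (b_scheme, b_auth, remove_dot_segments(merged), r_query)
    return recompose(*t, r_frag)


if __name__ == "__main__":
    # Base URI from RFC 3986 section 5.4, then the base from the quoted example.
    for ref in ("g", "../g", "?y", "#s", "//g"):
        print(ref, "->", resolve("http://a/b/c/d;p?q", ref))
    print(resolve("data://foo/bar?baz#qux", "taco.html"))
```

Run against the base from section 5.4 of the RFC, the sketch can be
compared line by line with the tables there; applied mechanically to
the quoted data: base, it treats "foo" as the authority and "/bar" as
the path, exactly as it would for any other hierarchical-looking
string.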

> I'd prefer to live in a world where that wasn't true because it causes
> problems when folks introduce new URL schemes, and I suspect that
> folks who don't feel as constrained as browser vendors could decide to
> live in that world without feeling too much pain.  Unfortunately, I
> believe this constraint is real for some folks.

I'd prefer to live in a world based on actual tests and believable
use cases, not bizarre creations used to justify NIH syndrome.
The browser developers do conform to 3986, as far as I know,
because they wouldn't interoperate with offline tools and
content management systems if they didn't.

....Roy

Received on Tuesday, 3 May 2011 02:05:14 UTC