Re: Non-hierarchical base URLs (was Re: draft-abarth-url-01 uploaded) from Maciej Stachowiak on 2011-05-03 (public-iri@w3.org from May 2011)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 03 May 2011 11:00:55 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Adam Barth <ietf@adambarth.com>, public-iri@w3.org
Message-id: <A57C0F1A-1B2C-4ABE-9F6A-921DC49D1228@apple.com>

On May 3, 2011, at 1:50 AM, Julian Reschke wrote:

> On 03.05.2011 04:55, Maciej Stachowiak wrote:
>> 
>> On May 2, 2011, at 7:24 PM, Roy T. Fielding wrote:
>> 
>>> On May 2, 2011, at 6:26 PM, Adam Barth wrote:
>>> 
>>>> On Mon, May 2, 2011 at 6:24 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>>>> On May 2, 2011, at 5:42 PM, Adam Barth wrote:
>>>>>> You're missing the constraint that browser vendors aren't going to
>>>>>> change their implementations to align with this dream.
>>>>> 
>>>>> There is no such constraint.  Real browser developers like to fix
>>>>> bugs when they are found, particularly when it makes their behavior
>>>>> more interoperable with existing content.
>>>> 
>>>> Perhaps you missed this message:
>>>> 
>>>> On Mon, Apr 25, 2011 at 1:38 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>>>>> On Apr 25, 2011, at 1:27 AM, Julian Reschke wrote:
>>>>>> Actually, Safari *does* the right thing here.
>>>>> 
>>>>> Safari has serious bugs as a result of doing the RFC-compliant thing here. We plan to change to be more like other browsers.
>>>>> 
>>>>> Regards,
>>>>> Maciej
>>>> 
>>>> AFAIK, Maciej is about as "real" a browser developer as they come.
>>> 
>>> AFAICT, Maciej based that statement on memory instead of an actual
>>> use case or test, since Safari does parse URIs correctly and so does
>>> Firefox.  When we come up with an example that is "more like other
>>> browsers" and is still broken, then we can talk about how to fix it.
>>> 
>>> And when we do, all implementations will be taken into consideration.
>> 
>> The specific context of my statement is bugs I looked at fairly recently, but which unfortunately I cannot explain in detail because some of them have serious security consequences and are as yet unpatched.
>> 
>> In the course of working on some of these bugs, I came to the conclusion that Safari should abandon scheme-independent URL parsing, as most other browsers hardcode knowledge of certain schemes as hierarchical, and this seems to result in better real-world compatibility.
>> 
>> I am skeptical of the example where a data: URI is the base URI for a relative reference. While the behavior for this must be defined one way or another, I would not expect there to be Web content that depends on a specific choice of behavior here, because (a) data: URLs are rare on the Web; and (b) there's almost nothing sensible you can do in this case. Note that Adam just used <base> for convenience; the example could just as well have been written as an actual data: URL, which would then act as the base for relative URLs inside the body. But there are more realistic cases where a relative URL may be resolved against a base of a non-hierarchical URI scheme, e.g.:
>> 
>> <iframe id=foo src="about://blank" onload="test()"></iframe>
>> <script>
>> // Called from the iframe's onload handler, once its document exists.
>> function test() {
>>     var doc = document.getElementById("foo").contentDocument;
>>     var anchor = doc.createElement("a");
>>     anchor.setAttribute("href", "foo.html");
>>     doc.body.appendChild(anchor);
>>     alert(anchor.href);
>> }
>> </script>
>> ...
> 
> FF4: resolves against the document's URI (not about://blank)
> IE9: doesn't load the iframe
> Opera: as FF4
> Chrome: alert shows nothing
> Safari: resolves against the about URI
> 
> So, again, no interop whatsoever.

Hi Julian,

My example was meant to illustrate a case where relative resolution against a non-hierarchical URI scheme may actually come up in Web content. I draw no conclusions about whether any specific behavior is required for Web content, although getting 4 different answers from 5 browsers suggests to me that we really need a clearly defined behavior for this case. If you would like to see an example of relative resolution against a non-hierarchical URI that has full interop, try this:

<iframe id=foo src="about:blank" onload="test()"></iframe>
<script>
// Called from the iframe's onload handler, once its document exists.
function test() {
    var doc = document.getElementById("foo").contentDocument;
    var anchor = doc.createElement("a");
    anchor.setAttribute("href", "foo.html");
    doc.body.appendChild(anchor);
    alert(anchor.href);
}
</script>

I believe you will consistently get resolution against the URL of the parent document. I am reasonably confident that cases like this *do* affect Web compatibility, though the deviation here comes from how the about:blank document obtains its base URL, which is outside the scope of URL parsing itself.
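
A minimal JavaScript sketch of the resolution described above, under a simplified model that is an assumption here: a document whose URL is exactly about:blank takes its parent document's base URL, ignoring <base> elements and other cases real browsers handle. The helper names effectiveBaseURL and resolveInDocument and the example.com URL are hypothetical; only the final step uses the standard URL constructor found in modern browsers and Node.js.

```javascript
// Hypothetical model: a document whose URL is exactly "about:blank"
// resolves relative references against its parent document's base URL.
// Real browsers apply more rules (e.g. <base> elements); this does not.
function effectiveBaseURL(documentURL, parentBaseURL) {
  if (documentURL === "about:blank" && parentBaseURL !== null) {
    return parentBaseURL;
  }
  return documentURL;
}

// Resolve a relative reference inside a document using the effective base.
// Note: with no parent, new URL("foo.html", "about:blank") throws a
// TypeError, since the URL constructor (WHATWG URL Standard) does not
// resolve relative references against a base like about:blank.
function resolveInDocument(ref, documentURL, parentBaseURL) {
  return new URL(ref, effectiveBaseURL(documentURL, parentBaseURL)).href;
}

// Example with a made-up parent document URL.
console.log(resolveInDocument("foo.html", "about:blank",
                              "http://example.com/tests/parent.html"));
// prints "http://example.com/tests/foo.html"
```

The fallback is kept separate from the resolution step to mirror the point above: the parent-document behavior is a choice of base URL, not a change to how the URL itself is parsed.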

Anyway, this is why I originally asked whether Web compatibility requires any deviation from RFC processing for valid all-ASCII URIs. I am less confident than Adam that it does. However, I *am* confident that URIs that are invalid per the grammar, or that contain non-ASCII characters, need to deviate from what the IRI spec says, even in cases where they are valid IRIs.
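
For comparison, a sketch of purely scheme-independent resolution following RFC 3986, section 5.2, which is roughly the "RFC processing" referred to above. It is an illustrative implementation with hypothetical function names, not any browser's code. Under these rules foo.html resolves against about:blank to about:foo.html, against about://blank to about://blank/foo.html, and against a data: URL to another, meaningless data: URL; none of these is the parent-document resolution expected for the about:blank example above.

```javascript
// Scheme-independent reference resolution per RFC 3986, section 5.2.
// No scheme is special, so bases such as about:blank or data: URLs go
// through the same path merging as http URLs.

// Component-splitting regex from RFC 3986, Appendix B.
const URI_RE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parse(ref) {
  const m = URI_RE.exec(ref);
  // Unmatched groups are undefined, which distinguishes "absent" from "empty".
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Section 5.2.4: remove "." and ".." segments from a path.
function removeDotSegments(path) {
  let input = path;
  let output = "";
  const dropLastSegment = () => {
    output = output.slice(0, Math.max(output.lastIndexOf("/"), 0));
  };
  while (input.length > 0) {
    if (input.startsWith("../")) input = input.slice(3);
    else if (input.startsWith("./")) input = input.slice(2);
    else if (input.startsWith("/./")) input = input.slice(2);
    else if (input === "/.") input = "/";
    else if (input.startsWith("/../")) { input = input.slice(3); dropLastSegment(); }
    else if (input === "/..") { input = "/"; dropLastSegment(); }
    else if (input === "." || input === "..") input = "";
    else {
      // Move the first segment (with its leading "/", if any) to the output.
      let end = input.indexOf("/", input.startsWith("/") ? 1 : 0);
      if (end === -1) end = input.length;
      output += input.slice(0, end);
      input = input.slice(end);
    }
  }
  return output;
}

// Section 5.2.3: merge a relative path with the base path.
function merge(base, refPath) {
  if (base.authority !== undefined && base.path === "") return "/" + refPath;
  // Keep everything up to and including the last "/" (nothing if there is none).
  return base.path.slice(0, base.path.lastIndexOf("/") + 1) + refPath;
}

// Section 5.3: recompose components into a URI string.
function recompose(t) {
  let s = "";
  if (t.scheme !== undefined) s += t.scheme + ":";
  if (t.authority !== undefined) s += "//" + t.authority;
  s += t.path;
  if (t.query !== undefined) s += "?" + t.query;
  if (t.fragment !== undefined) s += "#" + t.fragment;
  return s;
}

// Section 5.2.2 (strict parser): resolve ref against an absolute base URI.
function resolve(base, ref) {
  const b = parse(base);
  const r = parse(ref);
  const t = { fragment: r.fragment };
  if (r.scheme !== undefined) {
    Object.assign(t, { scheme: r.scheme, authority: r.authority,
                       path: removeDotSegments(r.path), query: r.query });
    return recompose(t);
  }
  t.scheme = b.scheme;
  if (r.authority !== undefined) {
    Object.assign(t, { authority: r.authority, path: removeDotSegments(r.path), query: r.query });
  } else if (r.path === "") {
    Object.assign(t, { authority: b.authority, path: b.path,
                       query: r.query !== undefined ? r.query : b.query });
  } else {
    const path = r.path.startsWith("/") ? r.path : merge(b, r.path);
    Object.assign(t, { authority: b.authority, path: removeDotSegments(path), query: r.query });
  }
  return recompose(t);
}

console.log(resolve("about:blank", "foo.html"));          // about:foo.html
console.log(resolve("about://blank", "foo.html"));        // about://blank/foo.html
console.log(resolve("data:text/html,hello", "foo.html")); // data:text/foo.html
```

Because the algorithm never consults the scheme, "blank" and "text/html,hello" are treated as ordinary paths; a browser that hardcodes which schemes are hierarchical would instead special-case such bases before reaching the merge step.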

Regards,
Maciej
Received on Tuesday, 3 May 2011 18:01:50 UTC