Non-hierarchical base URLs (was Re: draft-abarth-url-01 uploaded)

On Mon, Apr 25, 2011 at 12:27 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 24.04.2011 20:10, Adam Barth wrote:
>> Finding the scheme aborts the "finding the scheme" algorithm (hence
>> the separate section and the phrase "these steps") and reports that
>> the URL is invalid when there is no scheme.  The algorithm for
>> resolving a relative URL then continues down this branch "if
>> relative-url is an invalid URL...".
>
> Got it, thanks.
>
> So, 4.1 says:
>
>>   TODO: If base-url's scheme is not hierarchical, we can't resolve as a
>>   relative URL.  We'll probably want to return an invalid URL.  Check
>>   what happens when resolving an empty string as a relative URL with a
>>   non-hierarchical base.
>
> If you look at RFC 3986 you will see that there's no problem like that. Both
> URIs and relative references are parsed into components, and the resolution
> algorithm doesn't care where they came from, and has no extra knowledge of
> "hierarchical" or specific schemes.

I don't believe you can correctly account for the behavior of existing
browsers without classifying schemes into at least two categories.
For the purposes of discussion, let's call those two categories
hierarchical and non-hierarchical (but of course we could use whatever
names we like).

What does the following HTML alert?

<base href="data://foo/bar?baz#qux">
<a href="taco.html">hello</a>
<script>
alert(document.getElementsByTagName('a')[0].href)
</script>

What about the following?

<base href="http://foo/bar?baz#qux">
<a href="taco.html">hello</a>
<script>
alert(document.getElementsByTagName('a')[0].href)
</script>

The fact is that URL handling in browsers is not uniform across
schemes.  We might feel happy or sad about that, but that's how things
are.  If we're going to write specs that tell the truth, then we need
to acknowledge these infelicities rather than sticking our heads in
the sand and pretending they aren't the case.
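
To make the two-category idea concrete, here is a minimal sketch of what
such a resolver might look like.  It is not what the draft or any browser
actually does.  The scheme list, the names findScheme and resolveRelative,
and the use of null to stand in for "invalid URL" are all assumptions made
for illustration, and only the simplest path-relative case is handled.

```javascript
// Hypothetical classification, for illustration only. The real list would
// have to come from the spec and the test suite.
const HIERARCHICAL_SCHEMES = new Set(["http", "https", "ftp", "file"]);

// Returns the lowercased scheme of a URL string, or null if it has none.
function findScheme(url) {
  const match = /^([A-Za-z][A-Za-z0-9+.-]*):/.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Simplified resolution. Assumes the base looks like scheme://authority/path
// and the relative reference is a plain path segment (no "..", no leading
// "/", no query-only or fragment-only reference).
function resolveRelative(relative, base) {
  if (findScheme(relative) !== null) {
    return relative; // already absolute, nothing to resolve
  }
  const scheme = findScheme(base);
  if (scheme === null || !HIERARCHICAL_SCHEMES.has(scheme)) {
    return null; // assumption: null stands in for "invalid URL"
  }
  // Drop the base's fragment and query, then everything after the last "/".
  const withoutFragment = base.split("#")[0];
  const withoutQuery = withoutFragment.split("?")[0];
  const directory = withoutQuery.slice(0, withoutQuery.lastIndexOf("/") + 1);
  return directory + relative;
}

console.log(resolveRelative("taco.html", "http://foo/bar?baz#qux"));
console.log(resolveRelative("taco.html", "data://foo/bar?baz#qux"));
```

With the two bases from the HTML above, this sketch yields
http://foo/taco.html for the http base and null for the data base.
Whether null (an invalid URL) is the right answer for the second case is
the open question in the 4.1 TODO, and the tests are what should settle it.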

> I understand what you're doing - rephrasing existing code that you think is
> good into a spec - but I just don't see how this is helpful if we can't
> compare the outcome with what the existing specs say.

We can compare the outcome via the test suite and by using our human
understanding.  Your position seems to be that the only way to write a
new URL spec is to edit an existing one so that we can understand how
the two differ.  That's certainly one approach, but it's hardly the
only approach.

> Can we please focus on a test suite first (and yes, I'm doing my homework as
> well; if you can supply a set of test URIs in machine-readable format that
> would be super-helpful).

I've been working on a test suite for a while:

https://trac.webkit.org/browser/trunk/LayoutTests/fast/url/

Some other members of the working group have already contributed a
number of additional test cases.  You'll find that all the test URLs in
that test suite are machine readable.  Rather than continue to discuss
process, would you like to contribute some additional test cases?

Adam

Received on Monday, 25 April 2011 07:51:26 UTC