Re: Change definition of URL to normatively reference IRI specification using a well-defined interface from Julian Reschke on 2010-04-06 (public-html@w3.org from April 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 06 Apr 2010 20:11:29 +0200
To: Ian Hickson <ian@hixie.ch>
CC: public-html@w3.org
Message-ID: <4BBB7951.8070706@gmx.de>

On 06.04.2010 19:48, Ian Hickson wrote:
> On Tue, 6 Apr 2010, Julian Reschke wrote:
>>>
>>> DETAILS
>>> Update the IRI specification to define two algorithms:
>>>
>>>    * parsing an address (relative or absolute): algorithm to obtain a
>>>      failure/success condition (not the same as whether the input is
>>>      valid or not, just whether it can be parsed), and the following
>>>      components, from parsing an arbitrary string:
>>>       - <scheme> component
>>>       - <host> component
>>>       - <port> component
>>>       - <hostport> component
>>>       - <path> component
>>>       - <query> component
>>>       - <fragment> component
>>>       - <host-specific> component
>>
>> 1) I believe you want that algorithm to parse and return the individual
>> components even for invalid IRIs, right? If so, this should be pointed out.
>
> The parenthetical points this out. Either way, I assume Larry and Martin
> are aware of this requirement, since otherwise there'd be no point in this
> exercise (it's basically the only change needed to the IRI specs).

OK.
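
To make the requirement concrete, here is a minimal sketch (not the algorithm proposed for IRIbis) of a parser that returns components for any input string, valid or not. It uses the non-validating regular expression from RFC 3986, Appendix B. The names `Components` and `parse_address` are hypothetical, the split of the authority into host and port is my own simplification, and <host-specific> and the failure condition are not modelled.

```python
import re
from typing import NamedTuple, Optional

# Non-validating regular expression from RFC 3986, Appendix B.
# Every group is optional, so it matches any input string.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


class Components(NamedTuple):
    """Hypothetical result type; field names follow the list in the proposal."""
    scheme: Optional[str]
    hostport: Optional[str]
    host: Optional[str]
    port: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def parse_address(text: str) -> Components:
    """Split text into components without checking validity."""
    m = _URI_RE.match(text)
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    hostport = host = port = None
    if authority is not None:
        # Assumption: <hostport> is the authority without any userinfo.
        hostport = authority.rpartition("@")[2]
        head, sep, tail = hostport.rpartition(":")
        if sep and "]" not in tail:
            host, port = head, tail        # "example.com:8080"
        else:
            host, port = hostport, None    # "example.com" or "[::1]"
    return Components(scheme, hostport, host, port, path, query, fragment)


if __name__ == "__main__":
    # Invalid input still yields components.
    print(parse_address("http://example.com:8080/a b?x=<y>#f#g"))
    print(parse_address("http://example.com##"))
```

Whether the result is a valid IRI is a separate question that this sketch deliberately does not answer.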

>> 2) Why would IRIbis need to define <hostport>?
>
> It is useful for defining HTML's APIs. The idea here is to extract the
> parsing rules from the HTML spec.

Why would the IRI spec define a term that is solely used in an HTML API?
Why not declare it where it's needed?

>> 3) Similarly, why would IRIbis define <host-specific>? This one doesn't
>> seem to be used at all.
>
> It's used by the postMessage draft. (Missing this kind of thing is the
> danger of splitting the HTML5 spec. I highly recommend using the
> complete.html version of the spec when searching for impact of things like

I'm interested in what the W3C calls the "HTML5" specification, not any 
compound documents you may be publishing.

Furthermore, "host-specific" is really a strange name for a component; I 
had no idea what it is before I checked.

As in 2) I recommend that you just define it where you need it.

> ...
>>>    * resolving an address A relative to a base address B with an encoding C:
>>>      algorithm for parsing an arbitrary string A and resolving it relative
>>>      to address B (which will have been resolved, but may be invalid), using
>>>      a specified character encoding C, and returning either success or
>>>      failure, and in the case of success, a string, with the following
>>>      conditions:
>>>       - the output of the algorithm must be idempotent even if the base
>>>         argument is changed (i.e. once resolved, resolving it again with
>>>         the same character encoding cannot change the result)
>>
>> I don't believe "idempotent" is the right term here, if you do a second
>> invocation with different arguments. Please elaborate, maybe give an example?
>
> "http://example.com##" is absolute, because regardless of the "B"
> argument, the output is the same.

Resolving an absolute string against any base URL is a NOP, yes. What 
has that to do with idempotency?
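
To illustrate the two properties being mixed up here, a small sketch using Python's `urllib.parse.urljoin` as a stand-in resolver (an assumption: it follows RFC 3986 reference resolution, not the algorithm proposed in this thread). The wrapper `resolve` and the example addresses are hypothetical.

```python
from urllib.parse import urljoin


def resolve(address: str, base: str) -> str:
    """Resolve address against base (stand-in for the proposed algorithm)."""
    return urljoin(base, address)


BASE_1 = "http://example.com/docs/guide/index.html"
BASE_2 = "https://other.example/elsewhere/"

# Property 1: an absolute address ignores the base entirely.
absolute = "http://example.com/a?b#c"
assert resolve(absolute, BASE_1) == absolute
assert resolve(absolute, BASE_2) == absolute

# Property 2: resolving an already-resolved result again, even against a
# different base, does not change it. This is the f(f(x)) == f(x) shape
# that "idempotent" normally describes.
relative = "../img/logo.png?x=1#top"
once = resolve(relative, BASE_1)
twice = resolve(once, BASE_2)
assert once == twice
```

The second property follows from the first only because the output of a successful resolution is itself absolute.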

>>>       - resolving preserves errors, e.g. resolving "http://example.com##"
>>>         returns "http://example.com/##" not "http://example.com/#%C3".
>>>
>>> Update the HTML spec to use these algorithms and reference the IRI
>>> spec that defines them.
>>
>> It would be cool to understand why this is a requirement (I'm ready to
>> believe it is in practice, I'd just like to see the reason...).
>
> The goal is consistency with shipped UAs. Whatever is consistent with UAs
> is what we should do. I presume Larry and Martin will be doing
> extensive testing to be consistent with shipped UAs, and will respond to
> UA feedback to be consistent with whatever they're willing to implement.

I think what's important is *compatibility*. Are the shipping UAs 
consistent with respect to this? That was the data I was looking for.

> One thing I was thinking about last night is that it might be useful to
> split the "resolve an address" algorithm into two, one to resolve an
> address to ASCII output, and one to resolve an address to full-Unicode
> output. We need the ASCII-only version so that we can extract the path for
> use with e.g. HTTP, which doesn't support Unicode paths natively. I
> haven't checked the specs I edit to see what else gets affected by this.

How would that be different from parsing the IRI (I think calling 
something "resolve" when in fact nothing gets resolved is confusing) 
into the components, and then converting various parts (path & query) 
into ASCII?
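
A rough sketch of what I mean, using Python's `urllib.parse` for illustration: split the string into components, then percent-encode only the path and query. The function name `to_ascii` is hypothetical, applying the same encoding to both path and query is a simplification, and the host is deliberately left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is: "%" keeps existing escapes intact; the rest are
# delimiters that RFC 3986 allows in a path or query. Unreserved
# characters (letters, digits, "-", ".", "_", "~") are never quoted.
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = "/?%:@!$&'()*+,;="


def to_ascii(address: str, encoding: str = "utf-8") -> str:
    """Parse into components, then convert path and query to ASCII."""
    parts = urlsplit(address)
    path = quote(parts.path, safe=_PATH_SAFE, encoding=encoding)
    query = quote(parts.query, safe=_QUERY_SAFE, encoding=encoding)
    # Host and fragment are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(to_ascii("http://example.com/r\u00e9sum\u00e9?q=caf\u00e9#frag"))
    print(to_ascii("http://example.com/r\u00e9sum\u00e9?q=caf\u00e9", "iso-8859-1"))
```

No base address is involved anywhere, which is why I would not call this step "resolving".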

Best regards, Julian
Received on Tuesday, 6 April 2010 18:12:14 UTC