Re: Change definition of URL to normatively reference IRI specification using a well-defined interface from Ian Hickson on 2010-04-06 (public-html@w3.org from April 2010)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 6 Apr 2010 13:10:47 -0700
To: Julian Reschke <julian.reschke@gmx.de>
Cc: public-html@w3.org
Message-ID: <j2r403d38e21004061310k89cba9edj655181d7927d47c5@mail.gmail.com>
On Tue, Apr 6, 2010 at 11:11 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>> 2) Why would IRIbis need to define <hostport>?
>>
>> It is useful for defining HTML's APIs. The idea here is to extract the
>> parsing rules from the HTML spec.
>
> Why would the IRI spec define a term that is solely used in an HTML API?

For convenience and helpfulness.


> Why not declare it where it's needed?

The whole point here is to not have to define how you parse URLs in
the HTML spec.
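For readers unfamiliar with the component under discussion, here is a minimal sketch of extracting <hostport> from an absolute URL. It assumes <hostport> means the authority without any userinfo, kept exactly as written (no case folding, no default-port removal). It uses Python's urllib.parse as the parser, which follows RFC 3986 rather than the HTML or IRIbis parsing rules, and the function name hostport is hypothetical.

```python
from urllib.parse import urlsplit


def hostport(url: str) -> str:
    """Authority of an absolute URL minus any "userinfo@" prefix, kept as written.

    Assumed reading of <hostport>: no case folding, no default-port removal.
    """
    netloc = urlsplit(url).netloc
    # rpartition splits on the last "@", so "user:pw@host:port" -> "host:port".
    return netloc.rpartition("@")[2]


print(hostport("http://user:pw@example.com:8080/path?q"))  # example.com:8080
print(hostport("https://Example.com/"))                    # Example.com
```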


>>> 3) Similarly, why would IRIbis define <host-specific>? This one doesn't
>>> seem to be used at all.
>>
>> It's used by the postMessage draft. (Missing this kind of thing is the
>> danger of splitting the HTML5 spec. I highly recommend using the
>> complete.html version of the spec when searching for impact of things like
>
> I'm interested in what the W3C calls the "HTML5" specification, not any
> compound documents you may be publishing.

Please check all the documents the W3C is publishing or planning to
publish, not just one.


> Furthermore, "host-specific" is really a strange name for a component; I had
> no idea what it was before I checked.

I'm quite happy for the term to be called whatever Larry and Martin
think is best.


>>>>      - the output of the algorithm must be idempotent even if the base
>>>>        argument is changed (i.e. once resolved, resolving it again with
>>>>        the same character encoding cannot change the result)
>>>
>>> I don't believe "idempotent" is the right term here, if you do a second
>>> invocation with different arguments. Please elaborate, maybe give an
>>> example?
>>
>> "http://example.com##" is absolute, because regardless of the "B"
>> argument, the output is the same.
>
> Resolving an absolute string against any base URL is a NOP, yes. What has
> that to do with idempotency?

So long as we agree on what the requirement is, I am happy for you to
use whatever terminology you want to describe it.
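The requirement can be phrased as two checks. This is an illustration only: it uses Python's urllib.parse.urljoin as a stand-in resolver (RFC 3986-style, not the algorithm under discussion), the helper names are hypothetical, and because urljoin takes no character-encoding argument the "same character encoding" condition is not modelled.

```python
from urllib.parse import urljoin


def resolves_independently_of_base(address: str, bases: list[str]) -> bool:
    """True if resolving `address` yields one and the same result for every base."""
    return len({urljoin(base, address) for base in bases}) == 1


def stable_under_re_resolution(address: str, base: str, other_bases: list[str]) -> bool:
    """True if, once resolved, the output no longer depends on any base."""
    resolved = urljoin(base, address)
    return all(urljoin(other, resolved) == resolved for other in other_bases)


BASES = ["http://a.test/x/", "https://b.test/", "ftp://c.test/dir/file"]

print(resolves_independently_of_base("http://example.com##", BASES))      # True
print(resolves_independently_of_base("b", BASES))                         # False
print(stable_under_re_resolution("../y?q#f", "http://a.test/x/z", BASES)) # True
```

The first check is the weaker reading (an absolute address ignores its base); the second is the stronger one the requirement describes (a resolved result survives a second resolution against any base).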


>>>>      - resolving preserves errors, e.g. resolving "http://example.com##"
>>>>        returns "http://example.com/##" not "http://example.com/#%23".
>>>>
>>>> Update the HTML spec to use these algorithms and reference the IRI
>>>> spec that defines them.
>>>
>>> It would be cool to understand why this is a requirement (I'm ready to
>>> believe it is in practice, I'd just like to see the reason...).
>>
>> The goal is consistency with shipped UAs. Whatever is consistent with UAs
>> is what we should do. I presume Larry and Martin will be doing
>> extensive testing to be consistent with shipped UAs, and will respond to
>> UA feedback to be consistent with whatever they're willing to implement.
>
> I think what's important is *compatibility*. Are the shipping UAs consistent
> with respect to this? That was the data I was looking for.

I trust Larry and Martin will check for compatibility with legacy
documents when writing their parsing rules. That is what matters.
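The "resolving preserves errors" requirement quoted above can be illustrated by contrasting two resolvers. Both are hypothetical sketches built on Python's urllib.parse: the first only gives an empty path "/" (an assumption made to match the expected output in the requirement) and otherwise keeps the stray second "#" as written; the second additionally percent-encodes the fragment, turning that "#" into %23, which is the behaviour the requirement rules out.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def resolve_preserving_errors(address: str, base: str) -> str:
    """Resolve; give an empty path "/" (assumed rule); leave everything else as written."""
    parts = urlsplit(urljoin(base, address))
    path = parts.path or ("/" if parts.netloc else "")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def resolve_escaping_errors(address: str, base: str) -> str:
    """Counter-example: also percent-encode the fragment, so a stray "#" becomes %23."""
    parts = urlsplit(resolve_preserving_errors(address, base))
    fragment = quote(parts.fragment, safe="/?:@!$&'()*+,;=%")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, fragment))


print(resolve_preserving_errors("http://example.com##", "http://other.test/"))
# http://example.com/##
print(resolve_escaping_errors("http://example.com##", "http://other.test/"))
# http://example.com/#%23
```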


>> One thing I was thinking about last night is that it might be useful to
>> split the "resolve an address" algorithm into two, one to resolve an
>> address to ASCII output, and one to resolve an address to full-Unicode
>> output. We need the ASCII-only version so that we can extract the path for
>> use with e.g. HTTP, which doesn't support Unicode paths natively. I
>> haven't checked the specs I edit to see what else gets affected by this.
>
> How would that be different from parsing the IRI (I think calling something
> "resolve" when in fact nothing gets resolved is confusing) into the
> components, and then converting various parts (path & query) into ASCII?

If it's the same, that's fine by me. All I'm interested in here is
having an unambiguous way to reference an algorithm that does these
operations such that the HTML spec can reference them without (a)
defining things that the IETF feel should be defined in the IETF, (b)
having to be updated each time the IETF draft is updated, (c) having
to "fix" the algorithms so that they match what UAs are willing to
implement.

Hence the proposal: an interface in the IETF space (presumably in the
IRI spec) that defines terms HTML can use, so that we don't return to
the world where HTML has to wrap the URI/IRI specs in algorithms that
"fix" the parsing and resolution rules to be compatible with what UAs
are implementing.
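One way to picture such an interface is the sketch below. It also incorporates the ASCII/Unicode split discussed earlier and Julian's suggestion of parsing into components before converting them. Every name in it (AddressInterface, resolve_to_unicode, resolve_to_ascii, UrllibStandIn) is hypothetical. The stand-in implementation is built on Python's urllib.parse and its built-in "idna" codec (IDNA 2003), not on any IRI draft, and the set of characters left unescaped is an assumption.

```python
from typing import Protocol, runtime_checkable
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


@runtime_checkable
class AddressInterface(Protocol):
    """Hypothetical interface HTML could cite; these names are illustrative."""

    def resolve_to_unicode(self, address: str, base: str) -> str: ...

    def resolve_to_ascii(self, address: str, base: str) -> str: ...


# Characters left as-is when percent-encoding (an assumed set). "%" is kept so
# existing escapes, and stray "%" errors, pass through unchanged.
_SAFE = "/%:@!$&'()*+,;=?"


class UrllibStandIn:
    """Stand-in built on urllib.parse (RFC 3986 style), not the IRIbis rules."""

    def resolve_to_unicode(self, address: str, base: str) -> str:
        return urljoin(base, address)

    def resolve_to_ascii(self, address: str, base: str) -> str:
        # Resolve, parse into components, then convert each component to ASCII.
        parts = urlsplit(self.resolve_to_unicode(address, base))
        userinfo, at, hostport = parts.netloc.rpartition("@")
        if hostport and not hostport.startswith("["):  # leave IPv6 literals alone
            host, colon, port = hostport.partition(":")
            # Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).
            if host:
                host = host.encode("idna").decode("ascii")
            hostport = host + colon + port
        netloc = quote(userinfo, safe=_SAFE) + at + hostport
        return urlunsplit((
            parts.scheme,
            netloc,
            quote(parts.path, safe=_SAFE),
            quote(parts.query, safe=_SAFE),
            quote(parts.fragment, safe=_SAFE + "#"),  # keep a stray "#" as written
        ))


resolver = UrllibStandIn()
print(resolver.resolve_to_unicode("/café?q=ü", "http://bücher.example/"))
# http://bücher.example/café?q=ü
print(resolver.resolve_to_ascii("/café?q=ü", "http://bücher.example/"))
# http://xn--bcher-kva.example/caf%C3%A9?q=%C3%BC
```

In this arrangement HTML would cite only the interface terms; whichever IETF document defines them could change the underlying rules without HTML having to follow.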

I've no interest in debating the specifics in this mailing list, since
the whole point of the exercise is to move the specifics to another
standards organisation altogether.

-- 
Ian Hickson
Received on Tuesday, 6 April 2010 20:11:19 UTC