Re: Change definition of URL to normatively reference IRI specification using a well-defined interface from Julian Reschke on 2010-04-06 (public-html@w3.org from April 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 06 Apr 2010 14:26:07 +0200
To: Ian Hickson <ian@hixie.ch>
CC: public-html@w3.org
Message-ID: <4BBB285F.2020906@gmx.de>

Ah, progress! Thanks for that. Comments inline.

On 06.04.2010 01:57, Ian Hickson wrote:
>
> ISSUE-56
> ========
>
> SUMMARY
> The HTML specification is changed slightly to reference the IRI
> specification using a well-defined interface.
>
> RATIONALE
> To ensure a clean modular separation of the IRI and HTML specifications,
> an interface is needed. This allows the specifications to co-exist in a
> well-defined way without each specification needing to be continually
> updated as the other is fixed (for example, changing references to section
> numbers or step numbers).
>
> DETAILS
> Update the IRI specification to define two algorithms:
>
>   * parsing an address (relative or absolute): algorithm to obtain a
>     failure/success condition (not the same as whether the input is
>     valid or not, just whether it can be parsed), and the following
>     components, from parsing an arbitrary string:
>      - <scheme> component
>      - <host> component
>      - <port> component
>      - <hostport> component
>      - <path> component
>      - <query> component
>      - <fragment> component
>      - <host-specific> component

1) I believe you want that algorithm to parse and return the individual 
components even for invalid IRIs, right? If so, this should be pointed out.

2) Why would IRIbis need to define <hostport>? This seems to be useful 
in HTML5 only.

3) Similarly, why would IRIbis define <host-specific>? This one doesn't
seem to be used at all.
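
To make the shape of that interface concrete, here is a minimal Python
sketch. It is not the proposed IRIbis algorithm: it leans on the standard
library's urlsplit(), which follows RFC 3986-style splitting, and the names
ParsedAddress and parse_address are hypothetical. It only illustrates that
"can be parsed" and "is valid" are separate questions, and that <hostport>
can be derived from the other components. <host-specific> is left out
because the proposal does not say what it contains.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ParsedAddress(NamedTuple):
    """Hypothetical result type; field names mirror the proposal's list."""
    scheme: str
    host: Optional[str]
    port: Optional[int]
    hostport: str
    path: str
    query: str
    fragment: str


def parse_address(address: str) -> Optional[ParsedAddress]:
    """Return the components, or None if the string cannot be parsed at all.

    Assumption: urlsplit() stands in for the IRI parsing algorithm.
    No validity check is performed, so invalid input still yields components.
    """
    try:
        parts = urlsplit(address)
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        return None
    return ParsedAddress(
        scheme=parts.scheme,
        host=parts.hostname,
        port=port,
        # Derived value: the authority without any userinfo prefix.
        hostport=parts.netloc.rpartition("@")[2],
        path=parts.path,
        query=parts.query,
        fragment=parts.fragment,
    )


# An address with an unescaped space is not valid, yet it still parses.
print(parse_address("http://example.com:8080/a b?q=1#frag"))
# A non-numeric port is treated here as a parse failure.
print(parse_address("http://example.com:notaport/"))
```

Returning None for failure keeps the success/failure condition separate from
the component values, which is the distinction point 1) asks to have spelled
out.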

>   * resolving an address A relative to a base address B with an encoding C:
>     algorithm for parsing an arbitrary string A and resolving it relative
>     to address B (which will have been resolved, but may be invalid), using
>     a specified character encoding C, and returning either success or
>     failure, and in the case of success, a string, with the following
>     conditions:
>      - the output of the algorithm must be idempotent even if the base
>        argument is changed (i.e. once resolved, resolving it again with
>        the same character encoding cannot change the result)

I don't believe "idempotent" is the right term here if the second
invocation uses different arguments. Could you elaborate, and maybe give
an example?
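
One possible reading of the condition, sketched in Python: once an address
has been resolved to an absolute one, resolving that result again against a
different base returns it unchanged. This is only a guess at the intended
meaning. The sketch uses the standard library's urljoin(), which takes no
character encoding argument, so the encoding C from the proposal is not
modelled; the helper name resolve and the example addresses are made up.

```python
from urllib.parse import urljoin


def resolve(address: str, base: str) -> str:
    """Resolve `address` relative to `base` (stand-in for the IRI algorithm)."""
    return urljoin(base, address)


base_b = "http://example.com/a/b/index.html"
other_base = "http://example.org/unrelated/"

# First resolution: a relative reference against base B.
resolved = resolve("../c?d=1#e", base_b)

# Second resolution: the already-resolved result against a different base.
again = resolve(resolved, other_base)

# Under this reading, the base no longer matters once the address is absolute.
assert again == resolved
print(resolved)
```

If that is the intended property, it is closer to "the output is a fixed
point of resolution" than to idempotence in the usual sense.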

>      - resolving preserves errors, e.g. resolving "http://example.com##"
>        returns "http://example.com/##" not "http://example.com/#%C3".
>
> Update the HTML spec to use these algorithms and reference the IRI spec
> that defines them.

It would be cool to understand why this is a requirement (I'm ready to 
believe it is in practice, I'd just like to see the reason...).

Best regards, Julian

Received on Tuesday, 6 April 2010 12:26:42 UTC