Re: Change definition of URL to normatively reference IRI specification using a well-defined interface

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Fri, 09 Apr 2010 17:00:35 +0900
Message-ID: <4BBEDEA3.3050805@it.aoyama.ac.jp>
To: Ian Hickson <ian@hixie.ch>
CC: Ted Hardie <ted.ietf@gmail.com>, Maciej Stachowiak <mjs@apple.com>, Larry Masinter <LMM@acm.org>, Julian Reschke <julian.reschke@gmx.de>, Marc Blanchet <Marc.Blanchet@viagenie.ca>, Sam Ruby <rubys@intertwingly.net>, Paul Cotton <Paul.Cotton@microsoft.com>, Michel SUIGNARD <Michel@suignard.com>, public-html <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>

Hello Ian,

Many thanks for your very careful description of the issues below. I 
propose (to the IRI WG chairs) that we replace the current issue 1 in 
our tracker with these two issues.

More comments below.

On 2010/04/09 10:40, Ian Hickson wrote:
> On Thu, Apr 8, 2010 at 9:31 AM, Ted Hardie <ted.ietf@gmail.com> wrote:
>>
>> my understanding is that the correct next step will be to describe this issue
>> in a way that we can track.
>
> I've tried to write descriptions of the two issues. Please let me know
> if you need any further advice on the matter.
>
> Issue 1:
> ========================================================================
> Update the IRI specification to define an algorithm with the following
> characteristics:

In order to make it easier to understand this for people who are not
deeply involved in the HTML5 effort, I'd like to confirm that this is
the algorithm that HTML5 uses to split a URI/IRI into various
components, each of which is then accessible via a (JavaScript) DOM API
function. So I guess the title of our issue should be something like:
"Ensure that the IRI spec defines how to split an IRI into components in
a way that's referenceable by the HTML5 spec" or some such.

>    Input:
>      * a string
>
>    Output:
>      * a boolean representing whether the algorithm succeeded or failed
>      * if the algorithm succeeded, one or more strings corresponding to
>        the following components, each of which may be present or absent:
>        - <scheme> component
>        - <host> component
>        - <port> component
>        - <hostport> component
>        - <path> component
>        - <query> component
>        - <fragment> component
>        - <host-specific> component
>
> This algorithm must be such that it can be used where HTML5 says "the
> user agent must use the parse an address algorithm defined by the IRI
> specification" in a manner that user agents including major browser
> vendors will be willing to implement the algorithm as written.
>
> Exactly what this algorithm must do is a matter that will need careful
> research, reverse-engineering existing UAs.

My understanding was that a lot of this research had already been done, 
and that we would basically try to match whatever was in the HTML5 spec 
before Dan Connolly and Michael Sperberg-McQueen extracted it into a 
separate draft. Of course, we should always be open to new information 
coming up, but your sentence above sounds much more like we have to 
start anew. Can you clarify?
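
As a rough illustration of the interface Issue 1 asks for (a string in, a
success flag and a set of optional components out), here is a minimal
Python sketch. It is not the algorithm HTML5 or any browser uses: it
borrows the standard library's urlsplit() as a stand-in parser, and the
names parse_address and Components are made up for this example. The
<host-specific> component is left out because nothing in this thread
defines it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class Components:
    """Hypothetical container: None means the component is absent."""
    scheme: Optional[str] = None
    host: Optional[str] = None
    port: Optional[str] = None
    hostport: Optional[str] = None
    path: Optional[str] = None
    query: Optional[str] = None
    fragment: Optional[str] = None


def parse_address(text: str) -> Tuple[bool, Optional[Components]]:
    """Return (succeeded, components); components is None on failure."""
    try:
        parts = urlsplit(text)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False, None

    host = parts.hostname  # urlsplit lowercases this and drops IPv6 brackets
    port_text = None if port is None else str(port)
    hostport = host
    if host is not None and port_text is not None:
        hostport = f"{host}:{port_text}"

    # urlsplit reports a missing component as "", so this sketch cannot tell
    # an absent query from an empty one ("?"); both become None here.
    return True, Components(
        scheme=parts.scheme or None,
        host=host,
        port=port_text,
        hostport=hostport,
        path=parts.path or None,
        query=parts.query or None,
        fragment=parts.fragment or None,
    )
```

With this sketch, parse_address("http://example.com:8080/a/b?x=1#frag")
succeeds with hostport "example.com:8080", while
parse_address("http://[::1/") reports failure because urlsplit() rejects
the unbalanced bracket.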

> The algorithm needs to be defined in such a way that it can be
> referenced unambiguously by name. For example, text such as the
> following could be used to introduce this algorithm:
>
>     When a specification says that a user agent is to *parse an
>     address*, given a string INPUT, it must run the following steps,
>     which return a failure/success condition and a set of components:
>
>      ...
>
> This gives a completely unambiguous and clear way to invoke the
> algorithm described in the spec, along with RFC2119-level clarity
> regarding what such invocations imply for the user agent.
> ========================================================================
>
> Issue 2:
> ========================================================================
> Update the IRI specification to define an algorithm with the following
> characteristics:

Again to clarify here, if I understand correctly, the HTML5 spec needs 
such an algorithm to resolve relative references with respect to a base 
URI (my wild guess is that B is the base, and A is the relative URI 
below, can you confirm)?

>    Input:
>      * a string A
>      * a string B, which was previously output from this algorithm
>      * a character encoding
>
>    Output:
>      * a boolean representing whether the algorithm succeeded or failed
>      * if the algorithm succeeded, a string
>
> This algorithm must be such that it can be used where HTML5 says "the
> result of applying the resolve an address algorithm defined by the IRI
> specification to resolve url relative to base using encoding encoding"
> in a manner that user agents including major browser vendors will be
> willing to implement the algorithm as written.
>
> Exactly what this algorithm must do is a matter that will need careful
> research, reverse-engineering existing UAs.
>
> The algorithm needs to be defined in such a way that it can be
> referenced unambiguously by name. For example, text such as the
> following could be used to introduce this algorithm:
>
>     When a specification says that a user agent is to *resolve an
>     address*, given a string INPUT, a second string BASE, and a
>     character encoding ENCODING, it must run the following steps, which
>     return a failure/success condition and a string:
>
>      ...
>
> This gives a completely unambiguous and clear way to invoke the
> algorithm described in the spec, along with RFC2119-level clarity
> regarding what such invocations imply for the user agent.
> ========================================================================
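
The shape of Issue 2 can be sketched the same way. The Python below is
only an illustration under stated assumptions: it takes A to be the
(possibly relative) input and B to be the base, as guessed above; it uses
the standard library's urljoin() for the resolution step; and it assumes
the character encoding matters only for percent-encoding the query, which
this thread does not actually specify. The name resolve_address is
hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters left untouched in the query. "%" is included so that existing
# percent-escapes are not encoded a second time.
_QUERY_SAFE = "!$&'()*+,;=:@/?%"


def resolve_address(url: str, base: str,
                    encoding: str = "utf-8") -> Tuple[bool, Optional[str]]:
    """Return (succeeded, absolute_address); the string is None on failure."""
    try:
        joined = urljoin(base, url)  # resolution step, via the standard library
        parts = urlsplit(joined)
        # Assumption: the encoding is applied to the query only.
        query = quote(parts.query, safe=_QUERY_SAFE,
                      encoding=encoding, errors="strict")
    except (ValueError, LookupError):
        # ValueError: malformed input, or a character the encoding cannot
        # represent (UnicodeEncodeError is a ValueError subclass).
        # LookupError: unknown encoding name.
        return False, None
    return True, urlunsplit(parts._replace(query=query))
```

For instance, resolving "../c?q=é" against "http://example.com/a/b/d"
with encoding "iso-8859-1" yields "http://example.com/a/c?q=%E9" in this
sketch, and "http://example.com/a/c?q=%C3%A9" with "utf-8". Real user
agents may well differ, which is exactly the reverse-engineering Ian
refers to.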


Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Friday, 9 April 2010 08:01:24 GMT