Re: Change definition of URL to normatively reference IRI specification using a well-defined interface from Maciej Stachowiak on 2010-04-09 (public-iri@w3.org from April 2010)

From: Maciej Stachowiak <mjs@apple.com>
Date: Fri, 09 Apr 2010 01:13:15 -0700
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: Ian Hickson <ian@hixie.ch>, Ted Hardie <ted.ietf@gmail.com>, Larry Masinter <LMM@acm.org>, Julian Reschke <julian.reschke@gmx.de>, Marc Blanchet <Marc.Blanchet@viagenie.ca>, Sam Ruby <rubys@intertwingly.net>, Paul Cotton <Paul.Cotton@microsoft.com>, Michel SUIGNARD <Michel@suignard.com>, public-html <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Message-id: <B2400A9A-98C5-4971-B734-66B889BC41C4@apple.com>

On Apr 9, 2010, at 1:00 AM, Martin J. Dürst wrote:

> Hello Ian,
>
> Many thanks for your very careful description of the issues below. I  
> propose (to the IRI WG chairs) that we replace the current issue 1  
> in our tracker with these two issues.
>
> More comments below.

I tried to answer your questions to the best of my abilities.

>
> On 2010/04/09 10:40, Ian Hickson wrote:
>> On Thu, Apr 8, 2010 at 9:31 AM, Ted Hardie <ted.ietf@gmail.com> wrote:
>>>
>>> my understanding is that the correct next step will be to describe this issue
>>> in a way that we can track.
>>
>> I've tried to write descriptions of the two issues. Please let me know
>> if you need any further advice on the matter.
>>
>> Issue 1:
>> =====================================================================
>> Update the IRI specification to define an algorithm with the following
>> characteristics:
>
> In order to make it easier to understand this for people who are not
> deeply involved in the HTML5 effort, I'd like to confirm that this
> is the algorithm that HTML5 uses to split a URI/IRI into various
> components, each of which is then accessible via a (JavaScript) DOM
> API function. So I guess the title of our issue should be something
> like:
> "Ensure that the IRI spec defines how to split an IRI into
> components in a way that's referenceable by the HTML5 spec" or some
> such.

That's the primary purpose, yes.

>
>>   Input:
>>     * a string
>>
>>   Output:
>>     * a boolean representing whether the algorithm succeeded or failed
>>     * if the algorithm succeeded, one or more strings corresponding to
>>       the following components, each of which may be present or absent:
>>       - <scheme> component
>>       - <host> component
>>       - <port> component
>>       - <hostport> component
>>       - <path> component
>>       - <query> component
>>       - <fragment> component
>>       - <host-specific> component
>>
>> This algorithm must be such that it can be used where HTML5 says "the
>> user agent must use the parse an address algorithm defined by the IRI
>> specification" in a manner that user agents including major browser
>> vendors will be willing to implement the algorithm as written.
>>
>> Exactly what this algorithm must do is a matter that will need careful
>> research, including reverse-engineering of existing UAs.
>
> My understanding was that a lot of this research had already been  
> done, and that we would basically try to match whatever was in the  
> HTML5 spec before Dan Connolly and Michael Sperberg-McQueen  
> extracted it into a separate draft. Of course, we should always be  
> open to new information coming up, but your sentence above sounds  
> much more like we have to start anew. Can you clarify?

I think the old contents of HTML5, or even the content of the now  
abandoned Web Addresses spec, would be a good starting point. However,  
I believe that both Web Addresses and the old spec have bugs. New  
testing would be advisable to confirm some of the details and check  
edge cases.
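A minimal sketch of the Issue 1 interface is shown below. It takes a string and returns a success flag plus a set of components, each of which may be absent. It borrows Python's urllib.parse.urlsplit, which follows RFC 3986 rather than the lenient Web Address processing under discussion. It is not the algorithm the IRI spec would define. The names parse_address and ParsedAddress are hypothetical, and the <host-specific> component is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlsplit


@dataclass
class ParsedAddress:
    # Hypothetical container for the components listed in Issue 1.
    # None means "absent"; <host-specific> is not modeled in this sketch.
    scheme: Optional[str] = None
    host: Optional[str] = None
    port: Optional[str] = None
    hostport: Optional[str] = None
    path: Optional[str] = None
    query: Optional[str] = None
    fragment: Optional[str] = None


def parse_address(text: str) -> Tuple[bool, Optional[ParsedAddress]]:
    """Hypothetical 'parse an address': returns (succeeded, components)."""
    try:
        parts = urlsplit(text)  # raises ValueError e.g. for a malformed IPv6 host
        port = parts.port       # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False, None

    # urlsplit returns "" both for an empty and for a missing query or
    # fragment, so this sketch cannot tell those cases apart; it reports
    # every empty component as absent.
    hostport = parts.netloc.rpartition("@")[2]  # drop any userinfo
    return True, ParsedAddress(
        scheme=parts.scheme or None,
        host=parts.hostname,  # note: urlsplit lowercases the host name
        port=str(port) if port is not None else None,
        hostport=hostport or None,
        path=parts.path or None,
        query=parts.query or None,
        fragment=parts.fragment or None,
    )
```

Returning a (succeeded, components) pair mirrors the "boolean plus components" output that Issue 1 asks for. A real definition would also need to specify the present-but-empty cases that urlsplit collapses.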


>
>> The algorithm needs to be defined in such a way that it can be
>> referenced unambiguously by name. For example, text such as the
>> following could be used to introduce this algorithm:
>>
>>    When a specification says that a user agent is to *parse an
>>    address*, given a string INPUT, it must run the following steps,
>>    which return a failure/success condition and a set of components:
>>
>>     ...
>>
>> This gives a completely unambiguous and clear way to invoke the
>> algorithm described in the spec, along with RFC2119-level clarity
>> regarding what such invocations imply for the user agent.
>> =====================================================================
>>
>> Issue 2:
>> =====================================================================
>> Update the IRI specification to define an algorithm with the following
>> characteristics:
>
> Again, to clarify: if I understand correctly, the HTML5 spec
> needs such an algorithm to resolve relative references with respect
> to a base URI. My wild guess is that B is the base and A is the
> relative URI below; can you confirm?

Specifically, String A is a possibly-relative URI (really a possibly-relative
IRI reference with lenient Web Address processing), and String B is an
absolute URI that is the base. String A is resolved against String B as a
base, though if String A happens to be absolute, then A itself will be
returned.
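The reading given above is that A is possibly relative, B is an absolute base, and an absolute A comes back as-is. Under that reading, the Issue 2 interface might be sketched as follows. The sketch makes several assumptions:

- It uses Python's urllib.parse.urljoin, which implements RFC 3986 reference resolution rather than lenient Web Address processing.
- The names resolve_address and QUERY_SAFE are hypothetical.
- Applying the character encoding only to the query component is an assumption about how the encoding argument would be used.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters left as-is when re-encoding the query (an assumption; "%" is
# included so that existing percent-escapes are not double-encoded).
QUERY_SAFE = "!$&'()*+,;=:@/?%"


def resolve_address(a: str, b: str, encoding: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical 'resolve an address': resolve A against the absolute base B."""
    try:
        if not urlsplit(b).scheme:
            return False, None  # the base must be absolute
        # RFC 3986 reference resolution. When A has its own scheme (different
        # from B's) or its own host, urljoin returns A rather than resolving it.
        joined = urlsplit(urljoin(b, a))
        # Assumption: the encoding only affects the query, whose non-ASCII
        # characters are percent-encoded as bytes in that encoding.
        query = quote(joined.query, safe=QUERY_SAFE, encoding=encoding)
    except (ValueError, LookupError):
        # ValueError also covers UnicodeEncodeError (unencodable query
        # characters); LookupError covers an unknown encoding name.
        return False, None
    return True, urlunsplit(joined._replace(query=query))
```

For example, resolving "../c?q=é" against "http://example.com/a/b/" gives "http://example.com/a/c?q=%C3%A9" with UTF-8 but "http://example.com/a/c?q=%E9" with ISO-8859-1. The encoding argument therefore changes the result even when the resolved path is identical.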


>
> Regards,    Martin.
>
>
>>   Input:
>>     * a string A
>>     * a string B, which was previously output from this algorithm
>>     * a character encoding
>>
>>   Output:
>>     * a boolean representing whether the algorithm succeeded or failed
>>     * if the algorithm succeeded, a string
>>
>> This algorithm must be such that it can be used where HTML5 says "the
>> result of applying the resolve an address algorithm defined by the IRI
>> specification to resolve url relative to base using encoding encoding"
>> in a manner that user agents including major browser vendors will be
>> willing to implement the algorithm as written.
>>
>> Exactly what this algorithm must do is a matter that will need careful
>> research, including reverse-engineering of existing UAs.
>>
>> The algorithm needs to be defined in such a way that it can be
>> referenced unambiguously by name. For example, text such as the
>> following could be used to introduce this algorithm:
>>
>>    When a specification says that a user agent is to *resolve an
>>    address*, given a string INPUT, a second string BASE, and a
>>    character encoding ENCODING, it must run the following steps, which
>>    return a failure/success condition and a string:
>>
>>     ...
>>
>> This gives a completely unambiguous and clear way to invoke the
>> algorithm described in the spec, along with RFC2119-level clarity
>> regarding what such invocations imply for the user agent.
>> =====================================================================
>
>
> Regards,    Martin.
>
> -- 
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
>

Received on Friday, 9 April 2010 08:13:50 UTC