Re: Proposed Charter and Agenda for IRI BOF at IETF 76 from Maciej Stachowiak on 2009-09-26 (public-iri@w3.org from September 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sat, 26 Sep 2009 16:37:47 -0700
To: Mark Nottingham <mnot@mnot.net>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-id: <0D37D7F2-8092-40C7-B2C3-A160116698B6@apple.com>

On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:

> I agree with Roy, and would add that this document hides the new  
> information (i.e., how to get from random bits to a valid URI or  
> IRI) too deeply; for example, if HTTPbis wanted to reference this  
> thing, it would need to do so by specifying a section in the IRI  
> spec, even though HTTP doesn't use IRIs at all.
>
> What I'd like to see is:
>  a. A revision of the IRI spec (if necessary), and
>  b. A new spec defining how to get from random bits to a URI or an  
> IRI (allowing the application to choose which one it needs to end up  
> with).
>
> Then, different specs can refer to URIs if they want to, IRIs if  
> they want to, and optionally specify this processing as a step  
> beforehand, and do so clearly.

It seems like the only difference between your proposal and Larry's is  
whether strict processing of IRIs and lenient processing of strings  
that may or may not be valid IRIs are in the same spec or two separate  
specs.

I don't see a great advantage in splitting the specs, as this makes  
cross-references more complicated. If it were just a matter of  
transforming the kind of string that may appear in an "href" attribute  
into a valid IRI, then your proposal might be plausible. However, in  
addition to converting to a URI, HTML UAs also need to be able to do  
the following to resource identifiers treated with lenient processing:  
(a) separate into components, even when the string is not a valid URI  
or IRI, and in a way that is not necessarily equivalent to first  
converting to a valid IRI or URI; (b) resolve a reference relative to  
a base when either the reference or the base might not be a valid URI  
or IRI; (c) determine if a reference is "absolute" even if it might  
not be a valid URI or IRI. That would mean a great many algorithms  
defined in a totally separate place from the URI spec. This is what  
the Web Address spec[1] attempted to do, and it ends up duplicating a  
lot of concepts from IRI/URI. This effort was set aside in favor of  
IRIbis incorporating the necessary content.
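
To make (a) through (c) concrete, here is a minimal Python sketch. It
is an illustration only: it leans on the standard library's
urllib.parse, which is neither the Web Address algorithm nor what any
browser implements, and the function names (lenient_split, is_absolute,
lenient_resolve, to_uri) are hypothetical names chosen for this example.
It also assumes the authority component needs no conversion, so
internationalized host names are left untouched.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Characters left unescaped when percent-encoding. "%" is kept so that
# existing escapes are not double-encoded (an assumption of this sketch).
_SAFE = "/%:@!$&'()*+,;=?"


def lenient_split(raw):
    """(a) Split into components without first requiring a valid URI/IRI."""
    return urlsplit(raw.strip())


def is_absolute(raw):
    """(c) Treat a reference as absolute if it carries a scheme."""
    return bool(urlsplit(raw.strip()).scheme)


def lenient_resolve(base, raw):
    """(b) Resolve a reference against a base; neither has to be valid."""
    return urljoin(base.strip(), raw.strip())


def to_uri(raw):
    """Percent-encode path, query, and fragment; authority is left as-is."""
    parts = urlsplit(raw.strip())
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))
```

Note that lenient_split("/a b/c") reports the path with the raw space
still in it, while to_uri("/a b/c") yields "/a%20b/c"; the components
seen by lenient processing are not simply those of the converted URI.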

I do agree that the lenient processing rules are hidden too deeply in  
Larry's current draft, but I do not think this is intrinsic to the  
content being in a single document.

Note: it's not clear to me why HTTPbis would want to reference lenient  
processing rules for URIs/IRIs. Are HTTP servers and proxies not  
strict in what they accept?

Regards,
Maciej

[1] http://www.w3.org/html/wg/href/draft.html

>
>
> On 26/09/2009, at 5:19 AM, Roy T. Fielding wrote:
>
>> Larry, your changes to the IRI draft make it incomprehensible.
>>
>> I think this is getting ridiculous.  We don't need a working group.
>> We don't even need an updated draft, at least not for LEIRI, Href,
>> and whatever it is that we call HTML5 references.
>>
>> HTML5 wants to specify the *process* of taking arbitrary data entry
>> in various places and transforming it into a) something the browser
>> displays, and b) a URI for use on the wire.  What they are calling URL
>> is the arbitrary data entry part, NOT the resulting URI, which is why
>> it is so frigging annoying and inconsistent with all other standards.
>> LEIRI made the same mistake.
>>
>> The purpose of IRI is to specify the allowed syntax for what one
>> might see on the side of a bus as a Web address in i18n-friendly,
>> human-readable form.  That is why the IRI syntax does not allow
>> common delimiters like whitespace, quotes, and brackets (except
>> for IPv6 literals).  It does not define a data-entry box.
>>
>> URI is in the same boat, except that it also defines the allowed
>> syntax for on-the-wire usage in HTTP, etc.  It is intentionally
>> limited for use in embedded plain text.  It does not define a
>> data entry box.
>>
>> Both IRI and URI are intended to define standards for the Internet
>> in the same way as the US Postal Service residential addresses have
>> a standard normal form.  The fact that an envelope does not prevent
>> a person from writing an arbitrary form of address in the hope that
>> a mail carrier can interpret it for them is not an indication that
>> the standard is somehow "wrong" -- what matters is that following
>> the standard is known to be interoperable, and everything else is
>> just an experiment in forgiveness.
>>
>> What HTML5 wants to define is how to process a data entry box
>> in the same way across all browser implementations, and there is
>> nothing wrong with such a definition appearing in HTML5 *except*
>> for the fact that the editor has chosen an existing well-known
>> term that means something else to describe it, which conflicts
>> with all prior uses of that term.  Just stop that nonsense by
>> changing the HTML5 draft wording to talk about references, not URLs.
>> HTML5 does not require changes to IRI, and certainly not to URI.
>>
>> Changing IRI (or URI) so that it conforms both to the side of a
>> bus definition and a data entry definition is insane.  They are
>> not the same thing.  They do not share the same concerns.  A
>> reference might allow anything, depending on its context and the
>> technology used to parse it; it is the post-processing that
>> produces an IRI/URI.
>>
>> ....Roy
>>
>
>
> --
> Mark Nottingham     http://www.mnot.net/
>
>
Received on Saturday, 26 September 2009 23:38:28 UTC