Re: Proposed Charter and Agenda for IRI BOF at IETF 76 from Mark Nottingham on 2009-09-27 (public-iri@w3.org from September 2009)

From: Mark Nottingham <mnot@mnot.net>
Date: Sun, 27 Sep 2009 10:01:13 +1000
To: Maciej Stachowiak <mjs@apple.com>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-Id: <3E1B9958-0B40-4490-9A54-A5F296A6DCF3@mnot.net>

On 27/09/2009, at 9:37 AM, Maciej Stachowiak wrote:

>
> On Sep 26, 2009, at 3:04 PM, Mark Nottingham wrote:
>
>> I agree with Roy, and would add that this document hides the new  
>> information (i.e., how to get from random bits to a valid URI or  
>> IRI) too deeply; for example, if HTTPbis wanted to reference this  
>> thing, it would need to do so by specifying a section in the IRI  
>> spec, even though HTTP doesn't use IRIs at all.
>>
>> What I'd like to see is:
>> a. A revision of the IRI spec (if necessary), and
>> b. A new spec defining how to get from random bits to a URI or an  
>> IRI (allowing the application to choose which one it needs to end  
>> up with).
>>
>> Then, different specs can refer to URIs if they want to, IRIs if  
>> they want to, and optionally specify this processing as a step  
>> beforehand, and do so clearly.
>
> It seems like the only difference between your proposal and Larry's  
> is whether strict processing of IRIs and lenient processing of  
> strings that may or may not be valid IRIs are in the same spec or  
> two separate specs.
>
> I don't see a great advantage in splitting the specs, as this makes  
> cross-references more complicated.

I don't have a lie-down-in-the-road issue with structuring these as
one document, although I do think it's more natural to separate them.
What I want to avoid is having this extra step hidden away in a
non-obvious place that's difficult to reference and specify externally;
as it currently sits, the processing is specified in an informally
named section of the IRI spec, which is the last place I'd look for it
if I were working with URIs.

So, at a minimum, the section needs to be re-cast as something more  
prominent and normative (i.e., if someone chooses to conform to it,  
they should be able to know what that means), and the spec needs to be  
named to reflect that.

> If it were just a matter of transforming the kind of string that may  
> appear in an "href" attribute into a valid IRI, then your proposal  
> might be plausible. However, in addition to converting to a URI,  
> HTML UAs also need to be able to do the following to resource  
> identifiers treated with lenient processing: (a) separate into  
> components, even when the string is not a valid URI or IRI, and in a  
> way that is not necessarily equivalent to first converting to a  
> valid IRI or URI; (b) resolve a reference relative to a base when  
> either the reference or the base might not be a valid URI or IRI;  
> (c) determine if a reference is "absolute" even if it might not be a  
> valid URI or IRI. That would mean a great deal of algorithms defined  
> in a totally separate place from the URI spec. This is what the Web  
> Address spec[1] attempted to do, and it ends up duplicating a lot of  
> concepts from IRI/URI. This effort was set aside in favor of IRIbis  
> incorporating the necessary content.
>
> I do agree that the lenient processing rules are hidden too deeply  
> in Larry's current draft, but I do not think this is intrinsic to  
> the content being in a single document.
>
> Note: it's not clear to me why HTTPbis would want to reference  
> lenient processing rules for URIs/IRIs. Are HTTP servers and proxies  
> not strict in what they accept?

It's been discussed for the Location header. No decision as of yet,  
though.

If something like Location (i.e., something that needs a URI, not an
IRI, as output) needs this algorithm, then having the algorithm defined
inside the IRI spec is going to make things more complex.
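
To make the "separate step" idea concrete, here is a minimal sketch in Python of a standalone pre-processing function that takes arbitrary input and yields an ASCII-only URI string, so that a consumer such as Location could call it without touching anything IRI-specific. The function name `lenient_to_uri` and the exact rules (trim surrounding whitespace, percent-encode anything outside the RFC 3986 reserved and unreserved sets as UTF-8, leave existing escapes alone) are assumptions made for illustration only; they are not the algorithm from any draft, and non-ASCII host names in particular would need separate handling.

```python
from urllib.parse import quote

# Characters passed through unchanged: the RFC 3986 reserved set plus "%",
# so existing percent-escapes are not encoded a second time. Letters,
# digits, and "-._~" are always left alone by quote().
_PASS_THROUGH = "!#$%&'()*+,/:;=?@[]"


def lenient_to_uri(raw: str) -> str:
    """Hypothetical lenient step: arbitrary string in, ASCII-only string out.

    Illustrative only. It does not validate the result against the URI
    grammar and does not apply IDNA to non-ASCII host names.
    """
    trimmed = raw.strip()  # drop leading/trailing whitespace from data entry
    # Percent-encode spaces, quotes, non-ASCII, etc. as UTF-8 octets.
    return quote(trimmed, safe=_PASS_THROUGH)


if __name__ == "__main__":
    print(lenient_to_uri("  http://example.com/a b/\u00e9?q=x y#frag "))
```

A spec that needs a URI would reference only this step; a spec that wants an IRI could reference a sibling step that stops short of encoding non-ASCII characters.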

>
> Regards,
> Maciej
>
> [1] http://www.w3.org/html/wg/href/draft.html
>
>>
>>
>> On 26/09/2009, at 5:19 AM, Roy T. Fielding wrote:
>>
>>> Larry, your changes to the IRI draft make it incomprehensible.
>>>
>>> I think this is getting ridiculous.  We don't need a working group.
>>> We don't even need an updated draft, at least not for LEIRI, Href,
>>> and whatever it is that we call HTML5 references.
>>>
>>> HTML5 wants to specify the *process* of taking arbitrary data entry
>>> in various places and transforming it into a) something the browser
>>> displays, and b) a URI for use on the wire.  What they are calling URL
>>> is the arbitrary data entry part, NOT the resulting URI, which is why
>>> it is so frigging annoying and inconsistent with all other standards.
>>> LEIRI made the same mistake.
>>>
>>> The purpose of IRI is to specify the allowed syntax for what one
>>> might see on the side of a bus as a Web address in i18n-friendly,
>>> human-readable form.  That is why the IRI syntax does not allow
>>> common delimiters like whitespace, quotes, and brackets (except
>>> for IPv6 literals).  It does not define a data-entry box.
>>>
>>> URI is in the same boat, except that it also defines the allowed
>>> syntax for on-the-wire usage in HTTP, etc.  It is intentionally
>>> limited for use in embedded plain text.  It does not define a
>>> data entry box.
>>>
>>> Both IRI and URI are intended to define standards for the Internet
>>> in the same way as the US Postal Service residential addresses have
>>> a standard normal form.  The fact that an envelope does not prevent
>>> a person from writing an arbitrary form of address in the hope that
>>> a mail carrier can interpret it for them is not an indication that
>>> the standard is somehow "wrong" -- what matters is that following
>>> the standard is known to be interoperable, and everything else is
>>> just an experiment in forgiveness.
>>>
>>> What HTML5 wants to define is how to process a data entry box
>>> in the same way across all browser implementations, and there is
>>> nothing wrong with such a definition appearing in HTML5 *except*
>>> for the fact that the editor has chosen an existing well-known
>>> term that means something else to describe it, which conflicts
>>> with all prior uses of that term.  Just stop that nonsense by
>>> changing the HTML5 draft wording to talk about references, not URLs.
>>> HTML5 does not require changes to IRI, and certainly not to URI.
>>>
>>> Changing IRI (or URI) so that it conforms both to the side of a
>>> bus definition and a data entry definition is insane.  They are
>>> not the same thing.  They do not share the same concerns.  A
>>> reference might allow anything, depending on its context and the
>>> technology used to parse it; it is the post-processing that
>>> produces an IRI/URI.
>>>
>>> ....Roy
>>>
>>
>>
>> --
>> Mark Nottingham     http://www.mnot.net/
>>
>>
>


--
Mark Nottingham     http://www.mnot.net/
Received on Sunday, 27 September 2009 00:02:08 UTC