Re: Advice on making IRI document suitable for reference by HTML (and other specs)

On Dec 28, 2009, at 12:04 PM, Larry Masinter wrote:
> Larry:
>>> not just the ones in browsers. Things like chopping off initial & final
>>> whitespace, handling single "%", deleting or encoding otherwise illegal
>>> characters, etc.
>>> ...
> 
> Julian:
>> Would that affect the definition of a syntactically legal IRI?
> 
> Yes; that's the point, isn't it? Bring "syntactically legal IRI" into
> alignment with widespread, popular implementations. The implementations
> in Windows, OS X, Firefox, WebKit, etc. are widespread and popular.
> 
> Doing so will certainly involve, for the most part, changing the
> definition of "syntactically legal IRIs". There may be a few issues where
> some "widespread, popular implementations" may also need change
> (if implementations exhibit different behavior, to bring them 
> into alignment).

This is still confusing IRIs with the arbitrary contents of an
href (or other) attribute.

The fact is that HTML5 (and others) needs a definition of reference
and the rules for converting a reference to an IRI or URI.
Trying to pretend that a reference is always an IRI is doomed
to fail -- you might as well obsolete the RFC and say that
an IRI is anyString.

>> The reason why I'm asking is that there are specifications that rely on 
>> the fact that the space character can't be part of a legal IRI (or URI), 
>> and thus can be used as delimiter (the same probably applies to other 
>> kinds of whitespace).
> 
> There are a number of legacy specifications which refer to other
> forms (e.g., LEIRI). The path I think we should follow is to leave
> "URI" completely alone (long-standing, stable, STANDARD) but let 
> the term "IRI" expand to encompass as much as feasible, and then
> handle other variations as syntactic restrictions.  I think that's
> preferable to continuing to define "IRI" narrowly and then treating
> variations as syntactic expansions. 

Huh?  We currently define IRI to be a specific interoperable
form of identifier that uses UTF-8 instead of ASCII but remains
easily convertible to URI form.

That has nothing to do with the arbitrary anyURI content of
reference attributes because those attributes are not, and
never have been, restricted to URI or IRI forms.  The only
thing restricted to URI or IRI is the result of parsing the
reference and transforming it to an absolute form.
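
A minimal sketch of that last step, assuming Python's standard
urllib.parse and a reference that has already been cleaned up. The
names resolve_reference and iri_to_uri are hypothetical, and the
conversion ignores IDNA handling of the host, so treat it as an
illustration rather than a conforming implementation:

```python
from urllib.parse import quote, urljoin

# Characters left untouched when mapping an IRI to a URI: the URI
# reserved set plus "%", so existing percent-escapes are preserved.
# (Unreserved ASCII characters are never escaped by quote().)
_KEEP = ":/?#[]@!$&'()*+,;=%"


def resolve_reference(base: str, reference: str) -> str:
    """Transform a (possibly relative) reference to absolute form."""
    return urljoin(base, reference)


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 octets.

    Assumption: no IDNA/punycode conversion of the host is attempted.
    """
    return quote(iri, safe=_KEEP)


def to_absolute_uri(base: str, reference: str) -> str:
    """Resolve first, then convert the resulting IRI to URI form."""
    return iri_to_uri(resolve_reference(base, reference))
```

Only the value returned by to_absolute_uri is constrained to URI
syntax here; the reference argument is just a string.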

> draft-deurst-iri-bis-07 doesn't go this far -- it mainly leaves
> IRI alone and treats "popular browser implementations" as a syntactic 
> expansion which is handled by pre-processing. 
> 
> To fix this, we would "take some of the things in section 
> 7.2 "HREF preprocessing" and move them into the main body
> of what all normative URI[IRI] processors should do".

Thus making all current references to the standard wrong
and useless.  Julian is right.  What you should be doing
is defining an algorithm from anyString to the current
definition of IRI, and then changing HTML5 so that it uses
anyString (or whatever you want to call it) as the attribute
definition.  My suggested name is "Web reference".  Just be
aware that some HTML5 attributes require a list of
space-separated references, whereas others require a
single reference that expects space to be auto-encoded
by the parser.
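
A rough sketch of what such an algorithm might look like, in Python.
Everything here is an assumption for illustration: the function names
are hypothetical, and the cleanup steps are only the ones mentioned
earlier in this thread (trimming whitespace, handling a single "%",
encoding spaces), not a complete or agreed-upon list:

```python
import re
from typing import List

# A "%" that is not followed by two hex digits is treated as a stray.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def web_reference_to_iri_reference(ref: str) -> str:
    """Map an arbitrary attribute string toward IRI-reference syntax.

    Hypothetical cleanup steps, applied in order:
    1. chop off leading and trailing whitespace,
    2. encode a stray "%" as "%25",
    3. encode embedded spaces as "%20".
    """
    ref = ref.strip(" \t\n\r\f")
    ref = _STRAY_PERCENT.sub("%25", ref)
    return ref.replace(" ", "%20")


def parse_single_reference(value: str) -> str:
    """Attribute holding ONE reference: embedded spaces are auto-encoded."""
    return web_reference_to_iri_reference(value)


def parse_reference_list(value: str) -> List[str]:
    """Attribute holding a space-separated LIST: whitespace is a delimiter."""
    return [web_reference_to_iri_reference(tok) for tok in value.split()]
```

The same attribute text, "a b", therefore yields one reference
("a%20b") under the first attribute type and two references ("a" and
"b") under the second, which is why the attribute definition has to
say which kind it is.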

....Roy

Received on Monday, 28 December 2009 20:27:15 UTC