RE: Advice on making IRI document suitable for reference by HTML (and other specs)

(ref http://www.w3.org/Bugs/Public/show_bug.cgi?id=8207 )

Note:
 http://lists.w3.org/Archives/Public/public-html/2009Nov/att-0670/iri-rewrite-draft.html

contains a proposed rewrite of the HTML 5 specification's section 2.5.1,
which would remove all references to the previous "WEBADDRESSES" 
specification (http://www.w3.org/html/wg/href/draft) and use
draft-duerst-iri-bis-07 as the normative reference instead.

The draft text includes several definitions which may belong in the
IRI document itself (including how to resolve an arbitrary string
against an absolute base), which may be necessary if there are
other specifications which were planning to use [WEBADDRESSES]
as a normative reference.

Larry:
>> I'd appreciate it if some other mailing list subscribers had some
>> ideas for how to fix the document better to accomplish (1) while retaining
>> the goal for (2).  To make progress on (2), I think we'd want to take
>> some of the things in section 7.2 "HREF preprocessing" and move them
>> into the main body of what all normative URI processors should do, and

Julian:
> URI processors or IRI processors?

To be careful, "new IRI processors". One of the problems we have in
discussing this topic is that if we change the meaning of
"IRI processor", some previously conforming processors will of
course become non-conforming.

Larry:
>> not just the ones in browsers. Things like chopping off initial & final
>> whitespace, handling a single "%", deleting or encoding otherwise illegal
>> characters, etc.
>> ...

Julian:
> Would that affect the definition of a syntactically legal IRI?

Yes; that's the point, isn't it? Bring "syntactically legal IRI" into
alignment with widespread, popular implementations. The implementations
in Windows, OS X, Firefox, WebKit, etc. are widespread and popular.

Doing so will certainly involve, for the most part, changing the
definition of "syntactically legal IRIs". There may be a few issues where
some "widespread, popular implementations" also need to change
(where implementations behave differently, to bring them
into alignment).

> The reason why I'm asking is that there are specifications that rely on 
> the fact that the space character can't be part of a legal IRI (or URI), 
> and thus can be used as delimiter (the same probably applies to other 
> kinds of whitespace).
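
To make the delimiter concern concrete, here is a minimal sketch in
Python. The function name split_iri_list and the example values are
hypothetical, not taken from any specification; it only assumes a
format whose attribute value is a whitespace-separated list of IRIs.

```python
def split_iri_list(value: str) -> list:
    """Split a whitespace-separated list of IRIs (hypothetical format)."""
    # str.split() with no argument splits on any run of whitespace,
    # which is only safe if whitespace can never occur inside an IRI.
    return value.split()


# Two legal IRIs separated by a space: splits as intended.
ok = split_iri_list("http://example.com/a http://example.com/b")

# If "http://example.com/my file" were itself a legal IRI, the same
# splitter would wrongly yield three items instead of two.
broken = split_iri_list("http://example.com/my file http://example.com/b")
```

Specifications of this kind would need to keep referring to a form
that excludes whitespace.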

There are a number of legacy specifications which refer to other
forms (e.g., LEIRI). The path I think we should follow is to leave
"URI" completely alone (long-standing, stable, STANDARD) but let 
the term "IRI" expand to encompass as much as feasible, and then
handle other variations as syntactic restrictions.  I think that's
preferable to continuing to define "IRI" narrowly and then treating
variations as syntactic expansions. 

draft-duerst-iri-bis-07 doesn't go this far -- it mainly leaves
IRI alone and treats "popular browser implementations" as a syntactic
expansion which is handled by pre-processing.

To fix this, we would "take some of the things in section 
7.2 "HREF preprocessing" and move them into the main body
of what all normative URI[IRI] processors should do".
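
As a rough illustration of the kind of preprocessing meant here
(trimming whitespace, handling a single "%", encoding otherwise
illegal characters), a minimal Python sketch follows. It is not the
algorithm from section 7.2 of the draft: the function name
preprocess_href, the choice to encode rather than delete, and the
particular set of "illegal" characters are all assumptions made for
illustration.

```python
import re

# Assumed set of "otherwise illegal" characters; illustrative only,
# not the list from the draft.
_ILLEGAL = set(' <>"{}|\\^`')

# A "%" that is not followed by two hex digits (a stray percent sign).
_LONE_PERCENT = re.compile(r'%(?![0-9A-Fa-f]{2})')


def preprocess_href(raw: str) -> str:
    """Hypothetical preprocessing of an arbitrary string into IRI form."""
    # 1. Chop off initial and final whitespace.
    s = raw.strip()
    # 2. Encode a stray "%" as "%25"; valid escapes are left untouched.
    s = _LONE_PERCENT.sub('%25', s)
    # 3. Percent-encode assumed-illegal characters and ASCII controls;
    #    non-ASCII characters pass through, since IRIs permit them.
    out = []
    for ch in s:
        if ch in _ILLEGAL or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.extend('%{:02X}'.format(b) for b in ch.encode('utf-8'))
        else:
            out.append(ch)
    return ''.join(out)


result = preprocess_href("  http://example.com/a b%zz%20 \n")
```

Under these assumptions, result is "http://example.com/a%20b%25zz%20".
Whether such steps belong in the definition of IRI itself or stay
a separate preprocessing stage is exactly the question above.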

Larry
--
http://larry.masinter.net

Received on Monday, 28 December 2009 20:05:07 UTC