- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Mon, 28 Dec 2009 12:26:43 -0800
- To: Larry Masinter <masinter@Adobe.COM>
- Cc: "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "public-iri@w3.org" <public-iri@w3.org>
On Dec 28, 2009, at 12:04 PM, Larry Masinter wrote:

> Larry:
>>> not just the ones in browsers. Things like chopping off initial & final
>>> whitespace, handling single "%", deleting or encoding otherwise illegal
>>> characters, etc.
>>> ...
>
> Julian:
>> Would that affect the definition of a syntactically legal IRI?
>
> Yes; that's the point, isn't it? Bring "syntactically legal IRI" into
> alignment with widespread, popular implementations. The implementations
> in Windows, OS X, Firefox, WebKit, etc. are widespread and popular.
>
> Doing so will certainly involve, for the most part, changing the
> definition of "syntactically legal IRIs". There may be a few issues where
> some "widespread, popular implementations" may also need change
> (if implementations exhibit different behavior, to bring them
> into alignment).

This is still confusing IRIs with the arbitrary contents of an href (or other) attribute. The fact is that HTML5 (and others) needs a definition of reference and the rules for converting a reference to an IRI or URI. Trying to pretend that a reference is always an IRI is doomed to fail -- you might as well obsolete the RFC and say that an IRI is anyString.

>> The reason why I'm asking is that there are specifications that rely on
>> the fact that the space character can't be part of a legal IRI (or URI),
>> and thus can be used as delimiter (the same probably applies to other
>> kinds of whitespace).
>
> There are a number of legacy specifications which refer to other
> forms (e.g., LEIRI). The path I think we should follow is to leave
> "URI" completely alone (long-standing, stable, STANDARD) but let
> the term "IRI" expand to encompass as much as feasible, and then
> handle other variations as syntactic restrictions. I think that's
> preferable to continuing to define "IRI" narrowly and then treating
> variations as syntactic expansions.

Huh? We currently define IRI to be a specific interoperable form of identifier that uses UTF-8 instead of ASCII but remains easily convertible to URI form. That has nothing to do with the arbitrary anyURI content of reference attributes because those attributes are not, and never have been, restricted to URI or IRI forms. The only thing restricted to URI or IRI is the result of parsing the reference and transforming it to an absolute form.

> draft-deurst-iri-bis-07 doesn't go this far -- it mainly leaves
> IRI alone and treats "popular browser implementations" as a syntactic
> expansion which is handled by pre-processing.
>
> To fix this, we would "take some of the things in section
> 7.2 "HREF preprocessing" and move them into the main body
> of what all normative URI[IRI] processors should do".

Thus making all current references to the standard wrong and useless. Julian is right. What you should be doing is defining an algorithm from anyString to the current definition of IRI, and then change HTML5 so that it uses anyString (or whatever you want to call it) as the attribute definition. My suggested name is "Web reference". Just be aware that some HTML5 attributes require a list of space-separated references, whereas others require a single reference that expects space to be auto-encoded by the parser.

....Roy
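
To make the shape of such an algorithm concrete, here is a rough sketch in Python of a mapping from an arbitrary attribute string to something closer to the current IRI syntax. It is an illustration only, not what any draft or browser specifies: the function names `web_reference_to_iri` and `split_reference_list` are hypothetical, and the set of characters treated as illegal is a small assumed subset (ASCII whitespace plus a few ASCII delimiters excluded by RFC 3986/3987). It covers just the steps named in the thread: trimming surrounding whitespace, escaping a stray "%", and percent-encoding otherwise illegal characters.

```python
import re

# All names below are hypothetical; they come from no specification.
ASCII_WHITESPACE = " \t\n\r\f"

# Assumed, deliberately small subset of characters not allowed in an IRI.
ILLEGAL_CHARS = set(ASCII_WHITESPACE) | set('<>"{}|\\^`')

# A "%" that does not start a valid percent-escape (two hex digits).
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def web_reference_to_iri(reference: str) -> str:
    """Map one arbitrary attribute value to a candidate IRI reference."""
    # Step 1: chop off initial and final whitespace.
    trimmed = reference.strip(ASCII_WHITESPACE)
    # Step 2: escape any single "%" that is not part of a percent-escape.
    escaped = STRAY_PERCENT.sub("%25", trimmed)
    # Step 3: percent-encode the remaining illegal characters, including
    # interior spaces; non-ASCII characters are left untouched.
    parts = []
    for ch in escaped:
        if ch in ILLEGAL_CHARS:
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            parts.append(ch)
    return "".join(parts)


def split_reference_list(value: str) -> list[str]:
    """Treat the attribute as a whitespace-separated list of references."""
    tokens = re.split(r"[ \t\n\r\f]+", value.strip(ASCII_WHITESPACE))
    return [web_reference_to_iri(tok) for tok in tokens if tok]
```

The two entry points reflect the distinction above: an attribute holding a single reference would go through `web_reference_to_iri`, where an interior space is encoded, while an attribute holding a list would go through `split_reference_list`, where the same space acts as a delimiter. Which one applies has to be decided by the attribute definition, not by the string itself.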
Received on Monday, 28 December 2009 20:27:15 UTC