- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Fri, 01 May 2009 16:19:21 +0900
- To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
- CC: www-tag@w3.org, public-iri@w3.org, Apps Discuss <discuss@apps.ietf.org>, Lisa Dusseault <lisa.dusseault@messagingarchitects.com>, Alexey Melnikov <alexey.melnikov@isode.com>
Hello Henry,

Many thanks for this very good overview.

I'm cross-posting this to the IRI list (public-iri@w3.org) because Lisa at one point proposed to have this kind of discussion there, as well as to the Apps Discuss list (discuss@apps.ietf.org) to reach out to the relevant people in the IETF. I have also copied Lisa and Alex directly. I guess this is overall a bit too aggressive a cross-posting (but please tell me if you think I have missed somebody important). However, I hope we can converge quickly on where to move forward with which bits of the discussion/work.

On 2009/04/29 0:31, Henry S. Thompson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> There are currently five documents in this space (that I am aware of):
>
> [URI] The current RFC governing URIs:
> http://tools.ietf.org/html/rfc3986
>
> [IRI] The current RFC governing IRIs:
> http://tools.ietf.org/html/rfc3987
>
> [IRI-BIS] The most recent draft of a planned update for the RFC
> governing IRIs:
> http://tools.ietf.org/html/draft-duerst-iri-bis-04
>
> [LEIRI] A W3C Note defining Legacy Extended IRIs (extracted from [IRI-BIS]):
> http://www.w3.org/TR/leiri/
>
> [WEBADDR] A preliminary draft of a possible RFC for Web Addresses
> (extracted from HTML5 [1]):
> http://www.w3.org/html/wg/href/draft.html [not yet in RFC format,
> converted version expected RSN]
>
> On the TAG telcon of 2009-04-17, there was some sense that this is too
> many specs in the same space. . .

Agreed.

> In order to contextualize and perhaps stimulate a possible effort to
> seek a rationalization here, here's _my_ understanding of how we got
> here.
>
> [URI] is the mature stage of a spec. which has been revised a number
> of times. It carries a certain amount of historical baggage with
> it, particularly its restriction to 7-bit characters, but that also
> ensures wide interoperability and preserves access to legacy
> applications.
>
> [IRI] was intended to address the needs of the expanding Internet
> and Web community, allowing most of Unicode into most parts of IRIs.
> Rather than require upgrades in a wide range of applications and
> uses, it did not set up IRIs as a _replacement_ for URIs across the
> board, but as a _complement_ to URIs. It therefore included an
> explicit transcoding algorithm, for converting IRIs to URIs.
>
> [IRI-BIS] was initiated by the editors of [IRI] to correct several
> errata to [IRI] and to address the exclusion from [IRI] of certain
> characters and character ranges.

Yes. In that sense, it is not a separate document from [IRI], just an update. That's very usual for IETF work, a bit less so for W3C work if one looks only at Recommendations. So the effective number of documents is down from five to four.

> [LEIRI] had its origins in the XML family of W3C specifications.
> The XML specification itself [2], as well as a number of other
> XML-related specifications (including XML Base, XML Schema, XPointer
> Framework, XML Signature) all involve appeal to a process for
> converting arbitrary strings which are intended to identify web
> resources into URIs. They all incorporate more-or-less identical
> prose excerpted from the XLink specification [3] which specifies how
> this is to be done.
>
> The XML Core WG has long been unhappy with this state of affairs,
> and the impending release of new editions of several of these specs
> encouraged the WG to try to establish a single normative reference
> for the concept of a string for identifying web resources in XML
> documents and a process for converting them to URIs, which
> acknowledged and built on the IRI specification.
>
> After drafting a document to serve this purpose, discussion with the
> editors of [IRI-BIS] convinced all concerned that since a new
> version of the IRI spec was already in progress, the best thing to
> do, to respect precedent and to avoid unnecessary proliferation, was
> to include the relevant definitions in [IRI-BIS], and in fact that
> has been done [4]. Once it became apparent, however, that the
> progress of [IRI-BIS] to Draft Standard status was likely to be
> considerably delayed for reasons outside its editors' control, the
> Core WG, with the agreement and co-operation of the editors of
> [IRI-BIS], published [LEIRI] as a Working Group Note, so that the
> re-issue of new editions of the relevant XML-family specs could go
> ahead. The intention is to issue a revision of [LEIRI] replacing
> its contents with a reference to [IRI-BIS] as soon as [IRI-BIS]
> becomes a Draft Standard.

Yes. If that works out as described above (which I very much hope it will), then [LEIRI] will silently disappear. This would reduce the number of documents from four down to three.

For the IETF side, I'd like to give a bit more background on LEIRIs (see http://tools.ietf.org/html/draft-duerst-iri-bis-05#section-7). The main thing the LEIRI definition does is to allow ASCII characters not allowed in URIs (and therefore not allowed in IRIs) back into what are otherwise essentially IRIs. The reason why a number of XML specs (as listed above) differ from the IRI spec is that these specs adopted a very early and simple definition of IRIs (before the name IRI even existed). Later, when the IRI spec got tightened, these specs didn't want to follow this tightening because, while XML is very strict in what it accepts and what it does not, it doesn't want to retract promises once given. Another reason is that in XML, context and escaping conventions make it possible to include essentially any character, whereas for URIs and IRIs in general, this is not the case.
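To make the difference concrete, here is a minimal sketch (in Python) of the kind of mapping involved when a LEIRI-like string is turned into a URI: the extra ASCII characters and all non-ASCII characters are percent-encoded via UTF-8, and everything else is left alone. The function name and the exact character set below are assumptions for illustration, not the normative list from [IRI-BIS] or [LEIRI], and the sketch ignores any special handling of internationalized domain names.

```python
# Assumed, illustrative set of ASCII characters that URIs forbid but that
# a LEIRI-like string may contain; the normative list is in the spec text.
EXTRA_ASCII = set(' <>"{}|\\^`')


def leiri_to_uri_sketch(text: str) -> str:
    """Percent-encode the extra ASCII, control, and non-ASCII characters.

    Existing percent-encodings such as "%20" are left untouched, because
    '%' itself is never encoded here.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in EXTRA_ASCII or code < 0x20 or code > 0x7E:
            # Encode each UTF-8 byte of the character as %HH.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(leiri_to_uri_sketch("http://example.org/a b"))
print(leiri_to_uri_sketch("http://example.org/Dürst"))
```

Under these assumptions the space becomes %20 and "ü" becomes %C3%BC, while a string that already contains "%20" passes through unchanged.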
> [WEBADDR] had in some ways a similar origin to [LEIRI], starting out
> as a section of the HTML5 spec which addressed the process by which
> existing browsers process strings to produce URIs which can be
> dereferenced.

Yes indeed. It changes a space to %20, the same as for LEIRIs.

> It differs from [LEIRI] in the exact set of
> characters which it escapes,

Has anybody done an analysis? It seems to provide more detail about '[' and ']', escaping them depending on context. It could be that that's also necessary for LEIRIs. But "any occurrences of percent-encoding in the Web address will be double-encoded at this step." looks extremely scary.

> and in the special handling it mandates
> for the encoding of characters in the 'query' part of a URI.

See more about that in my reply to Dan.

> I am sure that the above summaries can be improved. In particular it
> would be helpful to have clear statements from their respective
> authors/owners as to what the _requirements_ for the three new
> documents ([IRI-BIS], [LEIRI] and [WEBADDR]) are. Only after we have
> those would it make sense to turn to the question of whether we can
> merge some or all of them.

Okay, I'm usually not good at requirements, but for [IRI-BIS], they might look about as follows:

- Be usable in general, not just in a specific context (such as XML, HTML,...)
- Move to Draft Standard (or, if that turns out not to be possible, make sure we can do so on the next round).
- Try to avoid fragmentation (terms such as "Human Readable Resource Identifiers" or "Web Addresses" or so can lead to quite a bit of confusion when the main goal is to deal with occasional legacy data that nobody should have produced anyway)
- Include the (currently ongoing) update of IDNA (in particular affects section 4, Bidi, and references); that's what's currently holding back progress.

Are these the things that you looked for when you said 'Requirements'? Or something else?

Regards,    Martin.
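To make the double-encoding concern above concrete, here is a minimal sketch using Python's standard `urllib.parse.quote`. It is not the [WEBADDR] algorithm; the two helper names and the example address are hypothetical, and the only point is what happens to an existing "%20" when '%' itself is escaped.

```python
from urllib.parse import quote, unquote


def encode_naively(address: str) -> str:
    # '%' is not in the safe set, so an existing "%20" becomes "%2520".
    return quote(address, safe="/:")


def encode_preserving_escapes(address: str) -> str:
    # Treat '%' as safe so existing percent-encodings pass through unchanged.
    # (A stray '%' that is not part of an escape is also left alone.)
    return quote(address, safe="/:%")


already_escaped = "http://example.org/a%20b c"

print(encode_naively(already_escaped))
print(encode_preserving_escapes(already_escaped))
# Decoding the naive result once does not give back "a b c":
print(unquote(encode_naively(already_escaped)))
```

With the naive variant, a single round of decoding yields the literal text "a%20b c" rather than "a b c", so an address that was already partly escaped ends up identifying a different resource.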
> ht
>
> [1] http://dev.w3.org/html5/spec/Overview.html#urls
> [2] http://www.w3.org/TR/xml/#dt-sysid
> [3] http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
> [4] http://tools.ietf.org/html/draft-duerst-iri-bis-04#section-7
> - --
> Henry S. Thompson, School of Informatics, University of Edinburgh
> Half-time member of W3C Team
> 10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
> Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
> URL: http://www.ltg.ed.ac.uk/~ht/
> [mail really from me _always_ has this .sig -- mail without it is forged spam]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.6 (GNU/Linux)
>
> iD8DBQFJ9yFQkjnJixAXWBoRAq5tAJwMb/0jpU6XwLbYNqyt2s4uNwTcQACdHx4B
> F/J04oFFOeDHZLTT9Y0qkT0=
> =f6+L
> -----END PGP SIGNATURE-----
>
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Friday, 1 May 2009 07:20:17 UTC