A new RFC for Web Addresses/Hypertext References: Background wrt LEIRIs from Henry S. Thompson on 2009-04-28 (www-tag@w3.org from April 2009)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 28 Apr 2009 16:31:28 +0100
To: www-tag@w3.org
Message-ID: <f5btz48pzfz.fsf@hildegard.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There are currently five documents in this space (that I am aware of):

 [URI] The current RFC governing URIs:
   http://tools.ietf.org/html/rfc3986

 [IRI] The current RFC governing IRIs:
   http://tools.ietf.org/html/rfc3987

 [IRI-BIS] The most recent draft of a planned update for the RFC
           governing IRIs:
   http://tools.ietf.org/html/draft-duerst-iri-bis-04

 [LEIRI] A W3C Note defining Legacy Extended IRIs (extracted from [IRI-BIS]):
   http://www.w3.org/TR/leiri/

 [WEBADDR] A preliminary draft of a possible RFC for Web Addresses
           (extracted from HTML5 [1]):
   http://www.w3.org/html/wg/href/draft.html [not yet in RFC format,
                                              converted version expected
                                              RSN]

On the TAG telcon of 2009-04-17, there was some sense that this is too
many specs in the same space. . .

To give some context for, and perhaps stimulate, an effort to
rationalize this set of specs, here's _my_ understanding of how we
got here.

[URI] is the mature stage of a specification which has been revised a number
  of times.  It carries a certain amount of historical baggage with
  it, particularly its restriction to 7-bit characters, but that also
  ensures wide interoperability and preserves access to legacy
  applications.

[IRI] was intended to address the needs of the expanding Internet
  and Web community, allowing most of Unicode into most parts of IRIs.
  Rather than require upgrades in a wide range of applications and
  uses, it did not set up IRIs as a _replacement_ for URIs across the
  board, but as a _complement_ to URIs.  It therefore included an
  explicit transcoding algorithm for converting IRIs to URIs.
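
To make the transcoding step concrete, here is a minimal sketch in
Python of the core idea as I understand it from [IRI]: each non-ASCII
character is encoded as UTF-8 and each resulting octet is
percent-encoded.  The function name iri_to_uri is hypothetical, and the
sketch deliberately ignores the finer points (normalization of input
that arrives in a non-Unicode encoding, and the option of applying IDNA
to the host name), so it should not be read as a conforming
implementation.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of an IRI as UTF-8 octets.

    Simplified sketch: ASCII characters pass through untouched, and no
    IDNA processing is applied to the host part (an assumption made here
    for brevity).
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters are already legal in the URI syntax
            # (or are left for the caller to deal with).
            out.append(ch)
        else:
            # Encode the character as UTF-8, then escape each octet.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
```

An IRI that is already pure ASCII comes back unchanged, which is what
lets IRIs act as a complement to URIs rather than a replacement.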

[IRI-BIS] was initiated by the editors of [IRI] to correct several
  errata to [IRI] and to address the exclusion from [IRI] of certain
  characters and character ranges.

[LEIRI] had its origins in the XML family of W3C specifications.
  The XML specification itself [2], as well as a number of other
  XML-related specifications (including XML Base, XML Schema, XPointer
  Framework, XML Signature) all involve appeal to a process for
  converting arbitrary strings which are intended to identify web
  resources into URIs.  They all incorporate more-or-less identical
  prose excerpted from the XLink specification [3] which specifies how
  this is to be done.

  The XML Core WG has long been unhappy with this state of affairs,
  and the impending release of new editions of several of these specs
  encouraged the WG to try to establish a single normative reference
  for the concept of a string for identifying web resources in XML
  documents and a process for converting them to URIs, which
  acknowledged and built on the IRI specification.

  After drafting a document to serve this purpose, discussion with the
  editors of [IRI-BIS] convinced all concerned that since a new
  version of the IRI spec was already in progress, the best thing to
  do, to respect precedent and to avoid unnecessary proliferation, was
  to include the relevant definitions in [IRI-BIS], and in fact that
  has been done [4].  Once it became apparent, however, that the
  progress of [IRI-BIS] to Draft Standard status was likely to be
  considerably delayed for reasons outside its editors' control, the
  Core WG, with the agreement and co-operation of the editors of
  [IRI-BIS], published [LEIRI] as a Working Group Note, so that the
  issue of new editions of the relevant XML-family specs could go
  ahead.  The intention is to issue a revision of [LEIRI] replacing
  its contents with a reference to [IRI-BIS] as soon as [IRI-BIS]
  becomes a Draft Standard.

[WEBADDR] had in some ways a similar origin to [LEIRI], starting out
  as a section of the HTML5 spec which addressed the process by which
  existing browsers process strings to produce URIs which can be
  dereferenced.  It differs from [LEIRI] in the exact set of
  characters which it escapes, and in the special handling it mandates
  for the encoding of characters in the 'query' part of a URI.
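
As a rough illustration of the 'query' difference, the sketch below
assumes (my reading of the HTML5 draft, not something spelled out
above) that the special handling consists of escaping non-ASCII
characters in the query using the character encoding of the containing
document, while the path is escaped as UTF-8.  The function name
escape_web_address, its document_encoding parameter and the exact set
of characters left unescaped are all hypothetical choices made for the
example.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in this sketch (an assumption, not the
# exact set used by [WEBADDR] or [LEIRI]).  "%" is included so that
# existing escapes are not escaped a second time.
SAFE = "/:@!$&'()*+,;=%?"


def escape_web_address(address: str, document_encoding: str = "utf-8") -> str:
    """Escape path as UTF-8 and query in the document's encoding."""
    parts = urlsplit(address)
    path = quote(parts.path, safe=SAFE, encoding="utf-8")
    # Raises UnicodeEncodeError if the query contains a character that
    # the document encoding cannot represent.
    query = quote(parts.query, safe=SAFE, encoding=document_encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(escape_web_address("http://example.org/caf\u00e9?q=caf\u00e9", "iso-8859-1"))
```

With a UTF-8 document the path and query are escaped identically; the
two only diverge for documents in a legacy encoding.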

I am sure that the above summaries can be improved.  In particular it
would be helpful to have clear statements from their respective
authors/owners as to what the _requirements_ for the three new
documents ([IRI-BIS], [LEIRI] and [WEBADDR]) are.  Only after we have
those would it make sense to turn to the question of whether we can
merge some or all of them.

ht

[1] http://dev.w3.org/html5/spec/Overview.html#urls
[2] http://www.w3.org/TR/xml/#dt-sysid
[3] http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
[4] http://tools.ietf.org/html/draft-duerst-iri-bis-04#section-7
- -- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFJ9yFQkjnJixAXWBoRAq5tAJwMb/0jpU6XwLbYNqyt2s4uNwTcQACdHx4B
F/J04oFFOeDHZLTT9Y0qkT0=
=f6+L
-----END PGP SIGNATURE-----
Received on Tuesday, 28 April 2009 15:32:03 UTC