Towards a request to I18N Core wrt LEIRIs from Henry S. Thompson on 2007-08-21 (public-xml-core-wg@w3.org from August 2007)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 21 Aug 2007 15:25:50 +0100
To: public-xml-core-wg <public-xml-core-wg@w3.org>
Message-ID: <f5bd4xh55ht.fsf@hildegard.inf.ed.ac.uk>

Further to

  ACTION to Henry: Explore, with the expectation of proposing, the
  possibility of asking I18N Core to define "legacy extended IRIs" (by
  whatever name) in the upcoming revision of the IRI RFC.

I have discussed things, and this looks like it's worth a try.

So, I suggest we fill in and send something along the following lines
to the I18N Core WG. Note that I do _not_ understand all the subtleties
of the issues involved, so would those who do please review and revise
this rigorously.

----------

  We would like to suggest that the best way to move forward with our
  effort to reconcile the differences between the way in which various
  specifications in the XML family allow a superset of IRIs, and the
  IRI spec. itself, would be to incorporate a new section in the
  revision of the IRI spec. that you are currently working on, which
  would name and define a single concept to be referenced from all
  those XML specs, along the following lines:

  Name (negotiable): Legacy Extended IRIs (LEIRIs)
  Definition (taken from [1]):

   A Human Readable Resource Identifier (HRRI) is a sequence of
   Unicode characters that can be converted into an IRI by the
   application of a few simple encoding rules.

   To convert a Human Readable Resource Identifier to an IRI
   reference, the following characters MUST be percent encoded:

    * the control characters #x0 to #x1F and #x7F to #x9F
    * space #x20
    * the delimiters "<" #x3C, ">" #x3E, and '"' #x22
    * the unwise characters "{" #x7B, "}" #x7D, "|" #x7C, "\" #x5C,
      "^" #x5E, and "`" #x60
    * characters in the Unicode private use area (#xE000-#xF8FF),
      except where they appear in the query part of the resulting IRI.

   These characters are percent encoded by applying [steps 2.1 to 2.3
   of Section 3.1 of RFC 3987] to them.

  Health Warning: We would be happy to see some text added to warn
   against creating new LEIRIs using most or indeed almost all of the
   characters allowed by this, perhaps expanding on what is already
   present in [1]: "[A]uthors of HRRIs are advised to percent encode
   space characters themselves, rather than rely on the processor to
   do so, because spaces are often used to separate HRRIs in a
   sequence."

  We would expect to go ahead and publish several specs. which are
  waiting for a resolution of this issue, e.g. XML Base 2e and XLink
  1.1, once there is a stable and agreed-final Internet Draft of a new
  edition of 3987 including agreed prose along the lines given above,
  leaving the insertion of the final RFC number to subsequent errata.

-------------
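
For reviewers who want to sanity-check the conversion rules in the
draft definition above, here is a minimal Python sketch of them. It is
an illustration under stated assumptions, not a reference
implementation: the function and helper names are hypothetical, the
input is treated as a single IRI reference, and the query part is
taken to be whatever lies between the first "?" and the first "#".
Percent encoding follows the approach of steps 2.1 to 2.3 of Section
3.1 of RFC 3987 (UTF-8 octets, each written as %HH).

```python
# Sketch only: names below are hypothetical, not taken from any spec.

# Space, the delimiters and the "unwise" characters listed in the draft.
ALWAYS_ENCODE = set(' <>"{}|\\^`')


def _is_control(cp):
    # Control characters #x0 to #x1F and #x7F to #x9F.
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def _is_private_use(cp):
    # Unicode private use area #xE000-#xF8FF.
    return 0xE000 <= cp <= 0xF8FF


def _percent_encode(ch):
    # Convert the character to UTF-8 octets and write each octet as %HH.
    return "".join("%{:02X}".format(octet) for octet in ch.encode("utf-8"))


def leiri_to_iri(leiri):
    out = []
    in_query = False
    in_fragment = False
    for ch in leiri:
        cp = ord(ch)
        # Assumption: the fragment starts at the first "#", and the query
        # starts at the first "?" that precedes any "#".
        if ch == "#" and not in_fragment:
            in_fragment, in_query = True, False
        elif ch == "?" and not in_query and not in_fragment:
            in_query = True
        if _is_control(cp) or ch in ALWAYS_ENCODE:
            out.append(_percent_encode(ch))
        elif _is_private_use(cp) and not in_query:
            # Private use characters are left alone inside the query part.
            out.append(_percent_encode(ch))
        else:
            out.append(ch)
    return "".join(out)
```

For example, leiri_to_iri("http://example.org/a b") gives
"http://example.org/a%20b", while other non-ASCII characters such as
"é" pass through unchanged, since they are already allowed in IRIs.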
ht

[1] http://www.w3.org/XML/2007/04/hrri/draft-walsh-tobin-hrri-01c.html
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Tuesday, 21 August 2007 14:25:54 UTC