RE: IRIs everywhere (including XML namespaces)

From: Martin Duerst <duerst@w3.org>
Date: Fri, 01 Nov 2002 16:19:11 +0900
Message-Id: <4.2.0.58.J.20021101160106.0496e680@localhost>
To: "Julian Reschke" <julian.reschke@gmx.de>, "Chris Lilley" <chris@w3.org>, <www-tag@w3.org>
Cc: www-international@w3.org

At 22:07 02/10/30 +0100, Julian Reschke wrote:

>BTW: up until recently, I thought that IRIs are just URIs that allow
>"arbitrary" Unicode for the sake of I18N. Why allow the space character
>then?


[I have copied www-international, because this is the official list
for discussing the IRI draft.]


Originally, the draft was as you describe: the same as URIs, with non-ASCII
Unicode characters added. The main reason for adding space, 'delims', and
'unwise' characters was that this would be very handy for XPointer.
XPointers easily contain spaces, and some other 'delims' and 'unwise'
characters.

Specifically, the IRI draft currently contains the following text
(section 5.1):

 >>>>>>>>
       b.  In the URI syntax, characters that are likely to be used to
          delimit URIs in text and print ("space", "delims", and
          "unwise") were excluded.  They are included in the IRI syntax
          (with the exception of '%', which cannot be used directly, and
          #, which is used in IRI references), for the following reasons:

             1) The syntax includes many other characters that are not
                appropriate in many cases.

             2) Some implementation practice already allows them in URI
                references (for example spaces in fragment identifiers).

             3) It is very convenient in some cases, for example for
                XPointers in XML attributes.

             4) Considering context is already necessary in the case of
                URIs, for example for "&amp;" in XML.

          However, these characters should be used carefully.  Whenever
          there is a chance that an IRI will be used in a component where
          these characters can be harmful, they should be escaped from
          the start.
 >>>>>>>>
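
To illustrate "escaped from the start", here is a rough Python sketch, not part of the draft. It assumes the RFC 2396 definitions of "space", "delims", and "unwise", leaves '%' and '#' alone as the text above does, and leaves non-ASCII characters untouched. The names RISKY and escape_risky and the example reference are made up for illustration.

```python
# Characters RFC 2396 calls "space", "delims", and "unwise", minus '%' and '#',
# which the quoted draft text treats separately. (Assumed set, for illustration.)
RISKY = ' <>"{}|\\^[]`'


def escape_risky(iri: str) -> str:
    """Percent-escape space/delims/unwise characters; leave everything else as is."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in RISKY else ch
        for ch in iri
    )


# Hypothetical IRI reference with an XPointer that contains quotes and a space.
example = 'http://example.org/doc.xml#xpointer(string-range(//p,"a b"))'
print(escape_risky(example))
# prints: http://example.org/doc.xml#xpointer(string-range(//p,%22a%20b%22))
```

All of the affected characters are ASCII, so each one maps to a single %XX escape; existing escapes such as %20 are not touched because '%' is not in the set.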

If there is a consensus to change this (either the exact wording, making
it a stronger warning, or the actual characters), it can definitely be
changed.
But changing the actual characters would affect XPointer, as well as
XML, XLink, and XML Schema (via XLink).

Also, we had better decide this soon, as it seems we all agree that
it would be good for the IRI draft to become an RFC.

Regards,     Martin.
Received on Friday, 1 November 2002 02:29:34 GMT
