IRI definition and repertoire in namespaces 1.1

From: Martin Duerst <duerst@w3.org>
Date: Sat, 28 Sep 2002 16:12:13 +0900
Message-Id: <4.2.0.58.J.20020928160657.03f25f10@localhost>
To: xml-names-editor@w3.org
Cc: w3c-i18n-ig@w3.org

Dear XML-names-editor,

I'm writing to you based on an action item from this week's
I18N WG teleconference.

The I18N WG has looked at XML Namespaces 1.1, currently in Last Call.
(http://www.w3.org/TR/2002/WD-xml-names11-20020905)
My comments below form one part of our last call comments, and
complement the other part, sent to you by Misha Wolf.

We are pleased to see that you consistently use IRIs for
namespace names.

However, we are concerned about your definition of IRIs
in section 7 of your document:

 >>>>
7 Internationalized Resource Identifiers (IRIs)

Some characters are disallowed in URI references, even if they are allowed
in XML; the disallowed characters, according to [RFC2396] and [RFC2732],
are the control characters #x0 to #x1F and #x7F, space #x20, the delimiters
'<' #x3C, '>' #x3E and '"' #x22, the unwise characters '{' #x7B, '}' #x7D,
'|' #x7C, '\' #x5C, '^' #x5E and '`' #x60, as well as all characters above
#x7F.

[Definition: An IRI reference is a string that can be converted to a URI 
reference by escaping all disallowed characters as follows: ]

    1. Each disallowed character is converted to UTF-8 [Unicode 3.2]
       as one or more bytes.
    2. The resulting bytes are escaped with the URI escaping mechanism
       (that is, converted to %HH, where HH is the hexadecimal notation
        of the byte value).
    3. The original character is replaced by the resulting character sequence.
 >>>>
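
The three conversion steps quoted above can be sketched roughly as follows. This is only an illustration of the quoted text, not part of either specification; the names DISALLOWED_ASCII, is_disallowed and iri_to_uri are made up for this example, and the disallowed set is copied from the quoted paragraph as it stands.

```python
# Illustrative sketch of the quoted section 7 conversion (hypothetical names).
# The disallowed set below follows the quoted text: controls #x0-#x1F and #x7F,
# space, the delimiters < > ", the unwise characters { } | \ ^ `,
# and every character above #x7F.
DISALLOWED_ASCII = (
    set(range(0x00, 0x20))
    | {0x7F, 0x20}
    | {ord(c) for c in '<>"{}|\\^`'}
)


def is_disallowed(ch: str) -> bool:
    """Return True if the quoted text lists this character as disallowed."""
    cp = ord(ch)
    return cp in DISALLOWED_ASCII or cp > 0x7F


def iri_to_uri(iri_ref: str) -> str:
    """Escape each disallowed character as the %HH form of its UTF-8 bytes."""
    out = []
    for ch in iri_ref:
        if is_disallowed(ch):
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: write each byte as %HH.
            # Step 3: the escape sequence replaces the original character.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


print(iri_to_uri("http://example.org/ros\u00e9"))
```

Note that '%' is not in the quoted disallowed list, so this sketch leaves existing escapes such as %20 untouched.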

We have some general concerns and some specific concerns.

The general concerns are that this definition may be taken by some
readers as the definition of IRIs as such, and that there may be
conflicts with the I18N WG's work on a full definition of IRIs.
The current version of this is published as
draft-duerst-iri-duerst-01.txt, and the current working copy
can be found from http://www.w3.org/International/iri-edit/.
We would very much appreciate it if you would review the current draft,
in particular section 2.3.


To some extent, this is our fault, because draft-duerst-iri-xx.txt
is progressing less quickly than we would hope. However, we hope that
we can conclude this work soon.

To solve this problem, I suggest that you add a sentence such as
"please note that the definition in this section is given only for
the purpose of this document; for a more general definition and
discussion of IRIs see [IRI] (work in progress). Once [IRI] becomes
an RFC, we expect to replace this section with a pointer to it."


Another general concern is that your definition is not motivated at
all. It would be good to add something like "This definition serves
to give syntactic restrictions to IRIs (i.e. a string not convertible
to a URI as described below is not an IRI) and to define how to
resolve an IRI (by first converting it to a URI) in case this is
desirable."


More specific concerns:

- The extent of the definition: The definition extends not only
   to the three conversion steps, but also includes the definition
   of 'disallowed characters' above. The positions of '[' and ']'
   do not reflect this at all.

- There are some details about disallowed characters that have to
   be checked and corrected:

 >>>>
Some characters are disallowed in URI references, even if they are allowed 
in XML; the disallowed characters, according to [RFC2396] and [RFC2732], 
are the control characters #x0 to #x1F and #x7F, space #x20, the delimiters 
'<' #x3C, '>' #x3E and '"' #x22, the unwise characters '{' #x7B, '}' #x7D, 
'|' #x7C, '\' #x5C, '^' #x5E and '`' #x60, as well as all characters above 
#x7F.
 >>>>

1) The control characters #x0 to #x1F are (with very few exceptions)
    not allowed in XML, and they are also not currently allowed in IRIs.
    (see http://www.w3.org/International/iri-edit/draft-duerst-iri.html#abnf,
    ichar = << allowed character of the UCS [ISO10646] >> | space | delims | unwise
    and the following list giving details on
    << allowed character of the UCS [ISO10646] >>)

2) The control characters in the range #x80-#x9F, although allowed in
    XML, are again not allowed in IRIs.

Please change your definition (preferred), or convince the authors of
draft-duerst-iri-xx.txt to change theirs.
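
As a rough illustration of points 1) and 2), a candidate IRI reference could be scanned for the two control ranges named above (#x0-#x1F and #x80-#x9F). The function name control_characters is hypothetical, and the check covers only these two ranges, not the full ichar production of the draft.

```python
from typing import List, Tuple


def control_characters(s: str) -> List[Tuple[int, str]]:
    """List (index, code point) for characters in #x0-#x1F or #x80-#x9F.

    These are only the two ranges named in points 1) and 2); this is not
    a complete IRI syntax check.
    """
    found = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp <= 0x1F or 0x80 <= cp <= 0x9F:
            found.append((i, f"#x{cp:X}"))
    return found


print(control_characters("http://example.org/a\u0085b"))
```

An empty result means none of the named control characters were found; it does not by itself mean the string is a valid IRI reference.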



Regards,    Martin.
Received on Saturday, 28 September 2002 05:43:22 UTC
