Re: XML Core -> I18n Core: IRIs as namespace names? from Martin Duerst on 2008-08-22 (www-international@w3.org from July to September 2008)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 22 Aug 2008 16:57:58 +0900
To: John Cowan <cowan@ccil.org>, www-international@w3.org
Message-Id: <6.0.0.20.2.20080821195550.03e9ee38@localhost>

At 08:18 08/08/14, John Cowan wrote:

>However, we are considering backporting features of XML Namespaces 1.1
>(which is used exclusively with XML 1.1 documents) to XML Namespaces 1.0
>(which is used exclusively with XML 1.0 documents).  The relevant feature
>is allowing XML namespace names to be IRIs rather than URIs.
>
>Point in favor: allowing an IRI permits the namespace name (which is used
>only for naming, not for retrieval) to be at least partly meaningful in
>languages other than English.

Another point in favor: when I last looked, the majority of the XML 1.0
implementations I tested (mostly by using a namespace with non-ASCII
characters in XSLT) just "did the right thing".

For the tests, see http://www.w3.org/2003/02/uriEquivTest/ and
http://lists.w3.org/Archives/Public/www-international/2003JanMar/0025.html.

Of course, this was "years ago".

>Point against: supporting full Unicode allows both visual spoofing and
>composed-vs.-decomposed character spoofing of namespace names, possibly
>causing a document which appears to be in one namespace to be validated
>against the schema for another namespace.  Namespace names are compared
>using codepoint-by-codepoint equality only, and this will not be changed.

We had extensive discussions about similar problems (mostly for
element/attribute names) during some work on the normalization part
of the character model.
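The codepoint-by-codepoint comparison John describes can be sketched in a few lines of Python. The two example.org namespace names below are hypothetical and render identically. Plain string equality treats them as different namespaces. An NFC-normalizing comparison would treat them as the same, but XML Namespaces does not perform one, so it is shown only for contrast.

```python
import unicodedata

# Two hypothetical namespace names that look identical when rendered:
# U+00E9 (precomposed e-acute) vs. "e" followed by U+0301 (combining acute).
composed = "http://example.org/caf\u00e9"
decomposed = "http://example.org/cafe\u0301"

def same_namespace(a: str, b: str) -> bool:
    """Namespace-name comparison as described: code point by code point."""
    return a == b

def same_after_nfc(a: str, b: str) -> bool:
    """A normalizing comparison, which is NOT what XML Namespaces does."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_namespace(composed, decomposed))  # False
print(same_after_nfc(composed, decomposed))  # True
print(len(composed), len(decomposed))        # 23 24
```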

I think the schema validation case isn't terribly serious.
The way I understand it, a recipient will be validating against
a known schema, and if the sender assumed a different one (i.e.
one whose namespace name differs only in normalization), then
there will be an error, and that error will be corrected in due
course.

When we thought about it, we mainly came up with cases such as
an XSLT application selecting, e.g., the 7th occurrence of an
element 'foo' holding a payment amount, and somebody trying to
fool a human into thinking that an element with a differently
normalized name was the 7th, while the processor would pick what
would look to the user like the 8th (or some such). Not totally
impossible, but rather far-fetched. It would be possible with
namespaces, too, but only if two separate namespaces were used,
which might already raise suspicion.
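The positional scenario can be sketched in Python using plain element names (no namespaces). The element name "Überweisung" and the data are hypothetical, and one occurrence is spelled in decomposed form. A list comprehension with exact string matching stands in for the XSLT selection, roughly what an XPath step like `Überweisung[7]` would do, so treat this as an illustration rather than a test of any XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name "Überweisung" ("transfer") in two spellings that
# render identically: precomposed U+00DC vs. "U" + combining diaeresis U+0308.
NFC_NAME = "\u00dcberweisung"
NFD_NAME = "U\u0308berweisung"

# Eight elements that all look like <Überweisung>; the 3rd uses the NFD name.
names = [NFC_NAME] * 8
names[2] = NFD_NAME
body = "".join(f'<{name} n="{i + 1}"/>' for i, name in enumerate(names))
root = ET.fromstring(f"<payments>{body}</payments>")

# Exact, code-point-by-code-point name matching skips the NFD element.
matches = [el for el in root if el.tag == NFC_NAME]
seventh = matches[6]

# A human counting the rendered elements would take the 7th child.
visual_seventh = list(root)[6]

print(len(matches))             # 7
print(seventh.get("n"))         # 8
print(visual_seventh.get("n"))  # 7
```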

Come to think of it, similar tricks are already possible
by using two prefixes that differ only in normalization,
because namespace prefixes already allow Unicode
(http://www.w3.org/TR/2006/REC-xml-names-20060816/#NT-Prefix).
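A minimal Python sketch of the prefix trick, assuming hypothetical example.org namespace names and a hypothetical <amount> element. The namespace names themselves are plain ASCII. Two prefixes that both render as "é" are bound to different namespaces, and ElementTree (which parses via expat) reports expanded names as "{namespace-name}local-name".

```python
import xml.etree.ElementTree as ET

# Two prefixes that render as "é": precomposed U+00E9 vs. "e" + U+0301.
# The namespace names and the <amount> element are hypothetical.
P_NFC = "\u00e9"
P_NFD = "e\u0301"

doc = (
    f'<root xmlns:{P_NFC}="http://example.org/real" '
    f'xmlns:{P_NFD}="http://example.org/fake">'
    f"<{P_NFC}:amount>100</{P_NFC}:amount>"
    f"<{P_NFD}:amount>999</{P_NFD}:amount>"
    "</root>"
)
root = ET.fromstring(doc)

# Both elements look like <é:amount>, but they land in different namespaces.
for child in root:
    print(child.tag, child.text)
# {http://example.org/real}amount 100
# {http://example.org/fake}amount 999
```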

The people using namespaces (as opposed to the people using
domain names in web and email addresses) are few and far
between, and generally have a certain technical expertise.

>What do you think?  Should we allow IRIs?

Yes, very much so.

My guess is that the number of usages won't be that high,
but there might be some interesting use cases, in particular
in the RDF area or in education.

Also, the cost of allowing it (treating namespace IRIs like
any other XML data) is actually lower than the cost of
not allowing it (special-casing against non-ASCII).

Regards,     Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Friday, 22 August 2008 08:00:14 UTC