Re: IRI with compatibility character, unwise?

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 04 Oct 2005 10:48:11 +0900
Message-Id: <6.0.0.20.2.20051004104042.07d40b50@localhost>
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-international@w3.org

Hello Jeremy,

At 21:24 05/10/03, Jeremy Carroll wrote:
 >
 >
 >Hello
 >
 >I had a support question for the Jena Semantic Web software, concerning
 >the following RDF URI Reference:
 >
 >http://ontology.tos.co.jp/#\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC
 >
 >where the \u escapes denote the unicode characters.
 >
 >The initial problem was that this was input with the rdf:ID syntax, and
 >that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML Name
 >because of the half-width ampersand "\uFF06", which I note is a
 >compatibility character.

Just a detail: U+FF06 is the FULLWIDTH AMPERSAND, not a half-width
ampersand; the ordinary ampersand, U+0026, is the one that is part of ASCII.
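
To make the difference concrete, here is a small Python sketch (my own
illustration, not something Jena does) that looks up both characters in
the Unicode character database through the standard unicodedata module.
The names FULLWIDTH and ASCII_AMP are just labels chosen for this example.

```python
import unicodedata

FULLWIDTH = "\uFF06"   # the character that appears in the IRI
ASCII_AMP = "\u0026"   # the ordinary "&"

# Print code point, Unicode name, and East Asian Width class for each.
for ch in (FULLWIDTH, ASCII_AMP):
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.east_asian_width(ch))
```

The East Asian Width class is "F" (fullwidth) for U+FF06 and "Na"
(narrow) for U+0026.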

 >The XML recommendation says:
 >[[
 >Characters in the compatibility area (i.e. with character code greater
 >than #xF900 and less than #xFFFE) are not allowed in XML names.
 >]]
 >
 >On further reading, I saw in RFC 3987 that:
 >
 >http://www.ietf.org/rfc/rfc3987.txt
 >[[
 >On the other hand, in some cases, the UCS contains
 >    variants for compatibility reasons; for example, for typographic
 >    purposes.  These should be avoided wherever possible.  Although there
 >    may be exceptions, newly created resource names should generally be
 >    in NFKC [UTR15]
 >]]
 >While not being familiar with the concept of NFKC, I believe this means
 >that compatibility characters should be avoided when creating a new IRI.

There is a difference between compatibility characters and
characters in the compatibility area: some compatibility characters
lie outside the compatibility area. Some compatibility characters
(those with canonical mappings) are already folded away by NFC;
NFKC does (most of?) the rest.
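
As a rough sketch of this distinction in Python: the range test below
uses the bounds from the XML text you quote, and, as a simplifying
assumption of mine, a character is treated as a "compatibility
character" when NFKC changes it. The function names and the sample
characters are chosen only for illustration.

```python
import unicodedata

def in_compatibility_area(ch: str) -> bool:
    # Bounds as quoted from the XML recommendation:
    # greater than #xF900 and less than #xFFFE.
    return 0xF900 < ord(ch) < 0xFFFE

def changed_by(form: str, ch: str) -> bool:
    # True if normalizing to the given form alters the character.
    return unicodedata.normalize(form, ch) != ch

samples = {
    "\uFF06": "FULLWIDTH AMPERSAND (inside the area)",
    "\u00B5": "MICRO SIGN (outside the area)",
    "\u2460": "CIRCLED DIGIT ONE (outside the area)",
    "\u212B": "ANGSTROM SIGN (outside the area)",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}:",
          "in area" if in_compatibility_area(ch) else "not in area",
          "| NFC changes it:", changed_by("NFC", ch),
          "| NFKC changes it:", changed_by("NFKC", ch))
```

U+212B is an example of a character that NFC already folds away (to
U+00C5), while U+FF06 is left alone by NFC and only mapped to "&" by
NFKC.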

 >Since the document was creating this IRI, I advised that it should be
 >changed (e.g. by deleting the half-width ampersand)
 >
 >Presumably a different change would be to use a normal ampersand "&",
 >which is legal in an IRI fragment, and not one to avoid when creating a new
 >IRI. (Although illegal in an XML Name, for which there is a work-around)
 >
 >Have I understood correctly?

Yes, from an IRI point of view, both solutions are possible.
Although "&" is legal in a fragment, it is probably best
avoided because other kinds of identifiers (starting with
XML Names, as you mention) don't allow it.
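
For illustration only, the two options could be sketched in Python as
follows, using the IRI from your message. The variable names are made
up for this example, and splitting on "#" with str.partition is just a
shortcut, not a full IRI parser.

```python
import unicodedata

iri = ("http://ontology.tos.co.jp/#"
       "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC")

# Split off the fragment at the first "#".
base, _, fragment = iri.partition("#")

# Option 1: delete the fullwidth ampersand altogether.
deleted = base + "#" + fragment.replace("\uFF06", "")

# Option 2: normalize the fragment to NFKC, which maps U+FF06 to "&".
normalized = base + "#" + unicodedata.normalize("NFKC", fragment)

print(deleted)
print(normalized)
```

Option 1 gives a fragment that is already in NFKC and contains no "&",
which is why it also avoids the XML Name issue; option 2 keeps the
ampersand in its ASCII form.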

Is my understanding correct that your response to this support
question is only advice to the document creator, and not a
change to Jena?

Regards,   Martin. 
Received on Tuesday, 4 October 2005 02:51:52 GMT