
IRI with compatibility character, unwise?

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Mon, 03 Oct 2005 13:24:47 +0100
Message-ID: <4341230F.4030907@hplb.hpl.hp.com>
To: www-international@w3.org


I had a support question for the Jena Semantic Web software, concerning 
the following RDF URI Reference:


where the \u escapes denote the Unicode characters.

The initial problem was that this was input with the rdf:ID syntax, and 
that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML 
Name because of the full-width ampersand "\uFF06" (U+FF06 FULLWIDTH 
AMPERSAND), which I note is a compatibility character.

The XML recommendation says:

    Characters in the compatibility area (i.e. with character code greater
    than #xF900 and less than #xFFFE) are not allowed in XML names.
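As a rough illustration of that rule, here is a minimal Python sketch that flags characters falling in the quoted range. The helper name in_compatibility_area is my own, and the check covers only the range quoted above; it is not a full XML Name validator.

```python
import unicodedata

def in_compatibility_area(ch: str) -> bool:
    # Hypothetical helper: tests only the range quoted from the XML
    # recommendation (greater than #xF900 and less than #xFFFE).
    return 0xF900 < ord(ch) < 0xFFFE

# The rdf:ID value from the support question.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

# Collect (code point, Unicode name) for every character in that range.
offending = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch))
    for ch in fragment
    if in_compatibility_area(ch)
]
print(offending)
```

For this string, the only character reported should be U+FF06.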

On further reading, I saw in RFC 3987 that:

    On the other hand, in some cases, the UCS contains
    variants for compatibility reasons; for example, for typographic
    purposes.  These should be avoided wherever possible.  Although there
    may be exceptions, newly created resource names should generally be
    in NFKC [UTR15]

While not being familiar with the concept of NFKC, I believe this means 
that compatibility characters should be avoided when creating a new IRI.
Since the document was creating this IRI, I advised that it should be 
changed (e.g. by deleting the full-width ampersand).

Presumably a different change would be to use a normal ampersand "&", 
which is legal in an IRI fragment, and not one to avoid when creating a 
new IRI. (Although illegal in an XML Name, for which there is a work-around)
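To check this against NFKC itself, here is a small sketch using Python's standard unicodedata module. It only shows what NFKC normalization does to the fragment; whether a given tool should normalize automatically is a separate question that I am not assuming an answer to.

```python
import unicodedata

# The fragment from the support question, with U+FF06 in the middle.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

# Apply Unicode Normalization Form KC (compatibility decomposition,
# followed by canonical composition).
normalized = unicodedata.normalize("NFKC", fragment)

# A string is in NFKC exactly when normalizing leaves it unchanged.
already_nfkc = (normalized == fragment)

print(already_nfkc)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

Under NFKC, U+FF06 is replaced by the ordinary ampersand U+0026 and the other characters are left alone, so the original fragment is not in NFKC while the variant with a plain "&" is.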

Have I understood correctly?

Received on Monday, 3 October 2005 12:25:11 UTC
