IRI with compatibility character, unwise? from Jeremy Carroll on 2005-10-03 (www-international@w3.org from October to December 2005)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Mon, 03 Oct 2005 13:24:47 +0100
To: www-international@w3.org
Message-ID: <4341230F.4030907@hplb.hpl.hp.com>

Hello

I had a support question for the Jena Semantic Web software, concerning 
the following RDF URI Reference:

http://ontology.tos.co.jp/#\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC

where the \u escapes denote Unicode characters.

The initial problem was that this was input with the rdf:ID syntax, and
that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML
Name because of the fullwidth ampersand "\uFF06" (U+FF06 FULLWIDTH
AMPERSAND), which I note is a compatibility character.
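
A minimal sketch in Python of how one might confirm which character is
the compatibility character, assuming the standard unicodedata module.
The helper name compatibility_chars is made up for illustration and is
not part of Jena or of any specification:

```python
import unicodedata

# The rdf:ID value from the report, written with the same \u escapes.
FRAGMENT = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"


def compatibility_chars(text):
    """List characters whose Unicode decomposition is a compatibility one.

    Hypothetical helper for illustration only.
    """
    found = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        # Compatibility decompositions carry a tag such as <wide> or
        # <compat>; canonical decompositions have no tag.
        if decomp.startswith("<"):
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch), decomp))
    return found


for code_point, name, decomp in compatibility_chars(FRAGMENT):
    print(code_point, name, decomp)
```

Only U+FF06 is reported for this string; the other characters either
have no decomposition or only a canonical one.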

The XML recommendation says:
[[
Characters in the compatibility area (i.e. with character code greater 
than #xF900 and less than #xFFFE) are not allowed in XML names.
]]
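
The quoted rule can be checked mechanically. The following is only a
sketch of that one sentence, not of the full XML Name production, and
the function name in_compatibility_area is an assumption made for this
example:

```python
# The rdf:ID value from the report.
FRAGMENT = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"


def in_compatibility_area(ch):
    """True if ch falls in the range named by the quoted sentence.

    The range is taken literally from the quotation: code greater than
    #xF900 and less than #xFFFE. Hypothetical helper for illustration.
    """
    return 0xF900 < ord(ch) < 0xFFFE


# Collect the code points that the quoted rule excludes from XML names.
offenders = [f"U+{ord(ch):04X}" for ch in FRAGMENT if in_compatibility_area(ch)]
print(offenders)
```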

On further reading, I saw in RFC 3987 that:

http://www.ietf.org/rfc/rfc3987.txt
[[
On the other hand, in some cases, the UCS contains
variants for compatibility reasons; for example, for typographic
purposes.  These should be avoided wherever possible.  Although there
may be exceptions, newly created resource names should generally be
in NFKC [UTR15]
]]
While I am not familiar with the concept of NFKC, I believe this means
that compatibility characters should be avoided when creating a new IRI.
Since the document was creating this IRI, I advised that it should be
changed (e.g. by deleting the fullwidth ampersand).
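
For readers who want to test a candidate fragment against the RFC 3987
advice, here is a small sketch using Python's unicodedata.normalize.
It assumes that "in NFKC" can be tested by checking that normalization
leaves the string unchanged:

```python
import unicodedata

# The fragment from the report, including U+FF06.
FRAGMENT = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

# NFKC replaces compatibility characters with their preferred forms.
normalized = unicodedata.normalize("NFKC", FRAGMENT)

# A string is in NFKC exactly when normalizing it changes nothing.
is_nfkc = normalized == FRAGMENT

print(is_nfkc)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

Under NFKC the U+FF06 character maps to the ordinary ampersand U+0026,
and the rest of the string is unchanged.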

Presumably an alternative change would be to use a normal ampersand "&",
which is legal in an IRI fragment and is not a character to avoid when
creating a new IRI (although it is illegal in an XML Name, for which
there is a work-around).
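
The work-around is not spelled out in this message. One option that
RDF/XML allows, and which is assumed for this sketch, is to write the
fragment as a relative reference in rdf:about instead of as an rdf:ID
value, so that it no longer has to be an XML Name. The ampersand must
still be escaped in the attribute value; quoteattr from Python's
xml.sax.saxutils does that:

```python
from xml.sax.saxutils import quoteattr

# The fragment with an ordinary ampersand in place of U+FF06.
FRAGMENT = "\u304A\u3082\u3061\u3083&\u30DB\u30D3\u30FC"

# Assumption: rdf:about with a "#"-prefixed relative reference is used
# in place of rdf:ID. quoteattr adds the quotes and escapes "&".
attribute = "rdf:about=" + quoteattr("#" + FRAGMENT)

print(attribute)
```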

Have I understood correctly?

Jeremy

Received on Monday, 3 October 2005 12:25:11 UTC