- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Mon, 03 Oct 2005 13:24:47 +0100
- To: www-international@w3.org
Hello
I had a support question for the Jena Semantic Web software, concerning 
the following RDF URI Reference:
http://ontology.tos.co.jp/#\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC
where the \u escapes denote Unicode characters.
The initial problem was that this was input with the rdf:ID syntax, and 
that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML 
Name because of the fullwidth ampersand "\uFF06", which I note is a 
compatibility character.
The XML recommendation says:
[[
Characters in the compatibility area (i.e. with character code greater 
than #xF900 and less than #xFFFE) are not allowed in XML names.
]]
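As a rough illustration of the quoted rule, the sketch below flags any character whose code point falls strictly between #xF900 and #xFFFE. The helper name compatibility_area_chars is hypothetical, and the check covers only the quoted range, not the full XML Name production.

```python
# Hypothetical helper: returns the characters that fall in the range the
# XML quote describes (greater than #xF900 and less than #xFFFE).
# This is NOT a complete XML Name validator.
def compatibility_area_chars(name):
    return [c for c in name if 0xF900 < ord(c) < 0xFFFE]

# The fragment identifier from the RDF URI Reference in question.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

offending = compatibility_area_chars(fragment)
print([f"U+{ord(c):04X}" for c in offending])  # ['U+FF06']
```

Only U+FF06 is reported; the hiragana and katakana characters lie well below #xF900.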
On further reading, I saw that RFC 3987 says:
http://www.ietf.org/rfc/rfc3987.txt
[[
On the other hand, in some cases, the UCS contains
    variants for compatibility reasons; for example, for typographic
    purposes.  These should be avoided wherever possible.  Although there
    may be exceptions, newly created resource names should generally be
    in NFKC [UTR15]
]]
While I am not familiar with the concept of NFKC, I believe this means 
that compatibility characters should be avoided when creating a new IRI.
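One way to test that reading, sketched here on the assumption that Python's standard unicodedata module is an acceptable stand-in for the normalization described in [UTR15]: a string is in NFKC if normalizing it to NFKC leaves it unchanged. The variable names are illustrative only.

```python
import unicodedata

# The fragment identifier from the RDF URI Reference in question.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

# NFKC replaces compatibility characters with their compatibility
# equivalents and then recomposes the result.
normalized = unicodedata.normalize("NFKC", fragment)

# A string is in NFKC exactly when normalization does not change it.
is_nfkc = (normalized == fragment)

print(unicodedata.name("\uFF06"))  # FULLWIDTH AMPERSAND
print(is_nfkc)                     # False
print(normalized[4] == "&")        # True: U+FF06 became U+0026
```

Under this check the fragment is not in NFKC, and the only change normalization makes is to map U+FF06 to the ordinary ampersand U+0026.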
Since the document was creating this IRI, I advised that it should be 
changed (e.g. by deleting the fullwidth ampersand).
Presumably an alternative change would be to use a normal ampersand "&", 
which is legal in an IRI fragment and is not a character to avoid when 
creating a new IRI (although it is illegal in an XML Name, for which 
there is a work-around).
Have I understood correctly?
Jeremy
Received on Monday, 3 October 2005 12:25:11 UTC