- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Mon, 03 Oct 2005 13:24:47 +0100
- To: www-international@w3.org
Hello
I had a support question for the Jena Semantic Web software, concerning
the following RDF URI Reference:
http://ontology.tos.co.jp/#\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC
where the \u escapes denote the corresponding Unicode characters.
The initial problem was that this was input with the rdf:ID syntax, and
that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML
Name because of the full-width ampersand "\uFF06" (U+FF06 FULLWIDTH
AMPERSAND), which I note is a compatibility character.
The XML recommendation says:
[[
Characters in the compatibility area (i.e. with character code greater
than #xF900 and less than #xFFFE) are not allowed in XML names.
]]
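A minimal sketch of that check, in Python, assuming only the code point range quoted above (this is not a full XML Name validator, and the function name is hypothetical):

```python
def compatibility_area_chars(name):
    """Return the characters of `name` whose code points lie strictly
    between U+F900 and U+FFFE, i.e. the range quoted from the XML
    recommendation. Hypothetical helper, for illustration only."""
    return [ch for ch in name if 0xF900 < ord(ch) < 0xFFFE]


# The fragment identifier from the RDF URI Reference in question.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

offending = compatibility_area_chars(fragment)
print([f"U+{ord(ch):04X}" for ch in offending])  # prints ['U+FF06']
```

Only U+FF06 falls in the range; the hiragana and katakana characters are well below #xF900.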
On further reading, I saw in RFC 3987 that:
http://www.ietf.org/rfc/rfc3987.txt
[[
On the other hand, in some cases, the UCS contains
variants for compatibility reasons; for example, for typographic
purposes. These should be avoided wherever possible. Although there
may be exceptions, newly created resource names should generally be
in NFKC [UTR15]
]]
Although I am not familiar with the concept of NFKC, I believe this means
that compatibility characters should be avoided when creating a new IRI.
Since the document was creating this IRI, I advised that it should be
changed (e.g. by deleting the full-width ampersand).
Presumably an alternative change would be to use a normal ampersand "&",
which is legal in an IRI fragment and is not a character to avoid when
creating a new IRI (although it is illegal in an XML Name, for which there
is a work-around).
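A minimal sketch of how the fragment could be tested against NFKC, assuming Python's standard unicodedata module. Comparing a string with its own normalization is just one simple way to test whether it is already in NFKC:

```python
import unicodedata

# The fragment identifier from the RDF URI Reference in question.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"

# Apply Unicode Normalization Form KC (compatibility decomposition
# followed by canonical composition).
normalized = unicodedata.normalize("NFKC", fragment)

# A string is in NFKC exactly when normalizing leaves it unchanged.
is_nfkc = (normalized == fragment)

print(is_nfkc)                # prints False
print(normalized == "\u304A\u3082\u3061\u3083&\u30DB\u30D3\u30FC")  # prints True
```

In this sketch the only character that changes is U+FF06, which NFKC maps to the normal ampersand U+0026, so the NFKC form of the fragment is the one with "&".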
Have I understood correctly?
Jeremy
Received on Monday, 3 October 2005 12:25:11 UTC