- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Mon, 03 Oct 2005 13:24:47 +0100
- To: www-international@w3.org
Hello,

I had a support question for the Jena Semantic Web software, concerning the following RDF URI Reference:

http://ontology.tos.co.jp/#\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC

where the \u escapes denote the Unicode characters.

The initial problem was that this was input with the rdf:ID syntax, and that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML Name because of the fullwidth ampersand "\uFF06", which I note is a compatibility character.

The XML recommendation says:

[[
Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.
]]

On further reading, I saw in RFC 3987 (http://www.ietf.org/rfc/rfc3987.txt) that:

[[
On the other hand, in some cases, the UCS contains variants for compatibility reasons; for example, for typographic purposes. These should be avoided wherever possible. Although there may be exceptions, newly created resource names should generally be in NFKC [UTR15]
]]

While I am not familiar with the concept of NFKC, I believe this means that compatibility characters should be avoided when creating a new IRI.

Since the document was creating this IRI, I advised that it should be changed (e.g. by deleting the fullwidth ampersand).

Presumably a different change would be to use a normal ampersand "&", which is legal in an IRI fragment and is not a character to avoid when creating a new IRI (although it is illegal in an XML Name, for which there is a work-around).

Have I understood correctly?

Jeremy
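As an illustration of the question above, here is a minimal Python sketch that checks the fragment against the compatibility-area range quoted from the XML recommendation and applies NFKC using the standard `unicodedata` module. The helper names `in_compatibility_area` and `compatibility_chars` are hypothetical, chosen only for this example; the range test is a literal reading of the quoted sentence, not a full XML Name validator.

```python
import unicodedata

# The fragment from the RDF URI Reference discussed in the message.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"


def in_compatibility_area(ch):
    """True if ch falls strictly between #xF900 and #xFFFE, the range
    quoted from the XML recommendation (hypothetical helper)."""
    return 0xF900 < ord(ch) < 0xFFFE


def compatibility_chars(text):
    """List the characters of text that lie in that range."""
    return [ch for ch in text if in_compatibility_area(ch)]


# Which characters trip the compatibility-area rule?
offenders = compatibility_chars(fragment)
for ch in offenders:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC replaces compatibility variants with their preferred equivalents.
nfkc = unicodedata.normalize("NFKC", fragment)
print(nfkc == fragment)  # False here: the fragment is not already in NFKC
```

Under NFKC the only character that changes is U+FF06, which maps to the ordinary ampersand U+0026. That is consistent with the second suggestion in the message, although the result is still not usable as an XML Name.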
Received on Monday, 3 October 2005 12:25:11 UTC