- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Tue, 04 Oct 2005 10:48:11 +0900
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-international@w3.org
Hello Jeremy,

At 21:24 05/10/03, Jeremy Carroll wrote:
>
>Hello
>
>I had a support question for the Jena Semantic Web software, concerning the following RDF URI Reference:
>
>http://ontology.tos.co.jp/#\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC
>
>where the \u escapes denote the unicode characters.
>
>The initial problem was that this was input with the rdf:ID syntax, and that "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC" is not an XML Name because of the half-width ampersand "\uFF06", which I note is a compatibility character.

Just a detail: U+FF06 is the FULLWIDTH AMPERSAND, not the half-width
ampersand (which is part of ASCII).

>The XML recommendation says:
>[[
>Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.
>]]
>
>On further reading, I saw in RFC 3987 that:
>
>http://www.ietf.org/rfc/rfc3987.txt
>[[
>On the other hand, in some cases, the UCS contains
> variants for compatibility reasons; for example, for typographic
> purposes. These should be avoided wherever possible. Although there
> may be exceptions, newly created resource names should generally be
> in NFKC [UTR15]
>]]
>While not being familiar with the concept of NFKC, I believe this means that compatibility characters should be avoided when creating a new IRI.

There is a difference between compatibility characters and characters
in the compatibility area: some characters outside the compatibility
area are also compatibility characters. Some compatibility characters
are already folded away by NFC; NFKC does (most of?) the rest.

>Since the document was creating this IRI, I advised that it should be changed (e.g. by deleting the half-width ampersand)
>
>Presumably a different change would be to use a normal ampersand "&", which is legal in an IRI fragment, and not one to avoid when creating a new IRI.
>(Although illegal in an XML Name, for which there is a work-around)
>
>Have I understood correctly?

Yes, from an IRI point of view, both solutions are possible. Although
the "&" is legal in a fragment, it may best be avoided because other
kinds of identifiers (starting with XML Names, as you mention) don't
allow it.

Is my understanding correct that your response to this support
question is only advice to the document creator, not a change to Jena?

Regards,    Martin.
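
The distinction discussed above, between characters in the compatibility area and compatibility characters in general, can be checked with a short script. This is only an illustrative sketch using Python's standard unicodedata module; the helper names in_compatibility_area and changed_by are made up for this example, and the range test simply restates the #xF900..#xFFFE bounds from the XML text quoted above. U+00B5 (MICRO SIGN) and U+2126 (OHM SIGN) are sample characters chosen here, not ones from the original question.

```python
import unicodedata

# The fragment from the question, written with the same \u escapes.
fragment = "\u304A\u3082\u3061\u3083\uFF06\u30DB\u30D3\u30FC"


def in_compatibility_area(ch):
    """Range test from the quoted XML text: #xF900 < code < #xFFFE."""
    return 0xF900 < ord(ch) < 0xFFFE


def changed_by(form, ch):
    """True if normalizing `ch` to `form` ("NFC", "NFKC", ...) alters it."""
    return unicodedata.normalize(form, ch) != ch


# Characters of the fragment that fall inside the compatibility area.
flagged = [f"U+{ord(ch):04X}" for ch in fragment if in_compatibility_area(ch)]

# NFKC maps the fullwidth ampersand to the ASCII "&"; the kana are unchanged.
nfkc_fragment = unicodedata.normalize("NFKC", fragment)

print(flagged)
print(nfkc_fragment == fragment.replace("\uFF06", "&"))
# U+00B5 lies outside the compatibility area but is still changed by NFKC.
print(in_compatibility_area("\u00B5"), changed_by("NFKC", "\u00B5"))
# U+2126 is already folded away by NFC.
print(changed_by("NFC", "\u2126"))
```

Note that the NFKC result contains a plain "&", so it is a valid IRI fragment but still not an XML Name, which is the trade-off mentioned in the reply.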
Received on Tuesday, 4 October 2005 02:51:52 UTC