- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Wed, 20 Feb 2002 13:46:49 -0000
- To: "Brian McBride" <bwm@hplb.hpl.hp.com>, "RDF Core" <w3c-rdfcore-wg@w3.org>
- Cc: <w3c-i18n-ig@w3.org>
> rdf-charmod-uris: Does the treatment of uris conform to charmod ? > We need an owner to check this Again I would prefer not to own this. However, we did discuss this a bit in the thread: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0330.html My understanding of where we got to was: In RDF URIrefs are used as unique labels. The uniqueness is important and so an important aspect is being able to compare to URIs from different sources and saying that they are the same or not. M&S specifies a URIref as per RFC 2396. I understand that as being (a subset of) US-ASCII only, and hence not charmod conformant. A required change is to permit the full range of unicode characters in URIs wherever a URIref is permitted in the RDF/XML grammar. Such an international URI is subject to a standard algorithm (given in charmod) to convert it into a US-ASCII URI. We need to ensure that: - a URI when given using international characters, and when given using US-ASCII compares equal. This can be done by one of the following three techniques: [A] normalizing URIs on input to US-ASCII [B] normalizing URIs on input to international form [C] using the URI normalization algorithm as part of the URI compare algorithm. [C] looks inefficient and inelegant. [B] doesn't work, because a US-ASCII URI with % escapes in it does not specify the charset used for the encoding, whereas the algorithm the other way assumes UTF-8. Thus I believe [A] is the answer. This normalization should be done in a non-ambiguous way and so I favour specifying that the hexadecimal escape sequences e.g. "%A3" should not use a-f but use A-F instead. This allows binary compare. It also means that: - %hh must be normalized to %HH where hh is a pair of hexadecimal lower case digits. - % is not allowed in the international form of a URI except for introducing hexadecimal escape sequences. A URI that really does contain a % must then be encoded by the original document author as "%25" I can produce some test cases illustrating this, in which RDF/XML documents with mixed usage of international and US ASCII URIs work successfully. I suggest that N-Triple be restricted to US ASCII URIs only. N-Triple is not intended as an end user document format and internationalization concerns do not apply. Jeremy
Received on Wednesday, 20 February 2002 08:47:05 UTC