- From: Francois Yergeau <FYergeau@alis.com>
- Date: Fri, 19 Oct 2001 14:08:37 -0400
- To: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Jeremy Carroll wrote:
> My take on the erratum at
> http://www.w3.org/XML/xml-V10-2e-errata#E26. is
> that RDF needs to specify that for RDF/XML documents the RDF processor
> should escape the URI as soon as it can...
> [...]
>
> The reason for this is that URI equality is important in RDF.
> The realistic algorithm for URI equality is binary comparison,

Not so. URIs are sequences of characters and it is these characters
that must be compared. Even RFC 2396, which is pretty weak w/r
character encoding and tries to muddy the question much more than to
solve it, agrees with this:

[RFC 2396 Section 1.5]
   A URI is a sequence of characters [...]
   The interpretation of a URI depends only on the characters used and
   not how those characters are represented in a network protocol.

Similar statements ("URI consist of a restricted set of characters",
"A URI is represented as a sequence of characters, not as a sequence
of octets") can be found elsewhere.

This matches what XML thinks of URIs (in system literals) and the way
they are used in practice: printed on billboards, hand-written on
napkins, typed into computers, etc.: always characters.

To maintain this identity of URIs as characters, it is important *not*
to escape URIs, except when you absolutely must. (This is distinct from
escaping individual syntax-significant characters at creation time,
which is done precisely to remove their significance in the URI
syntax.) The only reason one must escape at some point is that
RFC 2396 says so when putting URIs on the wire to retrieve a resource;
in the case of non-ASCII characters, it does so for no good reason at
all. Hopefully this silly requirement will some day be done away with,
but until such time W3C specs have to pay tribute, deal with RFC 2396's
lack of a solution for the character encoding issue (by mandating
UTF-8), and mention escaping. The only reasonable position for the
latter is to say "only if you must and even then as late as possible",
in order to preserve the actual characters that the author/document
creator intended and gave you. These are the characters that should be
used in URI comparisons.

> I think that the internal representation of international URIs in RDF
> should be US ASCII RFC 2396 URIs.

That would be a rather serious mistake, IMHO.

-- 
François
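
A minimal Python sketch of the position argued above, offered as an
illustration only: URIs are compared as character strings, and non-ASCII
characters are escaped as UTF-8 octets only at the moment the URI goes on
the wire. The function names and the example.org URI are hypothetical and
do not come from any spec or from the RDF discussion itself.

```python
from urllib.parse import quote


def same_uri(a: str, b: str) -> bool:
    """Compare two URIs character by character (not octet by octet)."""
    return a == b


def escape_for_wire(uri: str) -> str:
    """Escape only non-ASCII characters, as %HH of their UTF-8 octets.

    ASCII characters, including existing '%' escapes, pass through
    unchanged. Intended to be called as late as possible.
    """
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="", encoding="utf-8")
        for ch in uri
    )


# The same characters arriving in two different byte encodings
# (hypothetical example URI).
from_latin1 = b"http://example.org/Fran\xe7ois".decode("latin-1")
from_utf8 = b"http://example.org/Fran\xc3\xa7ois".decode("utf-8")

# Identical characters, so the URIs compare equal despite different octets.
chars_equal = same_uri(from_latin1, from_utf8)

# Escaping happens only when the URI is about to be dereferenced.
wire_form = escape_for_wire(from_utf8)

# A URI escaped early no longer matches the author's characters.
early_escape_equal = same_uri(from_utf8, wire_form)
```

Under these assumptions, `chars_equal` is true while `early_escape_equal`
is false, which is the practical cost of escaping "as soon as it can":
the stored form stops matching what the author actually wrote.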
Received on Friday, 19 October 2001 14:19:19 UTC