RE: I18N (was: Closing rdfms-difference-between-ID-and-about)

Jeremy Carroll wrote:
> My take on the erratum at 
> http://www.w3.org/XML/xml-V10-2e-errata#E26. is
> that RDF needs to specify that for RDF/XML documents the RDF processor
> should escape the URI as soon as it can...
> [...]
> 
> The reason for this is that URI equality is important in RDF. 
> The realistic algorithm for URI equality is binary comparison, 

Not so.  URIs are sequences of characters and it is these characters that
must be compared.  Even RFC 2396, which is pretty weak w/r/t character
encoding and tries to muddy the question much more than to solve it, agrees
with this:

   [RFC 2396 Section 1.5]
   A URI is a sequence of characters [...]  The interpretation 
   of a URI depends only on the characters used and not how 
   those characters are represented in a network protocol.

Similar statements ("URI consist of a restricted set of characters", "A URI
is represented as a sequence of characters, not as a sequence of octets")
can be found elsewhere.  This matches what XML thinks of URIs (in system
literals) and the way they are used in practice: printed on billboards,
hand-written on napkins, typed into computers, etc.: always characters.
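
To make the character/octet distinction concrete, here is a minimal Python
sketch.  The example.org URI is made up for illustration and does not come
from any spec; the point is only that one sequence of characters can be
represented by different octets depending on the encoding in use.

```python
# Illustrative only: this URI is a made-up example.
uri = "http://example.org/caf\u00e9"

# Two representations of the same characters.
as_utf8 = uri.encode("utf-8")
as_latin1 = uri.encode("iso-8859-1")

# The octets differ between the two encodings...
octets_differ = as_utf8 != as_latin1

# ...but decoding each with its own encoding recovers the same characters.
chars_match = as_utf8.decode("utf-8") == as_latin1.decode("iso-8859-1")
```

A binary comparison of the two representations would report a mismatch even
though the URI, as characters, is identical.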

To maintain this identity of URIs as characters, it is important *not* to
escape URIs (as opposed to escaping individual syntax-significant characters
at creation time, precisely to escape their significance in the URI syntax),
except when you absolutely must.  The only reason one must do so at some
point is that RFC 2396 requires it when putting URIs on the wire to retrieve
a resource; for non-ASCII characters it does so for no good reason at all.


Hopefully this silly requirement will some day be done away with, but until
such time W3C specs have to pay tribute, deal with RFC 2396's lack of a
solution for the character encoding issue (by mandating UTF-8), and
mention escaping.  The only reasonable position for the latter is to say
"only if you must and even then as late as possible", in order to preserve
the actual characters that the author/document creator intended and gave
you. These are the characters that should be used in URI comparisons.
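
As a rough sketch of "as late as possible", assuming Python and its standard
urllib.parse.quote: the function names uris_equal and to_wire are
hypothetical, as is the example.org URI, and the choice of "safe" characters
is one plausible reading rather than anything a spec mandates.

```python
from urllib.parse import quote


def uris_equal(a: str, b: str) -> bool:
    """Compare URIs as the character sequences the author supplied."""
    return a == b


def to_wire(uri: str) -> str:
    """Escape a URI only at the point of retrieval, via UTF-8 octets.

    ASCII syntax characters (and '%', so existing escapes are not escaped
    again) are left alone; non-ASCII characters become %HH escapes of
    their UTF-8 octets.
    """
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%~", encoding="utf-8")


# Hypothetical example URI, kept in its original characters internally.
stored = "http://example.org/caf\u00e9"

# Escaping happens only when the URI is about to go on the wire.
on_the_wire = to_wire(stored)
```

Comparison uses the stored characters; the escaped form is produced only for
the retrieval step and is never used as the identity of the resource.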

> I think that the internal representation of  international URIs in RDF
> should be US ASCII RFC 2396 URIs.

That would be a rather serious mistake, IMHO.

-- 
François 

Received on Friday, 19 October 2001 14:19:19 UTC