Re: I18N (was: Closing rdfms-difference-between-ID-and-about) from Jeremy Carroll on 2001-10-19 (w3c-rdfcore-wg@w3.org from October 2001)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Fri, 19 Oct 2001 17:35:10 +0100
To: <w3c-rdfcore-wg@w3.org>, <w3c-i18n-ig@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDGEFKCCAA.jjc@hplb.hpl.hp.com>

Mark Davis:
> If the character % were itself escaped, then escaping *would be* fully
> reversible.

Hmmm, not if you don't know the charset of the original character sequence.
I seem to remember an example of a non-UTF-8 URL in charmod.
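To illustrate why escaping is one-way when the charset is unknown, here is a small Python sketch (not from the original message) that uses the standard urllib.parse functions. The same character escapes differently under UTF-8 and Latin-1, and decoding %E9 as UTF-8 does not recover it.

```python
from urllib.parse import quote, unquote

# The same character produces different escapes depending on the charset.
utf8_form = quote("\u00e9", encoding="utf-8")      # '%C3%A9'
latin1_form = quote("\u00e9", encoding="latin-1")  # '%E9'

# Reversing the escape only works if you know which charset was used.
print(unquote(latin1_form, encoding="latin-1"))  # the original character
print(unquote(latin1_form, encoding="utf-8"))    # U+FFFD replacement character
```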

===

My take on the erratum at http://www.w3.org/XML/xml-V10-2e-errata#E26. is
that RDF needs to specify that, for RDF/XML documents, the RDF processor
should escape the URI as soon as it can (i.e. just after it gets it from the
XML processor, or straight after turning a relative URI into an absolute
one, whichever happens later). In other words, RDF's needs are diametrically
opposed to the XML solution.

The reason for this is that URI equality is important in RDF. The realistic
algorithm for URI equality is binary comparison, and this only works if URIs
are first put into a normalized form. Because of the one-way nature of
URI escaping (see above), it is necessary to normalize to the fully encoded
form (with uppercase hexadecimal escapes) rather than the fully unencoded
form.
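A sketch of normalize-then-compare in Python. It assumes that "normalize" means uppercasing every existing %hh escape and escaping every remaining non-ASCII character as UTF-8; the function names are hypothetical. The Latin-1 escape %E9 deliberately stays distinct from the UTF-8 form, because it cannot safely be unescaped.

```python
import re

# Matches one existing %hh escape, in either case.
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")

def normalize_uri(uri: str) -> str:
    """Fully encoded form: uppercase hex escapes, non-ASCII as UTF-8 %HH."""
    # Uppercase existing escapes, e.g. %c3%a9 -> %C3%A9.
    uri = _PCT.sub(lambda m: m.group(0).upper(), uri)
    # Escape each remaining non-ASCII character byte by byte.
    return "".join(
        ch if ord(ch) < 0x80 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in uri
    )

def same_uri(a: str, b: str) -> bool:
    """URI equality as plain string (binary) comparison of normalized forms."""
    return normalize_uri(a) == normalize_uri(b)
```

With this, "http://example.org/caf\u00e9" and "http://example.org/caf%c3%a9" compare equal, while "http://example.org/caf%E9" does not.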

I think that the internal representation of international URIs in RDF
should be US-ASCII RFC 2396 URIs.

For RDF/XML output, and other display to humans, we could suggest that
applications should make best efforts to reverse the escaping, with the
exception of the % character and any escape sequences that are not
well-formed UTF-8.
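One possible reading of the display suggestion, as a hypothetical Python sketch. It decodes escapes that form well-formed multi-byte UTF-8 characters and leaves everything else escaped. This is deliberately narrower than the text: ASCII escapes such as %2F also stay escaped, because decoding them could change the URI's structure, and %25 (the % character) falls into that group automatically.

```python
import re

# A run of one or more consecutive %hh escapes.
_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def _utf8_length(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if not a multi-byte lead."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0

def _decode_run(run: str) -> str:
    data = bytes.fromhex(run.replace("%", ""))
    out, i = [], 0
    while i < len(data):
        n = _utf8_length(data[i])
        if n:
            try:
                # Strict decoding rejects truncated, overlong and surrogate sequences.
                out.append(data[i:i + n].decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        # ASCII bytes (including 0x25, '%') and ill-formed bytes stay escaped,
        # re-emitted with uppercase hex.
        out.append(f"%{data[i]:02X}")
        i += 1
    return "".join(out)

def display_form(uri: str) -> str:
    """Best-effort human-readable form of a fully encoded URI."""
    return _RUN.sub(lambda m: _decode_run(m.group(0)), uri)
```

For example, display_form("http://example.org/caf%C3%A9") gives the accented form, while "100%25" and the Latin-1 "caf%E9" come back unchanged.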

Jeremy

Received on Friday, 19 October 2001 12:35:45 UTC