Re: I18N (was: Closing rdfms-difference-between-ID-and-about) from Dan Connolly on 2001-10-19 (w3c-rdfcore-wg@w3.org from October 2001)

From: Dan Connolly <connolly@w3.org>
Date: Fri, 19 Oct 2001 11:51:47 -0500
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
CC: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Message-ID: <3BD05A23.E49AAE21@w3.org>

Jeremy Carroll wrote:
> 
> Mark Davis:
> > If the character % were itself escaped, then escaping *would be* fully
> reversible.
> 
> Hmmm, not if you don't know the charset of the original character sequence.
> I seem to remember an example of a non UTF-8 URL in charmod.
> 
> ===
> 
> My take on the erratum at http://www.w3.org/XML/xml-V10-2e-errata#E26. is
> that RDF needs to specify that for RDF/XML documents the RDF processor
> should escape the URI as soon as it can (i.e. just after it gets it from the
> XML processor, or straight after turning a relative URI into an absolute
> one, whichever happens later). i.e. the RDF needs are diammetrically opposed
> to the XML solution.

You're not saying the XML solution conflicts with what RDF needs,
are you? This issue is so messy that I have a hard time following...
but if I understand correctly, the XML processors can
follow the advice that the XML Core WG has given, and RDF
stuff can be layered on top, and it all works out. True?

> The reason for this is that URI equality is important in RDF. The realistic
> algorithm for URI equality is binary comparison, and this only works by
> determining a normalized form for URI's. Because of the one-way nature of
> URI escaping (see above) it is necessaary to normalize to the fully encoded
> form (with uppercase hexadecimal escapes) rather than the fully unencoded
> form.
> 
> I think that the internal representation of  international URIs in RDF
> should be US ASCII RFC 2396 URIs.

If folks are reading in RDF/xml with the intention of 
writing it back out as RDF/xml, I might suggest they
keep the pre-decoded unicode string around as well
as the US ASCII URI....

> For RDF/XML output, and other human display, we could suggest that
> applications should make best efforts to reverse the escaping, with the
> exception of the % character and any that are not well-formed UTF-8.

That's perhaps a useful heuristic for human display, but for
RDF/XML output, it seems unwise. In general, you can't tell
that the %fc%c4 in
	http://example/%fc%c4
is intended to be the utf-8 encodong of a unicode character,
or if it's just two octets that the server was encoding
for other purposes.

Hmm... in some sense, it doesn't matter... if you
put the unicode character in the XML file, the RDF
guy on the other end will get the right URI back out.

But on the other hand, any XSLT code in between will
fail to match it. Hmm...

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Friday, 19 October 2001 12:52:58 UTC