- From: Dan Connolly <connolly@w3.org>
- Date: Fri, 19 Oct 2001 11:51:47 -0500
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- CC: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Jeremy Carroll wrote:
>
> Mark Davis:
> > If the character % were itself escaped, then escaping *would be* fully
> > reversible.
>
> Hmmm, not if you don't know the charset of the original character sequence.
> I seem to remember an example of a non UTF-8 URL in charmod.
>
> ===
>
> My take on the erratum at http://www.w3.org/XML/xml-V10-2e-errata#E26. is
> that RDF needs to specify that for RDF/XML documents the RDF processor
> should escape the URI as soon as it can (i.e. just after it gets it from the
> XML processor, or straight after turning a relative URI into an absolute
> one, whichever happens later). i.e. the RDF needs are diametrically opposed
> to the XML solution.

You're not saying the XML solution conflicts with what RDF needs, are you?
This issue is so messy that I have a hard time following... but if I
understand correctly, the XML processors can follow the advice that the
XML Core WG has given, and RDF stuff can be layered on top, and it all
works out. True?

> The reason for this is that URI equality is important in RDF. The realistic
> algorithm for URI equality is binary comparison, and this only works by
> determining a normalized form for URIs. Because of the one-way nature of
> URI escaping (see above) it is necessary to normalize to the fully encoded
> form (with uppercase hexadecimal escapes) rather than the fully unencoded
> form.
>
> I think that the internal representation of international URIs in RDF
> should be US ASCII RFC 2396 URIs.

If folks are reading in RDF/XML with the intention of writing it back out
as RDF/XML, I might suggest they keep the pre-decoded Unicode string around
as well as the US ASCII URI....

> For RDF/XML output, and other human display, we could suggest that
> applications should make best efforts to reverse the escaping, with the
> exception of the % character and any that are not well-formed UTF-8.

That's perhaps a useful heuristic for human display, but for RDF/XML
output, it seems unwise.
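To make the two proposals above concrete, here is a minimal Python sketch; it is not from the thread, and the function names `to_ascii_uri` and `for_display` as well as the allowed-character set are hypothetical simplifications of RFC 2396. `to_ascii_uri` performs the normalization Jeremy describes: every character outside the allowed set becomes its UTF-8 octets as %HH escapes, and existing escapes get uppercase hex digits, so equal URIs compare equal byte for byte. `for_display` is the best-effort reversal for human display; as an extra assumption it only unescapes runs of non-ASCII octets that decode as well-formed UTF-8, so %25 and reserved characters such as %2F always stay escaped.

```python
import re

# Hypothetical, simplified allowed set: RFC 2396 unreserved and reserved
# characters, plus '%' (escape introducer) and '#' (fragment separator).
# Everything else, including every non-ASCII character, gets escaped.
_ALLOWED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"   # RFC 2396 unreserved marks
    ";/?:@&=+$,"  # RFC 2396 reserved characters
    "%#"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
# A run of escapes whose octets are all >= 0x80 (i.e. non-ASCII octets).
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def to_ascii_uri(s: str) -> str:
    """Normalize to the fully encoded form: escape each disallowed
    character as its UTF-8 octets (%HH, uppercase hex) and uppercase
    the hex digits of any escapes already present."""
    out = []
    for ch in s:
        if ch in _ALLOWED:
            out.append(ch)
        else:
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    return _ESCAPE.sub(lambda m: "%" + m.group(1).upper(), "".join(out))


def for_display(uri: str) -> str:
    """Best-effort reversal for human display only: decode runs of
    non-ASCII escapes that form well-formed UTF-8 and leave everything
    else (including %25 and ASCII escapes such as %2F) escaped."""
    def decode(m):
        raw = bytes.fromhex(m.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return m.group(0)  # not well-formed UTF-8: keep the whole run escaped
    return _HIGH_RUN.sub(decode, uri)


if __name__ == "__main__":
    a = to_ascii_uri("http://example/r\u00e9sum\u00e9")
    b = to_ascii_uri("http://example/r%c3%a9sum%c3%a9")
    print(a == b)          # True: binary comparison now works
    print(a)               # http://example/r%C3%A9sum%C3%A9
    print(for_display(a))  # http://example/résumé
    print(for_display(to_ascii_uri("http://example/%fc%c4")))
    # http://example/%FC%C4 (the octets FC C4 are not well-formed UTF-8)
```

Note that `%FC%C4` comes through `for_display` untouched: the octets FC C4 are not well-formed UTF-8, so the heuristic has no way to tell what, if anything, they were meant to encode.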
In general, you can't tell whether the %fc%c4 in http://example/%fc%c4 is
intended to be the UTF-8 encoding of a Unicode character, or whether it's
just two octets that the server was encoding for other purposes.

Hmm... in some sense, it doesn't matter... if you put the Unicode character
in the XML file, the RDF guy on the other end will get the right URI back
out. But on the other hand, any XSLT code in between will fail to match it.
Hmm...

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Friday, 19 October 2001 12:52:58 UTC