Re: I18N (was: Closing rdfms-difference-between-ID-and-about) from Martin Duerst on 2001-10-22 (w3c-rdfcore-wg@w3.org from October 2001)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 22 Oct 2001 15:24:59 +0900
To: Dan Connolly <connolly@w3.org>, Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20011022152033.039b8720@localhost>

At 11:51 01/10/19 -0500, Dan Connolly wrote:
>Jeremy Carroll wrote:
> >
> > Mark Davis:
> > > If the character % were itself escaped, then escaping *would be* fully
> > reversible.

It's totally independent of that. Because % isn't escaped,
a %25 is also not unescaped.


> > Hmmm, not if you don't know the charset of the original character sequence.
> > I seem to remember an example of a non UTF-8 URL in charmod.

Yes, that's where the problems start.


> > ===
> >
> > My take on the erratum at http://www.w3.org/XML/xml-V10-2e-errata#E26. is
> > that RDF needs to specify that for RDF/XML documents the RDF processor
> > should escape the URI as soon as it can (i.e. just after it gets it 
> from the
> > XML processor, or straight after turning a relative URI into an absolute
> > one, whichever happens later). i.e. the RDF needs are diammetrically 
> opposed
> > to the XML solution.
>
>You're not saying the XML solution conflicts with what RDF needs,
>are you? This issue is so messy that I have a hard time following...
>but if I understand correctly, the XML processors can
>follow the advice that the XML Core WG has given, and RDF
>stuff can be layered on top, and it all works out. True?

Yes.


> > The reason for this is that URI equality is important in RDF. The realistic
> > algorithm for URI equality is binary comparison, and this only works by
> > determining a normalized form for URI's. Because of the one-way nature of
> > URI escaping (see above) it is necessaary to normalize to the fully encoded
> > form (with uppercase hexadecimal escapes) rather than the fully unencoded
> > form.
> >
> > I think that the internal representation of  international URIs in RDF
> > should be US ASCII RFC 2396 URIs.
>
>If folks are reading in RDF/xml with the intention of
>writing it back out as RDF/xml, I might suggest they
>keep the pre-decoded unicode string around as well
>as the US ASCII URI....

Yes, very much so.


> > For RDF/XML output, and other human display, we could suggest that
> > applications should make best efforts to reverse the escaping, with the
> > exception of the % character and any that are not well-formed UTF-8.
>
>That's perhaps a useful heuristic for human display, but for
>RDF/XML output, it seems unwise. In general, you can't tell
>that the %fc%c4 in
>         http://example/%fc%c4
>is intended to be the utf-8 encodong of a unicode character,
>or if it's just two octets that the server was encoding
>for other purposes.

Well, in that specific case, you actually can, because %fc%c4
is not a legal UTF-8 byte sequence.


>Hmm... in some sense, it doesn't matter... if you
>put the unicode character in the XML file, the RDF
>guy on the other end will get the right URI back out.

Well, yes, there are problems with control and formating
characters and with characters not in NFC,...
That's why doing a good job at back-conversion is really hard.


Regards,   Martin.

Received on Monday, 22 October 2001 03:03:31 UTC