Re: I18N (was: Closing rdfms-difference-between-ID-and-about) from Graham Klyne on 2001-10-17 (w3c-rdfcore-wg@w3.org from October 2001)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Wed, 17 Oct 2001 20:44:00 +0100
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <5.1.0.14.2.20011017203624.032f5d00@joy.songbird.com>

At 06:07 PM 10/17/01 +0100, Jeremy Carroll wrote:
[...]
>Internally XML documents are in Unicode, even if their serialization is
>in some other charset; the text has been converted to Unicode before we
>get to worrying about URIs and IURIs. In practice, I understood the
>position to be that IURIs work with UTF-8 as the encoding. If you have an
>IURI which is not UTF-8 encoded then you still have to do the %HH
>encoding by hand. (This happens in particular with URLs.)

I'm not sure what it means to say "Internally XML documents are in Unicode".
I thought that XML was essentially a serialization syntax (for a labelled
and annotated tree structure).

[...]
>Furthermore, I think this goes in the RDF/XML syntax WD, and as far as
>the model goes a URI is an RFC 2396/2732 URI. The syntax WD should
>specify early application of this algorithm, for instance before
>aboutEach processing.

Yes, I agree this should go in the syntax document.
But I note that RFC 2396 recognizes three different presentations of a URI:
   original character sequence
   octet sequence
   URI character sequence

I think we'd need to be clear which of these is intended, if that's 
important.  For the model theory, I'm not sure that it is important (as 
long as it's not a non-UTF-8 octet sequence).
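
To make the three presentations concrete, here is a rough sketch in Python.
The example identifier is hypothetical, and the choice of which characters to
leave unescaped (":" and "/") is an assumption made for illustration only; it
is not the algorithm the syntax WD would specify.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical identifier with one non-ASCII character (e with acute accent).
original = "http://example.org/caf\u00e9"   # 1. original character sequence

# 2. Octet sequence: the characters encoded as octets (UTF-8 assumed here).
octets = original.encode("utf-8")

# 3. URI character sequence: octets outside the allowed set are written as %HH.
#    Leaving ":" and "/" unescaped is an illustrative assumption.
uri_chars = quote(original, safe=":/", encoding="utf-8")

# The same characters under a different charset give different octets,
# and therefore a different URI character sequence.
latin1_uri = quote(original, safe=":/", encoding="latin-1")

print(octets)       # UTF-8 octets
print(uri_chars)    # %HH form derived from the UTF-8 octets
print(latin1_uri)   # %HH form derived from the Latin-1 octets

# Undoing the %HH escapes recovers the octet sequence, not the characters;
# the charset still has to be known to get back to the original text.
assert unquote_to_bytes(uri_chars) == octets
```

The UTF-8 and Latin-1 results differ, which is why it matters whether the
octet sequence is known to be UTF-8 before the %HH step is applied.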

#g

------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------

Received on Wednesday, 17 October 2001 15:59:54 UTC