
Re: I18N (was: Closing rdfms-difference-between-ID-and-about)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Wed, 17 Oct 2001 20:44:00 +0100
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: w3c-rdfcore-wg@w3.org
At 06:07 PM 10/17/01 +0100, Jeremy Carroll wrote:
>Internally XML documents are in Unicode, even if their serialization is
>in some other charset the text has been converted to unicode before we
>get to worrying about URI's and IURI's. In practice, I understood the
>position to be that IURIs work with UTF-8 as the encoding. If you have a
>IURI which is not UTF-8 encoded then you still have to do the %HH
>encoding by hand. (This happens in particular with URLs).

I'm not sure what it means to say "Internally XML documents are in Unicode" 
... I thought XML was essentially a serialization syntax (for a labelled 
and annotated tree structure).

>Furthermore, I think this goes in the RDF/XML syntax WD, and as far as
>the model goes a URI is an RFC 2396/2732 URI. The syntax WD should
>specify early application of this algorithm, for instance before
>aboutEach processing.

Yes, I agree this should go in the syntax document.
But I note that RFC 2396 recognizes three different presentations of a URI:
   original character sequence
   octet sequence
   URI character sequence

I think we'd need to be clear which of these is intended, if that's 
important.  For the model theory, I'm not sure that it is important (as 
long as it's not a non-UTF-8 octet sequence).
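
To make the three presentations concrete, here is a rough Python sketch. It
is an illustration only, not text proposed for the syntax WD. The function
name uri_presentations, the example.org URI, and the choice of "/" and ":"
as characters left unescaped are all assumptions made for the example.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def uri_presentations(original: str, charset: str = "utf-8"):
    """Return (original character sequence, octet sequence,
    URI character sequence) for one string.

    Hypothetical helper for illustration; the charset parameter is the
    point on which the three presentations can diverge.
    """
    # Original character sequence -> octet sequence, via a chosen charset.
    octets = original.encode(charset)
    # Octet sequence -> URI character sequence: every octet outside the
    # unreserved set (plus "/" and ":", kept here by assumption) becomes %HH.
    uri_chars = quote_from_bytes(octets, safe="/:")
    return original, octets, uri_chars


# The same characters give different URI character sequences depending
# on which charset produced the octets.
_, utf8_octets, utf8_uri = uri_presentations("http://example.org/caf\u00e9")
_, latin1_octets, latin1_uri = uri_presentations(
    "http://example.org/caf\u00e9", charset="latin-1"
)

# Recovering the octets from the URI character sequence is unambiguous;
# recovering the characters requires knowing the charset.
recovered = unquote_to_bytes(latin1_uri)
```

Here utf8_uri ends in caf%C3%A9 and latin1_uri ends in caf%E9. The second
is the non-UTF-8 octet sequence case: decoding recovered as UTF-8 fails,
so the original characters cannot be recovered without out-of-band
knowledge of the charset.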


Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
Received on Wednesday, 17 October 2001 15:59:54 UTC
