Re: I18N (was: Closing rdfms-difference-between-ID-and-about)

At 06:07 PM 10/17/01 +0100, Jeremy Carroll wrote:
[...]
>Internally XML documents are in Unicode, even if their serialization is
>in some other charset the text has been converted to unicode before we
>get to worrying about URI's and IURI's. In practice, I understood the
>position to be that IURIs work with UTF-8 as the encoding. If you have a
>IURI which is not UTF-8 encoded then you still have to do the %HH
>encoding by hand. (This happens in particular with URLs).

I'm not sure what it means to say "Internally XML documents are in Unicode".
I thought that XML was essentially a serialization syntax (for a labelled 
and annotated tree structure).

[...]
>Furthermore, I think this goes in the RDF/XML syntax WD, and as far as
>the model goes a URI is an RFC 2396/2732 URI. The syntax WD should
>specify early application of this algorithm, for instance before
>aboutEach processing.

Yes, I agree this should go in the syntax document.
But I note that RFC 2396 recognizes three different presentations of a URI:
   original character sequence
   octet sequence
   URI character sequence

I think we'd need to be clear which of these is intended, if that's 
important.  For the model theory, I'm not sure that it is important (as 
long as it's not a non-UTF-8 octet sequence).
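
To make the three presentations concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the function name, the SAFE character set and the example.org URI are made up for this message, and the escaping rule is my reading of RFC 2396/2732 rather than anything the syntax WD specifies.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Assumption: characters left unescaped are the RFC 2396 reserved characters
# and unreserved marks, the RFC 2732 brackets, plus "#" and "%" (so that
# fragment separators and existing %HH escapes are not escaped again).
SAFE = ";/?:@&=+$,-_.!~*'()[]#%"


def three_presentations(original, charset="utf-8"):
    """Return (original character sequence, octet sequence, URI character sequence)."""
    octets = original.encode(charset)              # characters -> octets
    uri_chars = quote_from_bytes(octets, safe=SAFE)  # octets -> %HH-escaped URI characters
    return original, octets, uri_chars


# Hypothetical example URI containing a non-ASCII character.
chars, octets, uri = three_presentations("http://example.org/r\u00e9sum\u00e9")
print(octets)  # b'http://example.org/r\xc3\xa9sum\xc3\xa9'
print(uri)     # http://example.org/r%C3%A9sum%C3%A9

# The same characters in ISO-8859-1 give a different URI character sequence,
# whose octets cannot be read back as UTF-8.
_, _, latin1_uri = three_presentations("http://example.org/r\u00e9sum\u00e9", "iso-8859-1")
print(latin1_uri)  # http://example.org/r%E9sum%E9
```

The last case is the one I'd be concerned about: the URI character sequence alone doesn't say which charset produced the octets, so two different strings can stand for the same original characters.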

#g


------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------

Received on Wednesday, 17 October 2001 15:59:54 UTC