Re: [RIF-RDF] (potential) issues regarding correspondence of identifiers from Jos de Bruijn on 2007-09-03 (public-rif-wg@w3.org from September 2007)

From: Jos de Bruijn <debruijn@inf.unibz.it>
Date: Mon, 03 Sep 2007 13:45:30 +0200
To: Dave Reynolds <der@hplb.hpl.hp.com>
CC: RIF <public-rif-wg@w3.org>
Message-ID: <46DBF3DA.5010401@inf.unibz.it>

>> b) RDF plain literals versus XML schema strings
>> An open question (for me) is what exactly the differences are between
>> the value space of RDF plain literals without language tags and that
>> of xsd:string. The value space of RDF plain literals without language
>> tags consists of all Unicode strings. In both the current specification
>> of XML Schema datatypes and the current working draft of XML Schema 1.1
>> datatypes, the value space of the string datatype is restricted to
>> sequences of Unicode characters excluding the surrogate blocks, U+FFFE,
>> and U+FFFF.
> 
> I don't see what difference you are referring to.
> 
> The value space of plain literals without language tags is presumably
> the set of sequences of Unicode characters.
> 
> The Unicode code points in the surrogate blocks are not themselves
> characters [1]. They are reserved for the UTF-16 encoding only: a pair
> of surrogate code units combines to form a single supplementary code
> point (U+10000 to U+10FFFF). So the value space for plain literals does
> not include the surrogate code points; it includes the code points that
> can be represented using surrogate pairs. See also the discussion in [2].
> 
> Similarly, U+FFFE and U+FFFF are not characters either [3].

OK, good. Then we don't have an issue here :)
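Dave's surrogate argument can be checked mechanically. The following is a minimal Python sketch; the helpers `surrogate_pair` and `combine` are illustrative names, not taken from any RIF or RDF specification. It shows that a supplementary character such as U+1F600 is one code point, which UTF-16 merely stores as two surrogate code units, and that Python's `unicodedata` classifies U+FFFE and U+FFFF as `Cn` (not assigned as characters).

```python
import unicodedata

def surrogate_pair(cp: int) -> tuple[int, int]:
    """Return the UTF-16 high/low surrogate code units for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = cp - 0x10000                 # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return high, low

def combine(high: int, low: int) -> int:
    """Inverse of surrogate_pair: two code units back to one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

s = "\U0001F600"                                   # one character, one code point
print(len(s))                                      # 1
print([hex(u) for u in surrogate_pair(ord(s))])    # ['0xd83d', '0xde00']
print(s.encode("utf-16-be").hex())                 # d83dde00 (same two code units)
print(hex(combine(0xD83D, 0xDE00)))                # 0x1f600
print(unicodedata.category("\uFFFE"), unicodedata.category("\uFFFF"))  # Cn Cn
```

The surrogate code units exist only at the encoding level; at the level of the string (and hence of any value space defined over code points) there is exactly one character.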

> 
>> There are some further differences between the specifications of the
>> string datatype in XML Schema 1.0 and XML Schema 1.1: in the former,
>> the datatype is based on the Char production of XML 1.0; in the latter,
>> it is based on the Char production of XML 1.1. An important question
>> is what to do with plain literals that contain characters outside the
>> lexical space of xsd:string.
> 
> So there is a real difference there. XML 1.0 does not allow characters
> like BEL (those below #x20 other than #x9, #xA, and #xD); XML 1.1 does
> allow those characters.

Are these characters (i.e., those below #x20 other than #x9, #xA, and
#xD) actually Unicode characters?
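The XML 1.0/1.1 difference discussed here comes straight from the two `Char` productions. As a minimal Python sketch (the function names are illustrative; the ranges are transcribed from the XML 1.0 and XML 1.1 recommendations), the predicates below differ on exactly 28 code points: U+0001 to U+001F minus tab, LF, and CR. As a data point for the question above, Python's `unicodedata` gives BEL (U+0007) the general category `Cc`, i.e. an assigned control character.

```python
import unicodedata

def is_xml10_char(cp: int) -> bool:
    # XML 1.0 [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def is_xml11_char(cp: int) -> bool:
    # XML 1.1 [2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Code points accepted by XML 1.1 but not by XML 1.0
diff = [cp for cp in range(0x110000) if is_xml11_char(cp) and not is_xml10_char(cp)]
print(len(diff), hex(diff[0]), hex(diff[-1]))   # 28 0x1 0x1f
print(unicodedata.category(chr(0x07)))          # Cc
```

Neither production admits U+0000, the surrogate block, U+FFFE, or U+FFFF, which matches the exclusions listed at the start of the thread.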

> 
> Since the only normative exchange syntax for RIF and for RDF is XML,
> it is not actually possible to exchange character sequences other than
> those expressible in the XML version one is dealing with. So we just
> have to be clear which version of XML RIF is based on.

It is always possible to define an embedding, but of course it would be
ideal if strings could be exchanged as such.
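One way to picture such an embedding: a reversible mapping from arbitrary strings to strings that XML 1.0 can carry. The Python sketch below uses a made-up escape convention (a doubled backslash for a backslash, `\u{HEX}` for any code point outside the XML 1.0 `Char` production); neither the convention nor the names `embed`/`unembed` come from RIF or RDF, and an actual proposal would have to fix such a convention normatively.

```python
import re

def is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char production
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def embed(s: str) -> str:
    """Hypothetical escaping: the result contains only XML 1.0 characters."""
    out = []
    for ch in s:
        if ch == "\\":
            out.append("\\\\")                 # escape the escape character itself
        elif is_xml10_char(ord(ch)):
            out.append(ch)                     # representable as is
        else:
            out.append("\\u{%X}" % ord(ch))    # e.g. BEL -> \u{7}
    return "".join(out)

# Either a doubled backslash or a \u{HEX} escape, scanned left to right
_ESCAPE = re.compile(r"\\\\|\\u\{([0-9A-F]+)\}")

def unembed(t: str) -> str:
    """Inverse of embed."""
    return _ESCAPE.sub(
        lambda m: "\\" if m.group(1) is None else chr(int(m.group(1), 16)), t)

original = "ring\x07bell \\ done"
wire = embed(original)
print(wire)                        # ring\u{7}bell \\ done
print(unembed(wire) == original)   # True
```

Escaping the backslash itself is what keeps the mapping reversible: a literal `\u{41}` in the input becomes `\\u{41}` and is not mistaken for an escape on the way back.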

> 
> The most general solution is perhaps to say that we regard the value
> space of xsd:string as the one based on XML 1.1. Exchange using XML
> 1.0 is entirely legal and permitted, but the lexical space is then
> restricted slightly.

I guess this is reasonable, because XML Schema 1.1 has Last Call Working
Draft status.
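The proposal agreed on here can be read as a rule for serializers: the value space follows XML 1.1, and the XML version of the exchange document only limits which strings can be written. Below is a minimal Python sketch for element content; the names `to_xml_text` and `is_restricted11` are hypothetical, and the ranges are transcribed from the XML 1.0 and 1.1 `Char` productions and the XML 1.1 `RestrictedChar` production. XML 1.1 forbids restricted characters literally but allows them as character references. CR is always written as a reference because end-of-line handling would otherwise turn it into LF, and in XML 1.1 the same holds for NEL (U+0085) and LINE SEPARATOR (U+2028).

```python
def is_xml10_char(cp: int) -> bool:
    # XML 1.0 Char production
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def is_xml11_char(cp: int) -> bool:
    # XML 1.1 Char production
    return (0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def is_restricted11(cp: int) -> bool:
    # XML 1.1 RestrictedChar: legal only as character references
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)

def to_xml_text(s: str, version: str = "1.1") -> str:
    """Escape s as XML element content, or raise if that XML version cannot carry it."""
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unknown XML version {version!r}")
    allowed = is_xml10_char if version == "1.0" else is_xml11_char
    out = []
    for ch in s:
        cp = ord(ch)
        if not allowed(cp):
            raise ValueError(f"U+{cp:04X} cannot be exchanged in XML {version}")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif cp == 0xD or (version == "1.1" and
                           (is_restricted11(cp) or cp in (0x85, 0x2028))):
            # would be rejected or end-of-line-normalized if written literally
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)

print(to_xml_text("a\x07b"))            # a&#x7;b
try:
    to_xml_text("a\x07b", version="1.0")
except ValueError as err:
    print(err)                          # U+0007 cannot be exchanged in XML 1.0
```

The same string value is thus either written (XML 1.1) or refused (XML 1.0); the value space does not change, only the set of strings a given document version can express.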

Best, Jos

> 
> Dave
> 
> [1] http://unicode.org/glossary/#S; see the entries under "surrogate",
> and note the comment under "Surrogate character".
> 
> [2] http://lists.xml.org/archives/xml-dev/199909/msg00658.html
> 
> [3] http://unicode.org/charts/PDF/UFFF0.pdf
> 

-- 
Jos de Bruijn            debruijn@inf.unibz.it
                      http://www.debruijn.net/
----------------------------------------------
As far as the laws of mathematics refer to
reality, they are not certain; and as far as
they are certain, they do not refer to
reality.
  -- Albert Einstein

Received on Monday, 3 September 2007 11:45:39 UTC