Re: [RIF-RDF] (potential) issues regarding correspondence of identifiers

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Mon, 03 Sep 2007 12:31:17 +0100
Message-ID: <46DBF085.3060406@hplb.hpl.hp.com>
To: Jos de Bruijn <debruijn@inf.unibz.it>
CC: RIF <public-rif-wg@w3.org>

Jos de Bruijn wrote:

> b) RDF plain literals versus XML schema strings
> An open question (for me) is what the exact differences are between the
> value spaces of the RDF plain literals without language tags and
> xsd:string. The value space of RDF plain literals without language tags
> consists of all Unicode strings. Both in the current specification of
> XML schema datatypes and in the current working draft of XML schema 1.1
> data types the value space of the string datatype is restricted to the
> sequences of Unicode characters excluding the surrogate blocks, FFFE,
> and FFFF. 

I don't see what difference you are referring to.

The value space of plain literals without language tags is presumably 
sequences of Unicode characters.

The Unicode code points in the surrogate blocks are not themselves 
characters [1]. They are reserved codes used only in the UTF-16 encoding: 
a pair of surrogate code units combines to form a single code point 
above U+FFFF. So the value space for plain literals does not include the 
surrogate code points; it includes the code points which can be 
represented using surrogate pairs. See also the discussion in [2].

Similarly FFFE and FFFF are not characters either [3].
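
To make the point about surrogates concrete, here is a minimal Python 
sketch. It is only an illustration, not taken from any of the 
specifications, and the function names are made up for this example. It 
shows how a high/low surrogate pair maps to one code point above U+FFFF, 
and which code points the xsd:string exclusions quoted above refer to.

```python
def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate blocks U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def is_excluded_code_point(cp: int) -> bool:
    """True for the code points named in the quoted exclusion list:
    the surrogate blocks, U+FFFE and U+FFFF."""
    return is_surrogate(cp) or cp in (0xFFFE, 0xFFFF)


# U+1D11E (MUSICAL SYMBOL G CLEF) is encoded in UTF-16 as D834 DD1E.
clef = combine_surrogates(0xD834, 0xDD1E)
print(hex(clef))
```

The combined code point is an ordinary character and is not excluded; 
only the individual surrogate code points are.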

> There are some further differences between the specification
> of the string datatype in XML schema 1.0 and XML schema 1.1; in the
> former case, the datatype is based on the Char production in XML 1.0; in
> the latter case, the datatype is based on the Char production in XML 1.1.
> An important question is what to do with plain literals which contain
> characters which are not in the lexical space of xsd:string.

So there is a real difference there. XML 1.0 does not allow characters 
like BEL (those below #x20 other than #x9, #xA and #xD); XML 1.1 does 
allow those characters, apart from #x0.
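
As a rough illustration, the two Char productions can be written as 
simple range checks. This is a sketch in Python with hypothetical 
function names. The ranges follow the Char productions of XML 1.0 and 
XML 1.1, but the code ignores the further XML 1.1 rule that most control 
characters may only appear as character references.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1 (note that #x0 is still excluded)."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def chars_only_in_xml11(text: str) -> list:
    """Characters of text that XML 1.1 allows but XML 1.0 does not."""
    return [
        ch for ch in text
        if is_xml11_char(ord(ch)) and not is_xml10_char(ord(ch))
    ]


# BEL (#x7) is the example mentioned above.
print(is_xml10_char(0x7), is_xml11_char(0x7))
```

Both productions leave out the surrogate blocks, FFFE and FFFF, so the 
two versions differ only in the control characters below #x20.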

Since the only normative exchange syntax for RIF and for RDF is XML, 
it is not actually possible to exchange character sequences other than 
those expressible in the XML version one is dealing with. So we just 
have to be clear which version of XML RIF is based on.

The most general solution is perhaps to say that we regard the value 
space of xsd:string as being that defined in XML 1.1. Exchange using XML 
1.0 is entirely legal and permitted, but the lexical space is then 
restricted slightly.

Dave

[1] http://unicode.org/glossary/#S see entries under surrogate, note the 
comment under "Surrogate character".

[2] http://lists.xml.org/archives/xml-dev/199909/msg00658.html

[3] http://unicode.org/charts/PDF/UFFF0.pdf

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Monday, 3 September 2007 11:31:38 GMT
