- From: Dave Reynolds <der@hplb.hpl.hp.com>
- Date: Mon, 03 Sep 2007 12:31:17 +0100
- To: Jos de Bruijn <debruijn@inf.unibz.it>
- CC: RIF <public-rif-wg@w3.org>
Jos de Bruijn wrote:

> b) RDF plain literals versus XML schema strings
>
> An open question (for me) is what the exact differences are between the
> value spaces of the RDF plain literals without language tags and
> xsd:string. The value space of RDF plain literals without language tags
> consists of all Unicode strings. Both in the current specification of
> XML schema datatypes and in the current working draft of XML schema 1.1
> data types the value space of the string datatype is restricted to the
> sequences of Unicode characters excluding the surrogate blocks, FFFE,
> and FFFF.

I don't see what difference you are referring to.

The value space of plain literals without language tags is presumably sequences of Unicode characters. The Unicode code points in the surrogate blocks are not themselves characters [1]. They are reserved codes used only in the UTF-16 encoding: a pair of surrogate code units combines to form a single code point above U+FFFF. So the value space for plain literals does not include the surrogate code points; it includes the code points that can be represented using surrogate pairs. See also the discussion in [2].

Similarly, FFFE and FFFF are not characters either [3].

> There are some further differences between the specification
> of the string datatype in XML schema 1.0 and XML schema 1.1; in the
> former case, the datatype is based on the Char production in XML 1.0; in
> the latter case, the datatype is based on the Char production in XML 1.1.
> An important question is what to do with plain literals which contain
> characters which are not in the lexical space of xsd:string.

So there is a real difference there. XML 1.0 does not allow characters like BEL (those below #x20 other than #x9, #xA and #xD); XML 1.1 does allow those characters.

Since the only normative exchange syntax for RIF and for RDF is XML, it is not actually possible to exchange character sequences other than those expressible in the XML version one is dealing with.

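To make the XML 1.0 versus XML 1.1 difference concrete, here is a minimal sketch that tests code points against the two Char productions. The function names (`is_xml10_char`, `is_xml11_char`, `fits`) are made up for illustration and are not taken from any RIF, RDF or XML Schema document. The sketch only checks membership in Char; it ignores the XML 1.1 rule that most control characters may appear only as character references.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0: tab, LF, CR, then everything from #x20
    up, minus the surrogate block and FFFE/FFFF."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1: as above, but starting at #x1, so the
    C0 controls such as BEL are included. NUL (#x0) is still excluded."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def fits(text: str, is_char) -> bool:
    """True if every code point of `text` matches the given production."""
    return all(is_char(ord(ch)) for ch in text)


bel = "ring\x07"             # contains BEL (#x7)
g_clef = "\U0001D11E"        # one code point, two UTF-16 code units
lone_surrogate = "\ud800"    # a surrogate code point, not a character

for label, text in [("BEL", bel), ("U+1D11E", g_clef),
                    ("lone surrogate", lone_surrogate), ("U+FFFE", "\ufffe")]:
    print(label, fits(text, is_xml10_char), fits(text, is_xml11_char))
```

Only the BEL string gives different answers under the two productions. The supplementary character is accepted by both even though UTF-16 encodes it as a surrogate pair, while a lone surrogate code point, FFFE and FFFF are rejected by both.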
So we just have to be clear which version of XML RIF is based on. The most general solution is perhaps to say that we regard the value space of xsd:string as being that defined in XML 1.1. Exchange using XML 1.0 is entirely legal and permitted, but the lexical space is then restricted slightly.

Dave

[1] http://unicode.org/glossary/#S see entries under surrogate, note the comment under "Surrogate character".
[2] http://lists.xml.org/archives/xml-dev/199909/msg00658.html
[3] http://unicode.org/charts/PDF/UFFF0.pdf

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Monday, 3 September 2007 11:31:38 UTC