
XML literals, canonical form, and normal form C problem

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Thu, 07 Aug 2003 11:03:41 -0400 (EDT)
Message-Id: <20030807.110341.103013569.pfps@research.bell-labs.com>
To: www-rdf-comments@w3.org

It appears to me that there is yet another problem with XML Literals.  My
reading of the canonicalization documents (Canonical XML, Version 1.0 and
Exclusive XML Canonicalization, Version 1.0) indicates to me that there are
canonicalized documents that have text that cannot be adequately captured by
Unicode strings in Normal Form C.

Consider, for example, the following XML document (rendered in ASCII, where
#xhh represents a UTF-8 octet in hexadecimal; #xCC#x88 encodes U+0308
COMBINING DIAERESIS):

<?xml version="1.0" encoding="UTF-8"?>
<doc>u#xCC#x88</doc>

I believe that its Exclusive Canonical Form (rendered in ASCII, where #xhh
is as above) is 

<doc>u#xCC#x88</doc>
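The claim that this content is not in Normal Form C can be checked with Python's standard `unicodedata` module (Python 3.8 or later, for `is_normalized`). This is a minimal sketch, assuming the octets are decoded as UTF-8 as the XML declaration states:

```python
import unicodedata

# The content of <doc> in the example: an ASCII "u" followed by the
# octets #xCC #x88, decoded as UTF-8 per the XML declaration.
raw = b"u\xcc\x88"
text = raw.decode("utf-8")

# #xCC #x88 is the UTF-8 encoding of U+0308 COMBINING DIAERESIS.
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# prints:
# U+0075 LATIN SMALL LETTER U
# U+0308 COMBINING DIAERESIS

# The decomposed sequence is not in Normal Form C ...
print(unicodedata.is_normalized("NFC", text))  # prints False

# ... because NFC composes it into the single code point U+00FC,
# whose UTF-8 encoding is #xC3 #xBC rather than #xCC #x88.
nfc = unicodedata.normalize("NFC", text)
print(f"U+{ord(nfc):04X}", nfc.encode("utf-8"))  # prints U+00FC b'\xc3\xbc'
```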

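This canonical form is not in Normal Form C, which would compose "u" followed
by U+0308 into the single character U+00FC. I thus do not see how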

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
  <a rdf:parseType="Literal">u#xCC#x88</a>
</rdf:Description>
</rdf:RDF>

is to be translated into an RDF graph.
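The difficulty can be sketched in Python under two assumptions suggested by the subject line: that the lexical form of an XML literal is the canonical form of the element content, and that it must also be in Normal Form C. The helper `xml_literal_lexical_form` is hypothetical. `xml.etree.ElementTree.canonicalize` (Python 3.8+) implements Canonical XML 2.0, not Exclusive XML Canonicalization 1.0; it stands in here only because it likewise leaves the decomposed sequence untouched.

```python
import unicodedata
import xml.etree.ElementTree as ET


def xml_literal_lexical_form(content: str) -> str:
    """Hypothetical helper: canonicalize the content of a
    parseType="Literal" element and require the result to be in NFC."""
    # Wrap the fragment so it is a well-formed document, canonicalize it
    # (ET.canonicalize implements C14N 2.0 and performs no Unicode
    # normalization), then strip the wrapper tags again.
    canon = ET.canonicalize("<wrap>" + content + "</wrap>")
    inner = canon[len("<wrap>"):-len("</wrap>")]
    if not unicodedata.is_normalized("NFC", inner):
        raise ValueError(f"canonical form is not in Normal Form C: {inner!r}")
    return inner


# A precomposed U+00FC is accepted unchanged.
print(xml_literal_lexical_form("\u00fc").encode("utf-8"))  # prints b'\xc3\xbc'

# The example above, "u" + U+0308, survives canonicalization
# unchanged and is therefore rejected.
try:
    xml_literal_lexical_form("u\u0308")
except ValueError as err:
    print("rejected:", err)
```

The sketch raises rather than normalizing silently: applying NFC first would turn #xCC#x88 into #xC3#xBC, so the stored literal would no longer match the canonical form given above.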


Peter F. Patel-Schneider
Bell Labs Research
Lucent Technologies
Received on Thursday, 7 August 2003 11:03:49 GMT
