XML literals, canonical form, and normal form C problem

It appears to me that there is yet another problem with XML Literals.  My
reading of the canonicalization documents (Canonical XML, Version 1.0 and
Exclusive XML Canonicalization, Version 1.0) indicates that there are
canonicalized documents whose text cannot be adequately captured by
Unicode strings in Normal Form C.

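Consider, for example, the following XML document (rendered in ASCII, where
#xhh represents the UTF-8 octet with hexadecimal value hh).  The octets
#xCC#x88 encode U+0308 COMBINING DIAERESIS, so the element content is "u"
followed by a combining character and is therefore not in Normal Form C: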

<?xml version="1.0" encoding="UTF-8"?>
<doc>u#xCC#x88</doc>

I believe that its Exclusive Canonical Form (rendered in ASCII, where #xhh
is as above) is:

<doc>u#xCC#x88</doc>

I thus do not see how 

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF>
<rdf:Description>
  <a rdf:parseType="Literal">u#xCC#x88</a>
</rdf:Description>
</rdf:RDF>

is to be translated into an RDF graph.
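
The mismatch can be checked with a minimal Python sketch that uses only the
standard unicodedata module.  It decodes the octets from the example and
compares them with their Normal Form C equivalent.  The helper name is_nfc
is hypothetical and used only for illustration; the sketch does not run any
XML canonicalization, it only inspects the character data.

```python
import unicodedata

# Octets of the element content in the example: "u" followed by 0xCC 0x88.
raw = b"u\xcc\x88"
text = raw.decode("utf-8")  # U+0075 followed by U+0308 COMBINING DIAERESIS

# The NFC equivalent is the single precomposed character U+00FC.
nfc = unicodedata.normalize("NFC", text)


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by normalization to Normal Form C."""
    return unicodedata.normalize("NFC", s) == s


print([hex(ord(c)) for c in text])  # ['0x75', '0x308']
print(is_nfc(text))                 # False
print(nfc.encode("utf-8"))          # b'\xc3\xbc'
```

If the canonical form keeps the octets #xCC#x88, as I believe it does, then
it differs octet for octet from any Normal Form C string, whose UTF-8
encoding would carry #xC3#xBC instead.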


Peter F. Patel-Schneider
Bell Labs Research
Lucent Technologies

Received on Thursday, 7 August 2003 11:03:49 UTC