Re: pfps-04

From: Brian McBride <bwm@hplb.hpl.hp.com>
Subject: Re: pfps-04
Date: 24 Jul 2003 15:53:39 +0100

> On Wed, 2003-07-23 at 23:03, Peter F. Patel-Schneider wrote:
> 
> [...]
> 
> > Therefore for the RDF entailment rules to be complete, no XML Literal can
> > have a character string as its denotation.
> 
> Right.  The denotation of an XML Literal is an octet sequence, as
> defined by the xml canonicalization spec, see the note in:
> 
>  
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral

Unfortunately this does not answer the question.  The term ``octet
sequence'' is not defined in
http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/.  At least some places
in that document appear to indicate that an octet sequence is just a
sequence of (Unicode?) characters.  (See, for example, the example in
Section 2.2 of ``the Canonical XML version of elem2 from the second
case''.)  Also, the phrase ``exclusive canonical XML refers to XML that is
in exclusive canonical form'' appears to indicate that exclusive canonical
XML is a subset of XML, again suggesting that octets are probably a
restricted form of (Unicode?) characters.

Following pointers leads to
http://www.w3.org/TR/2001/REC-xml-c14n-20010315, which describes the
canonical form of an XML document as a physical representation of the
document encoded in UTF-8, and which talks about octets encoding various
kinds of characters.  This doesn't help matters much.

So the question boils down to whether octets and Unicode characters are
disjoint.   
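
To make the distinction concrete, here is a minimal sketch in Python.  The
language and the sample element are my own choices for illustration; nothing
in the cited documents mandates them.  The sketch assumes a data model in
which character strings and octet sequences are distinct kinds of values,
which is exactly the assumption the specs leave unstated.

```python
# Illustrative only: Python's str models a sequence of Unicode characters,
# and bytes models a sequence of octets.  The sample element is hypothetical.
text = "<e>caf\u00e9</e>"        # character string (contains U+00E9)
octets = text.encode("utf-8")    # octet sequence: a UTF-8 physical representation

chars_len = len(text)            # counts characters
octets_len = len(octets)         # counts octets; U+00E9 takes two octets in UTF-8

# In this data model the two values are never equal, even though one
# encodes the other, so the two domains are disjoint.
same_value = (text == octets)

# Decoding recovers the character string from the octet sequence.
round_trip = octets.decode("utf-8")
```

Under this reading an XML Literal denoting an octet sequence could never
denote a character string.  Under the other reading, where octets are just a
restricted kind of character, the two could coincide.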

[...]

Peter F. Patel-Schneider
Bell Labs Research
Lucent Technologies

Received on Thursday, 24 July 2003 11:31:35 UTC