RE: Canonical XML Comment (CDATA)

Hi Gregor,

Thanks for writing.  Please respond to this letter even if you agree with
its content since Joseph has asked that we document positive acknowledgement
of the resolution of issues raised by the WG (esp. if we intend the
resolution to be not to change anything).

First, note that there is no requirement to use XPath to build the required
part of C14N.  One's implementation must only behave in the same way, which
includes replacing the CDATA sections with their character content.  The
section about text nodes that you cited [XPath, Section 5.7] mentions CDATA
sections in the discussion of text nodes, but it is in order to say
*exactly* the same thing we are saying:

"Character data is grouped into text nodes. As much character data as
possible is grouped into each text node: a text node never has an
immediately following or preceding sibling that is a text node."

"Each character within a CDATA section is treated as character data."

I believe it is best to make it clear up front that CDATA sections will be
replaced rather than burying it in text node processing because it is easier
for people who don't implement with XPath to see what must be done to make a
logically equivalent structure appropriate to their purposes.

My opinion is that the discussion of the processing model should be based on
the data model, unencumbered by details of how to create the data model.  To
me, this is esp. important given that some may choose to actually implement
based on XPath (e.g. in order to perform document subsetting), and such
people want to read the processing model to figure out how to process the
data structure they have.

As to your point about whether this concatenation is a common feature of XML
processors, hopefully it is clear that it doesn't matter.  Concatenation of
successive blocks of character data is a trivial task.  Implementers can
either choose to do it in their data structure, or they can acknowledge that
since they are consecutive in the input, simply outputing the text as
encountered suffices.

It is important to note that while many processors may report separate
character blocks for CDATA sections, they are not required (and some do not)
distinguish that the text came from a CDATA section because:

"CDATA sections may occur anywhere character data may occur; they are used
to escape blocks of text containing characters which would otherwise be
recognized as markup."
[XML, Section 2.7]

(In other words, CDATA sections are a simple escaping mechanism).


John Boyer
Development Team Leader,
Distributed Processing and XML
PureEdge Solutions Inc.
Creating Binding E-Commerce
v: 250-479-8334, ext. 143  f: 250-479-3772
1-888-517-2675 <>

-----Original Message-----
[]On Behalf Of Gregor Karlinger
Sent: Wednesday, September 13, 2000 5:58 AM
To: XMLSigWG; John Boyer
Subject: Canonical XML Comment (CDATA)

Hi John,

In section 2.1 there are listed the requirements for an XML
processor to be used to create a node set:

 3. replace CDATA sections with their character content

I am not sure, but I don't think this is standard behaviour of
the widespread XML parser implementations.

Wouldn't it be better to skip this requirement and instead
add a sencence to the explanation how to serialize Text nodes
in section 2.2?

The XPath data model also mentions CDATA sections in the
explanation of the data model [1].


Regards, Gregor
Gregor Karlinger
Phone +43 316 873 5541
Institute for Applied Information Processing and Communications

Received on Wednesday, 13 September 2000 12:33:34 UTC