RE: Newbie comments about Canonical XML

Hi Eric,

I'd be interested in a more specific version of your comments.  Although
c14n is doing what is intended, there is some good food for thought (below)
if I've interpreted you correctly.  Also, please refer to the version of the
document you are reading.  My comments refer to [1].


Firstly, C14N is actually best-suited to standalone documents.  However, I
think you are using the term 'standalone document' to mean something
different than its definition in the XML 1.0 specification.  I believe you
mean that you want to canonicalize the file containing the root element, but
not include information that the document obtains from external sources.
See below for some advice on this.

In general, it is a good thing that the canonical form should change if the
information value of the document changes by simply moving it to a different
box.  It is, at that point, a different XML document, and it is the stated
purpose of C14N to report a difference.  However, the term 'document' as it
appears in XML 1.0 is seemingly different from how you are using it.  By
document, you seem to mean 'file containing root element'.  Again, see below
for some advice on this.

As for the loss of DTD (as well as the loss of entity references), we're
aware that we're tossing out this information.  We're doing this so that
C14N can be implemented with baseline XML 1.0 processors.  For the most
part, the loss of information does not hurt the canonical form, but there
are a few limitations, which are given in [1] (as well as advice on how to
avoid or overcome these limitations).

However, I cannot tell for sure, but it sounds like you may be trying to use
canonical forms in the creation of an XML document development environment,
not an XML processing application.  I can imagine that your processing needs
would be far greater than those of an XML 1.0 processing application.
However, it also stands to reason that you would have much more
sophisticated tools at your disposal than we are assuming.

As such, may I recommend a simple round of pre-processing and
post-processing.  Firstly, the loss of entity references seems to be causing
difficulty.  However, in a development environment, your tool simply *must*
be able to track any and all entity references, so turning each reference
into character data before running the canonicalizer should be a piece of
cake.  Secondly, your tool simply must have access to the doctypedecl, so
prepending it to the canonical form as a post-processing step of
canonicalization should also be simple.
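
The two steps above can be sketched roughly as follows.  This is only an
illustration under several assumptions: it reads "turning each reference
into character data" as escaping each general entity reference so that it
survives as literal text, it uses the C14N 2.0 canonicalizer from Python's
standard library (3.8+) as a stand-in for the canonicalizer discussed here,
and the helper name and both regular expressions are hypothetical and
deliberately simple (they do not handle references inside comments or CDATA
sections, for example).

```python
import re
from xml.etree.ElementTree import canonicalize

# Hypothetical, simplified patterns; a real authoring tool would use its
# own parser's knowledge of the document instead of regular expressions.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\b[^\[>]*(?:\[.*?\]\s*)?>", re.DOTALL)
# General entity references, excluding the five predefined entities and
# numeric character references (which start with "&#").
ENTITY_REF_RE = re.compile(
    r"&(?!(?:amp|lt|gt|apos|quot);)([A-Za-z_:][\w.:-]*);"
)


def canonicalize_for_authoring(xml_text):
    """Canonicalize xml_text while keeping entity references and the DOCTYPE."""
    match = DOCTYPE_RE.search(xml_text)
    doctype = match.group(0) if match else ""
    split_at = match.end() if match else 0
    head, body = xml_text[:split_at], xml_text[split_at:]

    # Pre-processing: escape each entity reference after the DOCTYPE so the
    # canonicalizer sees it as plain character data and does not expand it.
    body = ENTITY_REF_RE.sub(r"&amp;\1;", body)

    # Canonicalization proper (C14N 2.0 here, an assumption of this sketch).
    canonical = canonicalize(head + body)

    # Post-processing: prepend the original document type declaration.
    return doctype + "\n" + canonical if doctype else canonical


if __name__ == "__main__":
    sample = (
        '<!DOCTYPE doc [<!ENTITY who "world">]>\n'
        '<doc b="2" a="1">Hello &who; &amp; all</doc>'
    )
    print(canonicalize_for_authoring(sample))
```

Because the references are escaped before canonicalization, the result no
longer depends on what the entities resolve to at processing time, and the
prepended declaration is carried along verbatim rather than canonicalized.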

The results of these two easy steps should make the canonical forms of much
more value within your application.  In general, the core specification
cannot assume the implementer has access to such advanced tools, but perhaps
an additional paragraph in the limitations section spelling out the
information above wouldn't hurt.

Please let us know if this information works for you, or please feel free to
elaborate if I've gone down a path other than the one you intended...

John Boyer
Development Team Leader,
Distributed Processing and XML
PureEdge Solutions Inc.
Creating Binding E-Commerce
v: 250-479-8334, ext. 143  f: 250-479-3772
1-888-517-2675

-----Original Message-----
Eric van der Vlist
Sent: Saturday, September 09, 2000
Subject: Newbie comments about Canonical XML


After my first serious glance at the Canonical XML spec, I have a couple
of comments (I only hope they are not FAQs...):

This canonization seems to meet a very specific need.

I came to it to see if it could be used to compare XML documents and
find their differences before a CVS checkin and my first finding was
that Canonical XML is not meant to deal with standalone documents and
loses any document type information.

To say that two documents are identical based on the canonical output
seems pretty limiting under these conditions!

I reckon that it's meeting a need, but find that it might deserve a
mention in the abstract.

To elaborate on this point, I think that what is described here is not
so much "a physical representation, the canonical form, of an input XML
document", but rather a physical representation of an object model taken
at a given instant under given conditions (and at a given location).

Even if you have a tight control on the document you're canonizing,
since you are integrating data from external documents, you can't
guarantee that a next processing of the same document will give the same
canonical XML as these documents (or more exactly the answer to the
request for these documents) may vary.

It may not be a problem for security applications which will require a
signature check processed on the object model (DOM or other) they are
working on, but here again, it might be worth mentioning this point
since it has an impact on the architecture to use: IMHO you can't safely
test the equivalence of two documents based on their canonical XML and
then reload the document in another tool assuming it's still

My 0,02 Euros

Eric van der Vlist              

Received on Monday, 11 September 2000