Re: Newbie comments about Canonical XML from Eric van der Vlist on 2000-09-11 (w3c-ietf-xmldsig@w3.org from July to September 2000)

From: Eric van der Vlist <vdv@dyomedea.com>
Date: Mon, 11 Sep 2000 21:06:28 +0200
To: John Boyer <jboyer@PureEdge.com>
CC: w3c-ietf-xmldsig@w3.org
Message-ID: <39BD2D34.3DFC32F5@dyomedea.com>
John,

There were 2 different comments in my email, sorry if I haven't been
clear about it.

In both cases, I am not questioning the legitimacy of the behavior
defined in c14n, but rather suggesting to clarify the point in the spec.

The first point which was my starting point is about comparing 2 XML
documents (or maybe should I say files) considered as standalone
including their internal document type definitions.

I reckon that it's a specific need which should be addressed using
specific tools. One of the way to do it is to use a specific parser like
the one I have described in a recent article [r1] and eventually to use
its canonized output.

[r1] http://www.xml.com/pub/2000/08/09/xslt/xslt.html

The second point was to highlight the fact that a canonical XML
document, even if you control it's source as a file (stored in your
local hard drive), is a function of many factors (including the time and
the full request header sent to include the external entities) and that
you are canonizing a picture of this file/document rather than the
file/document itself.

This point is independent of the first one (even if it's related) but I
think it might be worse mentioning it and it's about this second point
only that I had written:

> > Even if you have a tight control on the document you're canonizing,
> > since you are integrating data from external documents, you can't
> > guarantee that a next processing of the same document will give the same
> > canonical XML as these documents (or more exactly the answer to the
> > request for these documents) may vary.
> > 
> > It's may not be a problem for security applications which will require a
> > signature check processed on the object model (DOM or other) they are
> > working on, but here again, it might be worth mentioning this point
> > since it has an impact on the architecture to use: IMHO you can't safely
> > test the equivalence of two documents based on their canonical XML and
> > then reload the document in another tool assuming it's still
> > equivalent...

John Boyer wrote:
> 
> Hi Eric,
> 
> I'd be interested in a more specific version of your comments.  Although,
> c14n is doing what is intended, there is some good food for thought (below)
> if I've interpreted you correctly.  Also, please refer to the version of the
> document you are reading.  My comments refer to [1].
> 
> [1] http://www.w3.org/TR/2000/WD-xml-c14n-20000907

I am reading this same version.

> Firstly, C14N is actually best-suited to standalone documents.  However, I
> think you are using the term 'standalone document' to mean something
> different than its definition in the XML 1.0 specification.  I believe you
> mean that you want to canonicalize the file containing the root element, but
> not include information that the document obtains from external sources.
> See below for some advice on this.

Sorry, I meant the document considered as standalone (like if we had
forced its standalone attribute to "yes").
 
> In general, it is a good thing that the canonical form should change the
> information value of the document changes by simply moving it to a different
> box.  It is, at that point, a different XML document, and it is the stated
> purpose of C14N to report a difference.  However, the term 'document' as it
> appears in XML 1.0 is seemingly different from how you are using it.  By
> document, you seem to mean 'file containing root element'.  Again, see below
> for some advice on this.

Yes, I should have written "file" rather than "document".
 
> As for the loss of DTD (as well as the loss of entity references), we're
> aware that we're tossing out this information.  We're doing this so that
> C14N can be implemented with baseline XML 1.0 processors.  For the most
> part, the loss of information does not hurt the canonical form, but there
> are a few limitations, which are given in [1] (as well as advice on how to
> avoid or overcome these limitations).
> 
> However, I cannot tell for sure, but it sounds like you may be trying to use
> canonical forms in the creation of a XML document development environment,
> not an XML processing application.  I can imagine that your processing needs
> would be far greater than those of an XML 1.0 processing application.
> However, it also stands to reason that you would have much more
> sophisticated tools at your disposal than we are assuming.

Yes, I agree.
 
> As such, may I recommend a simple round of pre-processing and
> post-processing.  Firstly, the loss of entity references seems to be causing
> difficulty.  However, in a development environment, your tool simply *must*
> be able to track any and all entity references, so turning each reference
> into character data before running the canonicalizer should be a piece of
> cake.  Secondly, your tool simply must have access to the doctypedecl, so
> prepending it to the canonical form as a post-processing step of
> canonicalization should also be simple.

Yes, it's basically what I have described in [r1].
 
> The results of these two easy steps should make the canonical forms of much
> more value within your application.  In general, the core specification
> cannot assume the implementer has access to such advanced tools, but perhaps
> an additional paragraph in the limitations section spelling out the
> information above wouldn't hurt.

Yes.

> Please let us know if this information works for you, or please feel free to
> elaborate if I've gone down a path other than the one you intended...

In fact you've gone one of the paths I intended !

Thanks

Eric
 
> Thanks,
> John Boyer
> Development Team Leader,
> Distributed Processing and XML
> PureEdge Solutions Inc.
> Creating Binding E-Commerce
> v: 250-479-8334, ext. 143  f: 250-479-3772
> 1-888-517-2675   http://www.PureEdge.com <http://www.pureedge.com/>
> 
> -----Original Message-----
> From: w3c-ietf-xmldsig-request@w3.org
> [mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of Eric van der Vlist
> Sent: Saturday, September 09, 2000 8:51 AM
> To: w3c-ietf-xmldsig@w3.org
> Subject: Newbie comments about Canonical XML
> 
> Hi,
> 
> After my first serious glance at the Canonical XML spec, I have a couple
> of comments (I only hope they are not FAQs...):
> 
> This canonization seems to meet a very specific need.
> 
> I came to it to see if it could be used to compare XML documents and
> find their differences before a CVS checkin and my first finding was
> that Canonical XML is not meant to deal with standalone documents and
> looses any document type information.
> 
> To say that 2 documents are identical based on the canonical output
> seems pretty limitative in these conditions !
> 
> I reckon that it's meeting a need, but find that it might deserve a
> mention in the abstract.
> 
> To elaborate on this point, I think that what is described here is not
> so much "a physical representation, the canonical form, of an input XML
> document", but rather a physical representation of an object model taken
> at a given instant under given conditions (and at a given location).
> 
> Even if you have a tight control on the document you're canonizing,
> since you are integrating data from external documents, you can't
> guarantee that a next processing of the same document will give the same
> canonical XML as these documents (or more exactly the answer to the
> request for these documents) may vary.
> 
> It's may not be a problem for security applications which will require a
> signature check processed on the object model (DOM or other) they are
> working on, but here again, it might be worth mentioning this point
> since it has an impact on the architecture to use: IMHO you can't safely
> test the equivalence of two documents based on their canonical XML and
> then reload the document in another tool assuming it's still
> equivalent...
> 
> My 0,02 Euros
> 
> Eric
> --
> ------------------------------------------------------------------------
> Eric van der Vlist       Dyomedea                    http://dyomedea.com
> http://xmlfr.org         http://4xt.org              http://ducotede.com
> ------------------------------------------------------------------------

-- 
------------------------------------------------------------------------
Eric van der Vlist       Dyomedea                    http://dyomedea.com
http://xmlfr.org         http://4xt.org              http://ducotede.com
------------------------------------------------------------------------
Received on Monday, 11 September 2000 15:05:32 UTC