RE: "Chunk of XML" - Canonicalization and equality

Some pointers as requested:

> - XML schema has the "Deep equality" issue as to when any two chunks
> are "equal".

Actually it is the XQuery/XPath specifications that have tried to define
the fn:deep-equal() function.  See [1].

Please note that several people have noted possible flaws in the Last
Call version of fn:deep-equal.  See for example [2].  These kinds of
comments demonstrate how difficult it is to define semantics for this
kind of function that every application could use.
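
A minimal sketch of why this is hard, assuming Python's
xml.etree.ElementTree.  The function name naive_deep_equal and its
ignore_whitespace flag are hypothetical and are not the fn:deep-equal
definition in [1]; they only show that one policy choice (whether
whitespace-only text counts) already flips the answer.

```python
import xml.etree.ElementTree as ET


def naive_deep_equal(a, b, ignore_whitespace=False):
    """Hypothetical structural comparison of two ElementTree elements.

    This is NOT fn:deep-equal; it is an illustration of one possible policy.
    """
    def norm(text):
        # ElementTree reports missing text as None; treat it as "".
        text = text or ""
        return text.strip() if ignore_whitespace else text

    # Attributes are held in a dict, so their source order never matters here.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if norm(a.text) != norm(b.text) or norm(a.tail) != norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(
        naive_deep_equal(x, y, ignore_whitespace) for x, y in zip(a, b)
    )


# Same content, but different attribute order, quoting and indentation.
left = ET.fromstring('<a x="1" y="2"><b/></a>')
right = ET.fromstring("<a y='2' x='1'>\n  <b/>\n</a>")

strict_result = naive_deep_equal(left, right)
lenient_result = naive_deep_equal(left, right, ignore_whitespace=True)
```

Here strict_result is False and lenient_result is True for the same pair
of chunks.  Neither answer is wrong; each suits a different application.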

> Among the components of the problem, the ways serializations vary are:
> 
> - The underlying data model:  SOAP and WSDL and XML-Include use the
> Infoset;   DSig used XPath 1.0 and XQ I understand uses XPath 2.0 data
> model.

The XSLT 2.0, XQuery 1.0 and XPath 2.0 specifications are all based on
the "XQuery 1.0 and XPath 2.0 Data Model" [3].  

XSLT 2.0 and XQuery 1.0 also share the "XSLT 2.0 and XQuery 1.0
Serialization" specification.  Given your interest in achieving a common
serialization, you might be disappointed to see how parameterized [4]
is.

> - Whether extraneous namespace settings not obviously used are
> included. (This is the difference between XML DSig's exclusive and
> inclusive canon'n)

[4] discusses the differences that are permitted in a serialized
document depending on how the namespace declarations are reflected in
the serialized document.  And if anyone wants to fully immerse themselves
in how namespace nodes for "constructed XML" can impact serialization,
they should study the section in XQuery 1.0 [5] that outlines how
namespace nodes are copied during construction of XML in XQuery 1.0.
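
As a rough illustration of the "extraneous namespace" question, here is
a sketch assuming Python's xml.etree.ElementTree, whose parsed tree does
not expose namespace declarations at all.  That is a property of this
particular library, not of the XPath data models, which do have
namespace nodes.  The helper name shape and the urn:example: namespace
names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in a declaration nothing uses.
with_unused = (
    '<r xmlns:p="urn:example:p" xmlns:unused="urn:example:unused">'
    '<p:c/></r>'
)
without_unused = '<r xmlns:p="urn:example:p"><p:c/></r>'


def shape(elem):
    """Reduce an element to (expanded name, attributes, child shapes)."""
    return (elem.tag, sorted(elem.attrib.items()), [shape(c) for c in elem])


# Compared as serialized text, the chunks differ.
same_bytes = with_unused == without_unused

# Compared as ElementTree trees, the unused declaration is invisible.
same_tree = shape(ET.fromstring(with_unused)) == shape(
    ET.fromstring(without_unused)
)
```

Here same_bytes is False while same_tree is True, so whether the two
chunks are "equal" depends entirely on which model does the comparing.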

/paulc

[1] http://www.w3.org/TR/xquery-operators/#func-deep-equal 
[2] http://lists.w3.org/Archives/Public/public-qt-comments/2003Dec/0062.html
[3] http://www.w3.org/TR/xpath-datamodel/ 
[4] http://www.w3.org/TR/xslt-xquery-serialization/ 
[5] http://www.w3.org/TR/xquery/#id-ns-nodes-on-elements 

Paul Cotton, Microsoft Canada 
17 Eleanor Drive, Nepean, Ontario K2E 6A3 
Tel: (613) 225-5445 Fax: (425) 936-7329 
mailto:pcotton@microsoft.com

  

> -----Original Message-----
> From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] On Behalf Of Tim Berners-Lee
> Sent: January 12, 2004 5:11 PM
> To: 'www-tag@w3.org'
> Subject: "Chunk of XML" - Canonicalization and equality
> 
> 
> 
> Recently there seems to be a common thread around processing what I
> will call  "chunk of XML".
> 
> Cases I am aware of:
> 
> - XML itself uses it for an external entity
> - XML schema has the "Deep equality" issue as to when any two chunks
> are "equal".
> - RDF has a "XML Literal" data type which it handles transparently.  It
> needs a notion of when two chunks are the same.
> - XML-DSig signs, and therefore ensures the integrity of, a chunk of XML
> 
> You can add your own example to this list.
> 
> The problem is that when different parts of a complex system have
> different notions of what a chunk of XML is, then the system built as a
> whole may break.
> 
> (For example, suppose a Java object is serialized as XML, and the
> result put into a database and then exported as an RDF data value,
> signed, shipped across an insecure channel, the signature checked,
> parsed as an infoset from which a new Java object is built.  The
> signature does not sign the XML base which applied to the chunk, and
> this was tampered with in transit. The result is that the Java object
> has been tampered with even though the signature matched.)
> 
> The XML architecture has tended to be built according to a motto that
> all kinds of things are possible, and the application has to be able to
> choose the features it needs.  This is fine when there are simply the
> XML toolset and a single "application".  However, real life is more
> complicated, and things are connected together in all kinds of ways.  I
> think the XML design needs to be more constraining: to offer a
> consistent idea of what a chunk of XML is across all the designs, so
> that the value of that chunk can be preserved as invariant across a
> complex system.  Digital Signature and RDF transport are just
> intermediate parts of the design which need to be transparent.
> 
> This required a notion of equality, and a related canonical
> serialization.
> 
> Among the components of the problem, the ways serializations vary are:
> 
> - The underlying data model:  SOAP and WSDL and XML-Include use the
> Infoset;   DSig used XPath 1.0 and XQ I understand uses XPath 2.0 data
> model. No group seems interested in invariance in the serialized XML,
> but only in the parsed XML of some form.
> - Whether xml:lang is included.  (Not done in the DSig canon'ns,
> Internat'n would like to see it in)
> - Whether xml:base  is included (xml:base did not exist when DSig did
> their canon'ns)
> - Whether extraneous namespace settings not obviously used are
> included. (This is the difference between XML DSig's exclusive and
> inclusive canon'n)
> 
> This may be relevant to the upcoming rechartering of XML Core group.
> I feel that the XML development community has to take on this
> responsibility: asking each group which has a concept of a chunk of XML
> to define its own canonicalization will I think lead to a broken
> overall architecture.
> 
> I have dumped what I know of this issue, apologies for the lack of
> pointers.  Others may be able to fill in pointers to the discussions of
> this in the various groups, and give more examples.
> 
> 
> Tim BL

Received on Monday, 12 January 2004 20:29:33 UTC