
Canonical MicroXML

From: James Clark <jjc@jclark.com>
Date: Tue, 2 Oct 2012 19:05:48 +0700
Message-ID: <CANz3_EZqxnvD1AGRzpeB3-_Jzx6fAqR9eoUz0G+7VS0dyO_g=g@mail.gmail.com>
To: James Fuller <jim@webcomposite.com>
Cc: "public-microxml@w3.org" <public-microxml@w3.org>
On Tue, Oct 2, 2012 at 4:37 PM, James Fuller <jim@webcomposite.com> wrote:

> I use xml canonisation all the time for precise diff calcs that have
> nothing to do with security (for example genetic algorithm fitness,
> which must characterise precisely differences between 2 files)


I hear you. I believe the first version of XML Canonicalization was
actually defined by me for the purposes of parser testing:

http://www.jclark.com/xml/canonxml.html

The C14N specs make incredibly heavy weather of defining something that is
very simple.

We could add an Appendix that defines it very succinctly as follows.

The Canonical MicroXML for a document is the unique MicroXML document that

a) has the same data model as that document
b) matches the grammar below (productions not defined below are as defined
in the body of the spec)
c) has its attributes sorted by name in lexicographic (Unicode code point) order

document ::= element #xA
element ::= startTag content endTag
startTag ::= '<' name attributeList '>'
endTag ::= '</' name '>'
content ::= (element | dataChar | charRef)*
attributeList ::= (space attribute)*
attribute ::= attributeName '=' attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | attributeValueCharRef)* '"'
attributeValueCharRef ::= charRef | '&quot;'
charRef ::= '&lt;' | '&amp;' | '&gt;'
space ::= #x20
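To make the definition concrete, here is a minimal Python sketch of a serializer that writes the canonical form directly from an in-memory data-model tree. It is an illustration only: the `Element` class and the `canonical_microxml` function are hypothetical names invented for this example, not part of the spec or of any library. It also assumes that `>` inside an attribute value is always written as `&gt;`; the grammar above allows that reference there, and escaping unconditionally keeps the output inside the grammar whatever attributeValueChar turns out to exclude.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# Hypothetical in-memory form of the MicroXML data model: an element has a
# name, an attribute map (name -> value) and a list of children, each of
# which is either a child element or a run of character data.
@dataclass
class Element:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)


# charRef ::= '&lt;' | '&amp;' | '&gt;'  (the only references used in content)
_CONTENT_ESCAPES = {"<": "&lt;", "&": "&amp;", ">": "&gt;"}
# attributeValueCharRef ::= charRef | '&quot;'
# Assumption: '>' is always escaped in attribute values as well.
_ATTR_ESCAPES = {**_CONTENT_ESCAPES, '"': "&quot;"}


def _escape(text: str, table: Dict[str, str]) -> str:
    # Replace each character that has a mandated reference; copy the rest as-is.
    return "".join(table.get(ch, ch) for ch in text)


def _write_element(elem: Element, out: List[str]) -> None:
    # startTag ::= '<' name attributeList '>'
    out.append("<" + elem.name)
    # Python compares str values code point by code point, so sorted()
    # yields the Unicode code point order required by condition (c).
    for name in sorted(elem.attributes):
        # attributeList ::= (space attribute)*  with  space ::= #x20
        value = _escape(elem.attributes[name], _ATTR_ESCAPES)
        out.append(f' {name}="{value}"')
    out.append(">")
    # content ::= (element | dataChar | charRef)*
    for child in elem.children:
        if isinstance(child, Element):
            _write_element(child, out)
        else:
            out.append(_escape(child, _CONTENT_ESCAPES))
    # endTag ::= '</' name '>'  (no empty-element tags in the canonical form)
    out.append("</" + elem.name + ">")


def canonical_microxml(root: Element) -> str:
    """Serialize a data-model tree as: document ::= element #xA."""
    out: List[str] = []
    _write_element(root, out)
    out.append("\n")
    return "".join(out)
```

For example, `canonical_microxml(Element("p", {"title": 'a"b', "class": "x"}, ["1 < 2", Element("br")]))` returns `<p class="x" title="a&quot;b">1 &lt; 2<br></br></p>` followed by a single newline. One design point worth noting: an implementation in Java or JavaScript cannot simply use the default string comparison, because that compares UTF-16 code units, which orders supplementary characters before U+E000 to U+FFFF and so differs from code point order.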

Is this worth including in the spec?

James
Received on Tuesday, 2 October 2012 12:06:37 GMT
