- From: James Clark <jjc@jclark.com>
- Date: Tue, 2 Oct 2012 19:05:48 +0700
- To: James Fuller <jim@webcomposite.com>
- Cc: "public-microxml@w3.org" <public-microxml@w3.org>
- Message-ID: <CANz3_EZqxnvD1AGRzpeB3-_Jzx6fAqR9eoUz0G+7VS0dyO_g=g@mail.gmail.com>
On Tue, Oct 2, 2012 at 4:37 PM, James Fuller <jim@webcomposite.com> wrote:

> I use xml canonisation all the time for precise diff calcs that have
> nothing to do with security (for example genetic algorithm fitness,
> which must characterise precisely differences between 2 files) …

I hear you. I believe the first version of XML Canonicalization was actually defined by me for the purposes of parser testing:

http://www.jclark.com/xml/canonxml.html

The C14N specs make incredibly heavy weather of defining something that is very simple. We could add an Appendix that defines it very succinctly as follows.

The Canonical MicroXML for a document is the unique MicroXML document that

a) has the same data model as that document
b) matches the grammar below (productions not defined below are as defined in the body of the spec)
c) has attributes in lexicographic (Unicode code point) order

document ::= element #xA
element ::= startTag content endTag
startTag ::= '<' name attributeList '>'
endTag ::= '</' name '>'
content ::= (element | dataChar | charRef)*
attributeList ::= (space attribute)*
attribute ::= attributeName '=' attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | attributeValueCharRef)* '"'
attributeValueCharRef ::= charRef | '&quot;'
charRef ::= '&lt;' | '&amp;' | '&gt;'
space ::= #x20

Is this worth including in the spec?

James
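
To illustrate how little machinery the definition above needs, here is a minimal Python sketch of a serializer for it. It is not part of any spec. It assumes an element is represented as a `[name, attributes, children]` triple (a dict of attribute strings and a list of strings or nested triples), and the function names are hypothetical. It also assumes that '>' is always written as `&gt;`, in attribute values as well as in content, and it does not try to handle characters that would need numeric character references.

```python
def escape_text(text):
    # '&' must be replaced first so the other references are not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attribute(value):
    # Attribute values are always double-quoted, so '"' is escaped as well.
    return escape_text(value).replace('"', "&quot;")


def canonical_element(element):
    # Assumed representation: [name, {attribute: value}, [child, ...]].
    name, attributes, children = element
    parts = ["<", name]
    # Python compares str values by code point, which matches rule (c).
    for key in sorted(attributes):
        parts.append(' {}="{}"'.format(key, escape_attribute(attributes[key])))
    parts.append(">")
    for child in children:
        if isinstance(child, str):
            parts.append(escape_text(child))
        else:
            parts.append(canonical_element(child))
    # Always an explicit end tag; the grammar has no empty-element form.
    parts.append("</{}>".format(name))
    return "".join(parts)


def canonical_microxml(root):
    # document ::= element #xA
    return canonical_element(root) + "\n"


example = ["a", {"z": "1", "b": 'x"y'}, ["t<", ["c", {}, []]]]
print(canonical_microxml(example), end="")
```

Two documents with the same data model serialize to the same string under this sketch, so they can be compared byte for byte.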
Received on Tuesday, 2 October 2012 12:06:37 UTC