- From: John Cowan <cowan@ccil.org>
- Date: Sat, 22 Jul 2017 10:20:40 -0400
- To: public-microxml@w3.org
- Message-ID: <CAD2gp_QCAQ1otaAvty7u_bKvnGoiJnQ8XeJSNnvahS8Ovn=4Tw@mail.gmail.com>
The current MicroXML draft says that if you want a canonical form for a MicroXML document, you can apply XML Canonicalization (RFC 3076), but the result is not necessarily well-formed MicroXML. So I thought I would write down a reasonable definition of MicroXML Canonicalization.

To canonicalize a MicroXML document, take the following actions:

- Normalize all line breaks to #xA.
- Convert all attribute values wrapped in single quotes to be in double quotes, converting any embedded quotation marks into &quot;.
- Convert all numeric character references in character content and attribute values to single characters, except that references to &, <, and > become &amp;, &lt;, and &gt; respectively, and (in attribute values only) &#x27; becomes &apos;.
- Convert empty elements to start-end tag pairs.
- Remove all whitespace outside the document element.
- Remove all whitespace within start-tags except for a single space separating the element name from the first attribute (if there is one) and preceding each additional attribute (if any).
- Remove all whitespace within end-tags.
- Sort the attributes of each element in lexicographical order by Unicode code points.

The result is not Canonical XML, because > has been escaped in attribute values, which Canonical XML doesn't allow. But it is functionally equivalent.

Comments?

-- 
John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
If a soldier is asked why he kills people who have done him no harm, or a terrorist why he kills innocent people with his bombs, they can always reply that war has been declared, and there are no innocent people in an enemy country in wartime. The answer is psychotic, but it is the answer that humanity has given to every act of aggression in history. --Northrop Frye
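The canonicalization steps proposed above can be sketched in Python as a parse-and-reserialize pass rather than an in-place text rewrite. This is a minimal sketch under several assumptions: it handles only elements, attributes, text, and character references (no comments); it assumes the named references are amp, lt, gt, quot, and apos; it uses a loose stand-in for the MicroXML name rule; and it reads the &#x27; rule as "write every apostrophe in an attribute value as &apos;". The function names (`canonicalize`, `decode`, `esc_text`, `esc_attr`) are hypothetical, not part of any MicroXML library.

```python
import re

# Hypothetical sketch of the proposed MicroXML canonicalization.
# Named references assumed for MicroXML: amp, lt, gt, quot, apos.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
# Hex references (&#x..;), decimal ones for robustness, and named ones.
REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z]+));")
NAME = re.compile(r"""[^ \t\n<>/="'&]+""")  # loose stand-in for the name rule
WS = re.compile(r"[ \t\n]*")  # no #xD remains after line-break normalization


def decode(raw):
    """Replace every character reference in raw with the character it denotes."""
    def repl(m):
        hexa, dec, name = m.groups()
        if hexa:
            return chr(int(hexa, 16))
        if dec:
            return chr(int(dec))
        return NAMED[name]
    return REF.sub(repl, raw)


def esc_text(s):
    # Only &, < and > are written as references; everything else is literal.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def esc_attr(s):
    # Values are always double-quoted, so " must be escaped; apostrophes
    # become &apos; (this sketch's reading of the &#x27; rule).
    return esc_text(s).replace('"', "&quot;").replace("'", "&apos;")


def canonicalize(doc):
    """Return a canonical serialization of a MicroXML document string."""
    s = doc.replace("\r\n", "\n").replace("\r", "\n")  # line breaks -> #xA
    pos = 0
    out = []

    def skip_ws():
        nonlocal pos
        pos = WS.match(s, pos).end()

    def expect(text):
        nonlocal pos
        if not s.startswith(text, pos):
            raise ValueError(f"expected {text!r} at offset {pos}")
        pos += len(text)

    def name():
        nonlocal pos
        m = NAME.match(s, pos)
        if m is None:
            raise ValueError(f"expected a name at offset {pos}")
        pos = m.end()
        return m.group()

    def element():
        nonlocal pos
        expect("<")
        tag = name()
        attrs = {}
        while True:
            skip_ws()  # whitespace inside the start-tag is discarded
            if s.startswith("/>", pos):
                pos, empty = pos + 2, True
                break
            if s.startswith(">", pos):
                pos, empty = pos + 1, False
                break
            key = name()
            skip_ws()
            expect("=")
            skip_ws()
            quote = s[pos]
            if quote not in "\"'":
                raise ValueError(f"expected a quote at offset {pos}")
            end = s.index(quote, pos + 1)
            attrs[key] = decode(s[pos + 1:end])
            pos = end + 1
        out.append("<" + tag)
        for key in sorted(attrs):  # Python str ordering compares code points
            out.append(f' {key}="{esc_attr(attrs[key])}"')
        out.append(">")
        if not empty:
            content(tag)
        out.append(f"</{tag}>")  # empty elements become start-end tag pairs

    def content(tag):
        nonlocal pos
        while True:
            lt = s.index("<", pos)
            out.append(esc_text(decode(s[pos:lt])))
            pos = lt
            if s.startswith("</", pos):
                pos += 2
                if name() != tag:
                    raise ValueError(f"mismatched end-tag for {tag!r}")
                skip_ws()  # whitespace inside the end-tag is discarded
                expect(">")
                return
            element()

    skip_ws()  # drop whitespace before the document element
    element()
    skip_ws()  # and after it
    if pos != len(s):
        raise ValueError("unexpected content after the document element")
    return "".join(out)
```

For example, `canonicalize("<a t='it&#x27;s'/>")` yields `<a t="it&apos;s"></a>`. Because every reference is decoded and only &, <, > (plus " and ' in attribute values) are re-escaped, running the function on its own output should leave it unchanged, which is a quick sanity check that the form is canonical.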
Received on Saturday, 22 July 2017 14:21:24 UTC