- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Mon, 28 Jun 2004 10:58:36 -0400
- To: www-tag@w3.org
- Message-id: <87smcflof7.fsf@nwalsh.com>
From http://www.w3.org/2001/tag/actions_owner.html#NW

  xmlChunk-44: Chunk of XML - Canonicalization and equality

  Write up a named equivalence function based on today's discussion
  (e.g., based on infoset, augmented with xml:lang/xml:base, not
  requiring prefixes, etc.).

    * accepted on 12 May 2004

I submit that the following proposal completes this action:

Open questions:

1. There is no normative description of how to build an infoset. It
   follows that infoset equality does not guarantee equality of the
   underlying serialized form in any absolute sense.

2. Should processing instructions be significant?

3. Should comments be significant?

4. Should the document type declaration be significant?

I've taken a conservative approach in this description, answering
"yes" to points 2-4.

General notes:

- Ordered lists (such as the [children] property) are compared
  pairwise and in order. In other words, two ordered lists "A" and "B"
  are the same if and only if the first item of "A" is the same as the
  first item of "B", the second item of "A" is the same as the second
  item of "B", etc. It follows that they can only be the same if they
  are the same length.

- Unordered lists (such as the [attributes] property) are compared
  pairwise and without respect to order. In other words, two unordered
  lists "A" and "B" are the same if and only if there exists a set of
  pairs of items, one from each list, such that the two items in each
  pair are equal and no item from "A" or "B" appears in more than one
  pair. It follows that they can only be the same if they are the same
  length.

- XML Base. If the infosets being compared were constructed by an
  application that claims conformance to the XML Base recommendation,
  then the xml:base attribute is excluded from attribute comparisons.

- Natural Language. The xml:lang attribute is not treated specially in
  the Infoset but is intended to have a scoped effect much like the
  base URI. This proposal finesses that point by requiring that
  elements and attributes be in the same language.

  If the infosets being compared were constructed by an application
  that provides application semantics for xml:lang, then the
  application must be able to determine whether or not two elements or
  attributes have the same language.

  If the infosets being compared were constructed by an application
  that does not provide special semantics for xml:lang, then two
  elements or attributes have the same language if they have the same
  inherited value for xml:lang.

  The inherited value for xml:lang is the value of xml:lang on the
  element in question or the value from the closest ancestor. In XPath
  terms:

    (ancestor-or-self::*/@xml:lang)[last()]

  Languages are compared case insensitively.

- When two information items are compared:

  - Properties with the value "no value" are equal.
  - Properties with the value "unknown" are not equal.

0. Infosets

   Two infosets are equal if their Document Information Items are
   equal.

1. Document Information Items

   Two document information items are equal if the following
   properties are equal:

   - [children]
   - [document element]
   - [all declarations processed]
   - [base uri]

2. Element Information Items

   Two element information items are equal if they have the same
   language and the following properties are equal:

   - [namespace name]
   - [local name]
   - [children]
   - [attributes], exclusive of xml:lang
   - [base uri]

3. Attribute Information Items

   Two attribute information items are equal if they have the same
   language and the following properties are equal:

   - [namespace name]
   - [local name]
   - [normalized value]
   - [attribute type]

4. Processing Instruction Information Items

   Two processing instruction information items are equal if the
   following properties are equal:

   - [target]
   - [content]
   - [base uri]

5. Unexpanded Entity Reference Information Items

   Two unexpanded entity reference information items are equal if the
   following properties are equal:

   - [name]
   - [system identifier]
   - [public identifier]

6. Character Information Items

   Two character information items are equal if the following
   properties are equal:

   - [character code]
   - [element content whitespace]

7. Comment Information Items

   Two comment information items are equal if the following properties
   are equal:

   - [content]

8. The Document Type Declaration Information Item

   Two document type declaration information items are equal if the
   following properties are equal:

   - [system identifier]
   - [public identifier]
   - [children]

9. Unparsed Entity Information Items

   Two unparsed entity information items are equal if the following
   properties are equal:

   - [name]
   - [system identifier]
   - [public identifier]
   - [notation name]

Be seeing you,
  norm

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.
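
The list-comparison rules and the xml:lang inheritance rule from the general notes can be sketched in Python roughly as follows. This is only an illustration, not part of the proposal: the function names (ordered_equal, unordered_equal, parent_map, inherited_lang, same_language) are hypothetical, and xml.etree.ElementTree stands in for a real infoset implementation, which it is not. The unordered comparison pairs items greedily, which is only correct on the assumption that the item-equality function is an equivalence relation. Treating two missing xml:lang values as "the same language" is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def ordered_equal(a, b, eq):
    """Ordered lists: same length, and items equal pairwise and in order."""
    return len(a) == len(b) and all(eq(x, y) for x, y in zip(a, b))


def unordered_equal(a, b, eq):
    """Unordered lists: every item pairs with a distinct equal item.

    Greedy pairing; assumes eq is an equivalence relation.
    """
    if len(a) != len(b):
        return False
    remaining = list(b)
    for x in a:
        for i, y in enumerate(remaining):
            if eq(x, y):
                del remaining[i]  # each item of b is used in at most one pair
                break
        else:
            return False  # no unused partner for x
    return True


def parent_map(root):
    """Map each element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}


def inherited_lang(elem, parents):
    """Value of (ancestor-or-self::*/@xml:lang)[last()], or None if absent."""
    node = elem
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parents.get(node)
    return None


def same_language(lang_a, lang_b):
    """Case-insensitive comparison; two absent values count as the same."""
    if lang_a is None or lang_b is None:
        return lang_a is lang_b
    return lang_a.lower() == lang_b.lower()
```

For example, given `<doc xml:lang="en-US"><p/></doc>` and `<doc><p xml:lang="EN-us"/></doc>`, the two `p` elements have inherited values that differ only in case, so same_language reports them as being in the same language.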
Received on Monday, 28 June 2004 10:59:26 UTC