- From: Norman Walsh <Norman.Walsh@Sun.COM>
- Date: Mon, 28 Jun 2004 10:58:36 -0400
- To: www-tag@w3.org
- Message-id: <87smcflof7.fsf@nwalsh.com>
From http://www.w3.org/2001/tag/actions_owner.html#NW
xmlChunk-44: Chunk of XML - Canonicalization and equality
Write up a named equivalence function based on today's discussion
(e.g., based on infoset, augmented with xml:lang/xml:base, not
requiring prefixes, etc.).
* accepted on 12 May 2004
I submit that the following proposal completes this action:
Open questions:
1. There is no normative description of how to build an infoset. It follows
that infoset equality does not guarantee equality of the underlying
serialized form in any absolute sense.
2. Should processing instructions be significant?
3. Should comments be significant?
4. Should the document type declaration be significant?
I've taken a conservative approach in this description, answering
"yes" to points 2-4.
General notes:
- Ordered lists (such as the [children] property) are compared
pairwise and in order. In other words, two ordered lists "A" and "B"
are the same if and only if the first item of "A" is the same as the
first item of "B", the second item of "A" is the same as the second
item of "B", etc. It follows that they can only be the same if they
are the same length.
- Unordered lists (such as the [attributes] property) are compared
pairwise and without respect to order. In other words, two unordered
lists "A" and "B" are the same if and only if there exists a set of
pairs of items, one from each list, such that the two items in each
pair are equal and no item from "A" or "B" appears in more than one
pair. It follows that they can only be the same if they are the same
length.
- XML Base. If the infosets being compared were constructed by an
application that claims conformance to the XML Base recommendation,
then the xml:base attribute is excluded from attribute comparisons.
- Natural Language. The xml:lang attribute is not treated specially in
the Infoset but is intended to have a scoped effect much like the
base URI. This proposal finesses that point by requiring that
elements and attributes be in the same language.
If the infosets being compared were constructed by an application
that provides application semantics for xml:lang, then the
application must be able to determine whether or not two elements or
attributes have the same language.
If the infosets being compared were constructed by an application
that does not provide special semantics for xml:lang, then two
elements or attributes have the same language if they have the same
inherited value for xml:lang.
The inherited value for xml:lang is the value of xml:lang on the
element in question or the value from the closest ancestor. In XPath
terms: (ancestor-or-self::*/@xml:lang)[last()]
Languages are compared case insensitively.
- When two information items are compared:
- Properties with the value "no value" are equal.
- Properties with the value "unknown" are not equal.
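As an illustration only (not part of the proposal), the list and property
rules above might be sketched in Python as follows. The names Special,
property_equal, ordered_equal and unordered_equal are hypothetical. The
sketch also assumes that "unknown" never equals anything, including
another "unknown"; the notes above only state the case where both values
are "unknown".

```python
from enum import Enum
from typing import Any, Callable, Sequence


class Special(Enum):
    """Hypothetical sentinels for the two special property values."""
    NO_VALUE = "no value"
    UNKNOWN = "unknown"


def property_equal(a: Any, b: Any) -> bool:
    # Assumption: "unknown" is unequal to everything, itself included.
    if a is Special.UNKNOWN or b is Special.UNKNOWN:
        return False
    # "no value" compares equal to "no value" through ordinary equality.
    return a == b


def ordered_equal(a: Sequence, b: Sequence,
                  eq: Callable[[Any, Any], bool] = property_equal) -> bool:
    # Pairwise and in order; different lengths can never be the same.
    return len(a) == len(b) and all(eq(x, y) for x, y in zip(a, b))


def unordered_equal(a: Sequence, b: Sequence,
                    eq: Callable[[Any, Any], bool] = property_equal) -> bool:
    # Pair each item of "a" with a distinct, equal item of "b".
    # Greedy pairing is enough as long as eq is symmetric and transitive.
    if len(a) != len(b):
        return False
    remaining = list(b)
    for x in a:
        for i, y in enumerate(remaining):
            if eq(x, y):
                del remaining[i]  # each item may appear in only one pair
                break
        else:
            return False  # no partner left for x
    return True
```

The greedy pairing is quadratic in the list length, which is probably
acceptable for attribute lists but is a design choice of this sketch,
not something the proposal requires.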
0. Infosets
Two infosets are equal if their Document Information Items are equal.
1. Document Information Items
Two document information items are equal if the following properties
are equal:
- [children]
- [document element]
- [all declarations processed]
- [base uri]
2. Element Information Items
Two element information items are equal if they have the same language
and the following properties are equal:
- [namespace name]
- [local name]
- [children]
- [attributes], exclusive of xml:lang
- [base uri]
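For illustration, here is a rough Python sketch of the element rule for
the case where the application gives xml:lang no special semantics. It is
a simplification under several assumptions: the Element and Attribute
classes are hypothetical stand-ins for information items, character
children are collapsed into plain strings, None stands in for "no value",
and the xml_base_aware flag is an invented way to express the XML Base
note above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass(eq=False)
class Attribute:
    namespace_name: Optional[str]
    local_name: str
    normalized_value: str
    attribute_type: Optional[str] = None  # None stands in for "no value"


@dataclass(eq=False)
class Element:
    namespace_name: Optional[str]
    local_name: str
    attributes: List[Attribute] = field(default_factory=list)
    children: List[Union["Element", str]] = field(default_factory=list)
    base_uri: Optional[str] = None
    parent: Optional["Element"] = None

    def __post_init__(self):
        # Link child elements back to this element for ancestor lookups.
        for c in self.children:
            if isinstance(c, Element):
                c.parent = self


def inherited_lang(e: Element) -> Optional[str]:
    # Equivalent in spirit to (ancestor-or-self::*/@xml:lang)[last()],
    # lower-cased because languages are compared case insensitively.
    node = e
    while node is not None:
        for a in node.attributes:
            if a.namespace_name == XML_NS and a.local_name == "lang":
                return a.normalized_value.lower()
        node = node.parent
    return None


def attr_map(e: Element, xml_base_aware: bool) -> dict:
    # Attribute names are unique per element, so comparing dicts is an
    # unordered comparison. xml:lang is always left out here.
    skipped = {"lang", "base"} if xml_base_aware else {"lang"}
    return {
        (a.namespace_name, a.local_name): (a.normalized_value, a.attribute_type)
        for a in e.attributes
        if not (a.namespace_name == XML_NS and a.local_name in skipped)
    }


def elements_equal(a: Element, b: Element, xml_base_aware: bool = False) -> bool:
    if inherited_lang(a) != inherited_lang(b):
        return False
    if (a.namespace_name, a.local_name, a.base_uri) != \
            (b.namespace_name, b.local_name, b.base_uri):
        return False
    if attr_map(a, xml_base_aware) != attr_map(b, xml_base_aware):
        return False
    if len(a.children) != len(b.children):
        return False
    for x, y in zip(a.children, b.children):  # ordered, pairwise
        if isinstance(x, Element) and isinstance(y, Element):
            if not elements_equal(x, y, xml_base_aware):
                return False
        elif x != y:
            return False
    return True


def lang(value: str) -> Attribute:
    return Attribute(XML_NS, "lang", value)


# <doc xml:lang="en"><p>hi</p></doc>
a = Element(None, "doc", [lang("en")], [Element(None, "p", [], ["hi"])])
# <doc xml:lang="EN"><p xml:lang="en">hi</p></doc>
b = Element(None, "doc", [lang("EN")], [Element(None, "p", [lang("en")], ["hi"])])
# <doc xml:lang="fr"><p>hi</p></doc>
c = Element(None, "doc", [lang("fr")], [Element(None, "p", [], ["hi"])])
```

Under these assumptions "a" and "b" compare equal, because the redundant
xml:lang on the inner element does not change its inherited language,
while "a" and "c" do not.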
3. Attribute Information Items
Two attribute information items are equal if they have the same
language and the following properties are equal:
- [namespace name]
- [local name]
- [normalized value]
- [attribute type]
4. Processing Instruction Information Items
Two processing instruction information items are equal if the
following properties are equal:
- [target]
- [content]
- [base uri]
5. Unexpanded Entity Reference Information Items
Two unexpanded entity reference information items are equal if the
following properties are equal:
- [name]
- [system identifier]
- [public identifier]
6. Character Information Items
Two character information items are equal if the following properties
are equal:
- [character code]
- [element content whitespace]
7. Comment Information Items
Two comment information items are equal if the following properties
are equal:
- [content]
8. The Document Type Declaration Information Item
Two document type declaration information items are equal if the
following properties are equal:
- [system identifier]
- [public identifier]
- [children]
9. Unparsed Entity Information Items
Two unparsed entity information items are equal if the following
properties are equal:
- [name]
- [system identifier]
- [public identifier]
- [notation name]
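Since the simpler item kinds above differ only in which properties are
compared, one possible sketch is a table-driven comparison. This is an
illustration, not part of the proposal: items are modelled as plain
dicts keyed by property name, a missing key is treated as "no value",
and UNKNOWN is a hypothetical sentinel that is assumed never to compare
equal. Document, element, attribute and document type declaration items
are left out because they involve lists or language.

```python
UNKNOWN = object()  # hypothetical sentinel for the "unknown" value

# Properties compared for each item kind, taken from the lists above.
COMPARED_PROPERTIES = {
    "processing instruction": ("target", "content", "base uri"),
    "unexpanded entity reference": (
        "name", "system identifier", "public identifier"),
    "character": ("character code", "element content whitespace"),
    "comment": ("content",),
    "unparsed entity": (
        "name", "system identifier", "public identifier", "notation name"),
}


def prop_eq(x, y) -> bool:
    # Assumption: "unknown" is unequal to everything, itself included.
    if x is UNKNOWN or y is UNKNOWN:
        return False
    return x == y


def items_equal(kind: str, a: dict, b: dict) -> bool:
    # A missing key yields None on both sides, i.e. "no value" == "no value".
    return all(prop_eq(a.get(p), b.get(p)) for p in COMPARED_PROPERTIES[kind])


pi1 = {"target": "xml-stylesheet", "content": 'href="a.css"',
       "base uri": "http://example.org/"}
pi2 = dict(pi1, notation=None)      # extra properties are not compared
pi3 = dict(pi1, content='href="b.css"')
```

Here items_equal("processing instruction", pi1, pi2) holds, while the
comparison of pi1 with pi3 fails on [content].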
Be seeing you,
norm
--
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.
Received on Monday, 28 June 2004 10:59:26 UTC