Re: ACTION NW xmlChunk-44: Chunk of XML - Canonicalization and equality

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Wed, 25 Aug 2004 17:20:43 -0400
To: www-tag@w3.org
Message-id: <877jrm52z8.fsf@nwalsh.com>
/ ht@inf.ed.ac.uk (Henry S. Thompson) was heard to say:
| I'm still not happy with the fact that in this and at least one other
| area, the draft finding judges two infosets equivalent despite the
| fact that one is 'well-formed' and the other is not, that is, one
| could be serialized as a well-formed XML document and the other could
| not.
|
| For example
|
|   * An EII with a [local name] with, e.g., a long S in it (U+017F) is
|     well-formed only if [version]=1.1;
|
|   * An element with a [namespace name] with a value and an [in-scope
|     namespaces] with no declaration for that value
|
| If the answer is that we're only interested in equivalence of infosets
| arising from the conformant parsing of well-formed character
| sequences, then at the very least this should be made clear in the
| finding.

I'm interested in more than that, so I don't think that's the answer.

I don't see why the comparison function should be doing well-formedness
checking. You have an infoset. An infoset is a bag of properties. I
don't care where you got it from or how you constructed it.

If you've put long S characters in a [local name] in an EII, then
that's what you meant to do. It's only going to compare equal to
another EII that's spelled the same way and that seems reasonable to
me. How you turn your infoset back into a sequence of characters and
what version declaration you have to use to make that sequence of
characters a well-formed XML document is your problem.

I feel the same way about [in-scope namespaces]. Either you got them
right in your infoset, or your infoset represents something that can't
be expressed in XML. I don't think the comparison function should care
about the in-scope namespaces. But I'm less confident on this point
than I am on the issue of names and XML 1.1 characters. :-)
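
To make that concrete, here is a rough sketch in Python. Everything in
it is made up for illustration: the ElementInfoItem class, its field
names, and eii_equal are hypothetical, and this is not the comparison
algorithm from the draft finding. It only shows the shape of a
comparison that looks at the properties it is given, does no
well-formedness checking, and ignores [in-scope namespaces].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ElementInfoItem:
    """Hypothetical, much-reduced element information item (EII)."""
    namespace_name: Optional[str]   # [namespace name]
    local_name: str                 # [local name]
    children: List["ElementInfoItem"] = field(default_factory=list)
    # [in-scope namespaces], modelled here as prefix -> namespace name
    in_scope_namespaces: Dict[str, str] = field(default_factory=dict)


def eii_equal(a: ElementInfoItem, b: ElementInfoItem) -> bool:
    """Compare property values only.

    Assumption for this sketch: no check that the names are legal in
    any XML version, and [in-scope namespaces] is deliberately ignored.
    """
    if (a.namespace_name, a.local_name) != (b.namespace_name, b.local_name):
        return False
    if len(a.children) != len(b.children):
        return False
    return all(eii_equal(x, y) for x, y in zip(a.children, b.children))


# A [local name] containing a long S (U+017F) only equals the same spelling.
long_s = ElementInfoItem(None, "me\u017fsage")
same = ElementInfoItem(None, "me\u017fsage")
plain = ElementInfoItem(None, "message")

# Same [namespace name], but only one item carries a declaration for it.
undeclared = ElementInfoItem("http://example.org/ns", "doc")
declared = ElementInfoItem(
    "http://example.org/ns", "doc",
    in_scope_namespaces={"ex": "http://example.org/ns"},
)

print(eii_equal(long_s, same))          # True
print(eii_equal(long_s, plain))         # False
print(eii_equal(undeclared, declared))  # True
```

Whether either item can be written out as a well-formed document is a
separate question for whoever does the serializing.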

| I hope that's not the answer, in which case I'd be interested not only
| in a specific explanation of why the [in-scope namespaces] EII
| property was not included, but also in the more general question of
| the implicit suggestion above that "Everyone knows what 'well-formed'
| means when applied to infosets" and "It doesn't make sense to define
| equivalence such that a well-formed infoset can be equivalent to a
| non-well-formed infoset."

I hope my explanation above goes some way to expressing how I feel
about the answer to those questions.

                                        Be seeing you,
                                          norm

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.

Received on Wednesday, 25 August 2004 21:21:41 GMT
