
Re: ACTION NW xmlChunk-44: Chunk of XML - Canonicalization and equality

From: <noah_mendelsohn@us.ibm.com>
Date: Sat, 28 Aug 2004 21:07:21 -0400
To: Norman Walsh <Norman.Walsh@Sun.COM>
Cc: www-tag@w3.org
Message-ID: <OF7A7BB1F6.C46784B5-ON85256EFE.00657DCA@lotus.com>

I think Norm and Henry are onto a very important issue regarding Infosets:
in the Infoset Recommendation we should do a better job of clarifying
consistency constraints or the lack thereof.  For example, Henry and others
seem to have inferred that an Infoset with a Doc Info Item indicating 
version 1.0 should contain only content serializable as XML 1.0.  Norm 
suggests otherwise.  Except insofar as silence indicates lack of a 
constraint, I think the Infoset Rec. can reasonably be read either way. 

Indeed, there are other similar and perhaps more insidious points of 
confusion.  I and other members of the SOAP WG were somewhat surprised to 
be shown that the Infoset Rec. nowhere restricts character [children] to 
be those allowed in some version of XML.  Thus, NULs are allowed in an 
Infoset by this interpretation, even though no published version of XML 
allows NUL characters.  We had to make a late clarification in some of our 
SOAP work to handle this (I don't think it made the original Rec., but I 
believe it is in the mill as an erratum).

Yet another question is whether [parent]s must be present.  Some have 
inferred that an [attribute] is necessarily associated with a parent 
element, and that both can eventually trace their ancestry to a [document 
information item], which might in turn provide a constraining XML version. 
My own reading is that no such constraint is present regarding parents, 
and that a Rec such as Schema 1.0 that refers only to [Element Information 
Item]s would need an explicit clarification if all elements to be 
validated were required to have a doc info ancestor.
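
To make the two readings concrete, here is a minimal sketch of a synthetic 
Infoset in which [parent] is optional.  The class and function names are 
hypothetical, chosen for illustration only; nothing here is taken from the 
Infoset Rec. or from any existing library.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DocumentInfoItem:
    # Stands in for the [version] property, e.g. "1.0" (illustrative model).
    version: Optional[str] = None


@dataclass
class ElementInfoItem:
    local_name: str
    # [parent] is optional here, matching the reading that the Rec. does
    # not require every element to have one.
    parent: Optional[Union["ElementInfoItem", DocumentInfoItem]] = None


def governing_version(item: ElementInfoItem) -> Optional[str]:
    """Follow [parent] links upward and return the [version] of the
    document information item at the top, or None if the chain ends
    without reaching one."""
    node = item
    while isinstance(node, ElementInfoItem):
        node = node.parent
    return node.version if isinstance(node, DocumentInfoItem) else None


doc = DocumentInfoItem(version="1.0")
root = ElementInfoItem("root", parent=doc)
child = ElementInfoItem("child", parent=root)
orphan = ElementInfoItem("fragment")  # no [parent] at all

print(governing_version(child))   # version found via the ancestor chain
print(governing_version(orphan))  # no document ancestor, so no version
```

Under this model an element with no document ancestor simply has no 
constraining XML version, which is the case a Rec. such as Schema 1.0 would 
have to address explicitly if it wanted to rule it out.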

The point of this note is not to suggest what the constraints are, if 
any, but to argue that the Infoset Recommendation should be clarified.  If, as 
Norm suggests, the intention is indeed to avoid constraints, we should 
make that clearer.  I wonder whether it would then be worth giving a name 
to Infosets that do after all meet certain common constraints.  For 
example, one might list a set of rules for Infosets "serializable as XML 
1.0 documents", "full document infosets" (i.e. those with a [Document 
Information Item]), or some such.  We have a number of Recommendations that 
either create or by implication are capable of using synthetic Infosets 
that are intended to represent either entire well-formed XML documents or 
fragments thereof.  As it stands, it's a bit tricky to write such 
Recommendations, and constraints such as "[children] must consist of 
characters matching the {char} production of XML 1.0"  potentially require 
restatement in each Recommendation.  That seems unfortunate.
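
As an illustration of the kind of rule that currently has to be restated, 
the following is a small sketch of a check that character [children] fall 
within the Char production of XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | 
[#xE000-#xFFFD] | [#x10000-#x10FFFF]).  The function names are hypothetical, 
and the character [children] are assumed to be available as an ordinary 
string.

```python
from typing import Optional


def is_xml10_char(code_point: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def first_invalid_char(children: str) -> Optional[int]:
    """Return the index of the first character that XML 1.0 could not
    serialize, or None if every character is allowed."""
    for index, ch in enumerate(children):
        if not is_xml10_char(ord(ch)):
            return index
    return None


print(first_invalid_char("plain text\twith a tab"))  # None
print(first_invalid_char("ab\x00cd"))                # 2 (the NUL)
```

A named class of Infosets "serializable as XML 1.0 documents" would let a 
Recommendation refer to a rule like this once, by name, instead of spelling 
it out each time.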

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Sunday, 29 August 2004 01:08:51 GMT
