Infosets for XMLP vs. Intro to Infoset CR Draft from Noah_Mendelsohn@lotus.com on 2001-06-21 (xml-dist-app@w3.org from June 2001)

From: <Noah_Mendelsohn@lotus.com>
Date: Thu, 21 Jun 2001 11:48:18 -0400
To: xml-dist-app@w3.org
Cc: jcowan@reutershealth.com, pgrosso@arbortext.com
Message-ID: <OFF3B219F8.19950922-ON85256A72.00503DB2@lotus.com>
My sincere apologies.  The note I sent yesterday referred to a very old 
draft of the Infoset CR, the result of following a bad bookmark in my 
browser, and perhaps of not remembering that Infoset had been in (or near) 
CR so long and had changed so much.  (I thought I had double-checked which 
draft I had, but obviously not carefully enough.)  The new Infoset draft 
makes significant strides toward dealing with the concern raised 
yesterday.  For completeness, here is a redraft of yesterday's note, 
ending in a recommendation that XMLP do nothing formally, and leave any 
changes to the discretion of the Infoset team.  Again, my apologies for 
the confusion, and thank you to David Ezell for noticing and warning me.

=====REDRAFT OF YESTERDAY'S NOTE FOLLOWS =========


Quick background for those not in the Protocols WG:  there has been some 
semi-formal discussion in the Protocols WG as to whether it might be 
appropriate to adopt, in one of various possible ways, XML Infosets as the 
fundamental means of describing an XMLP message.  Without going into 
details, the general idea would be to have XMLP specify the contents of 
messages in the form of Element and Attribute Information items. 
Particular bindings, such as the HTTP binding, could then decide on the 
exact representation of the XML on the wire (ordinary well-formed, 
compressed, encrypted, something else, etc.).
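
To make the idea concrete, here is a minimal sketch in Python.  It is 
only an illustration, not anything the WG has proposed:  the function 
names, the element names and the use of zlib as a stand-in "compressed 
binding" are all assumptions of mine.  The message is held as element and 
attribute items, and two hypothetical bindings put different bytes on the 
wire while the receiver recovers the same items either way.

```python
import zlib
import xml.etree.ElementTree as ET


def make_message():
    # Hypothetical message: the names "Envelope" and "Body" are illustrative only.
    envelope = ET.Element("Envelope", {"id": "m1"})
    ET.SubElement(envelope, "Body").text = "hello"
    return envelope


def plain_binding(message):
    # Hypothetical binding 1: ordinary well-formed XML text on the wire.
    return ET.tostring(message, encoding="utf-8")


def compressed_binding(message):
    # Hypothetical binding 2: the same content, compressed with zlib.
    return zlib.compress(ET.tostring(message, encoding="utf-8"))


def receive_plain(wire):
    return ET.fromstring(wire)


def receive_compressed(wire):
    return ET.fromstring(zlib.decompress(wire))


def same_items(a, b):
    # Compare element names, attributes, text and children recursively.
    if (a.tag, a.attrib, a.text or "") != (b.tag, b.attrib, b.text or ""):
        return False
    if len(a) != len(b):
        return False
    return all(same_items(x, y) for x, y in zip(a, b))
```

The bytes produced by the two bindings differ, but same_items() reports 
that both receivers end up with the same element and attribute content.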

Anyway, on an XMLP WG teleconference today I was asked to post this note 
to the xml-dist-app list.  The purpose of this note is not primarily to 
start a discussion of the merits of the above proposal:  indeed, the WG is 
still considering whether and in what form to make such a proposal.  
Rather, I have been asked to mention a specific aspect of the Infoset 
Candidate Recommendation draft that is perhaps less than ideal for our 
purposes.

Specifically, the CR Infoset draft states [1]:

"This specification defines an abstract data set called the XML Information 
Set (Infoset). Its purpose is to provide a consistent set of definitions 
for use in other specifications that need to refer to the information in a 
well-formed XML document [XML]. 

[...]

"An XML document has an information set if it is well-formed and satisfies 
the namespace constraints described below. There is no requirement for an 
XML document to be valid in order to have an information set."


The potential problem is that this text, though arguably a bit ambiguous, 
strongly suggests that well-formed documents come first, and then Infosets 
must be derived from them.  There are important use cases, such as in 
XMLP, in which XML infosets should describe documents that may not 
initially (or ever) exist in the form of a sequence of Unicode characters 
with "<...>" syntax.  As a trivial example, consider an empty DOM to which 
a program adds element and attribute nodes.  Surely there is an Infoset, 
but no "<...>".  Indeed, this is likely to be the common case for the 
sender of an XMLP message.  If the binding sends an encrypted form, there 
need not at any time be a "<...>" form or a conventional parser at the 
receiver; the SOAP processor might go directly from encrypted to 
SAX/DOM/etc. at the receiving end.
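
The empty-DOM case can be sketched as follows.  This is a rough Python 
illustration using the standard xml.dom.minidom module;  the namespace 
URI, the element names and the helper names are hypothetical.  The tree 
is built purely through API calls, and its element and attribute content 
can be enumerated without any "<...>" text ever being produced or parsed.

```python
from xml.dom.minidom import getDOMImplementation

# Hypothetical namespace, used only for this illustration.
NS = "http://example.org/hypothetical-envelope"


def build_synthetic_message():
    # Start from a DOM that holds only a root element, then add nodes via the API.
    impl = getDOMImplementation()
    doc = impl.createDocument(NS, "env:Envelope", None)
    body = doc.createElementNS(NS, "env:Body")
    body.setAttribute("id", "msg-1")
    doc.documentElement.appendChild(body)
    return doc


def information_items(element):
    # Yield (namespace name, qualified name, attributes) for each element, in document order.
    yield (element.namespaceURI, element.tagName, dict(element.attributes.items()))
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield from information_items(child)
```

Neither function serializes or parses anything;  calling 
list(information_items(doc.documentElement)) on the built document walks 
the in-memory nodes directly.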

The specific question on the table in yesterday's call was:  should the 
XMLP WG send a comment to the Infoset (core) group recommending a change 
to the introduction of the XML Infoset candidate rec.?

An important section [2] was added between the Infoset draft I referenced 
yesterday and the one that's current:

"Synthetic Infosets

This specification describes the information set resulting from parsing an 
XML document. Information sets may be constructed by other means, for 
example by use of an API such as the DOM or by transforming an existing 
information set. "

This largely deals with the concern.  (Someone pointed this out on the 
call yesterday, but I couldn't find the reference because I was looking in 
the wrong draft.)  My personal feeling is that the Intro [1] is slightly 
out of sync with [2], but I don't think that rises to the level of 
something the XMLP group should be taking on as a formal response.

So my net is, with apologies for any confusion caused, just let it go.  I 
have cc:'d John Cowan and Paul Grosso, so the Infoset team is aware that 
at least one of us had some confusion reading the intro.  I think it 
should be at their editorial discretion as to whether any minor cleanup is 
in order for a Recommendation.  Clearly the intent of the CR draft is 
correct.  I presume that anyone in the XMLP WG who disagrees with my 
recommended disposition will speak up.  Thank you.


[1] http://www.w3.org/TR/xml-infoset/#intro
[2] http://www.w3.org/TR/xml-infoset/#intro.synthetic

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------
Received on Thursday, 21 June 2001 11:54:34 UTC