Infosets for XMLP vs. Intro to Infoset CR Draft

[Redirected by Paul Grosso, original headers follow.]

>From: Noah_Mendelsohn@lotus.com
>To: xml-dist-app@w3.org
>Cc: jcowan@reutershealth.com, pgrosso@arbortext.com
>Subject: Infosets for XMLP vs. Intro to Infoset CR Draft
>Date: Wed, 20 Jun 2001 19:12:09 -0400

Quick background for those not in the Protocols WG:  there has been some 
semi-formal discussion in the Protocols WG as to whether it might be 
appropriate to adopt, in one of various possible ways, XML Infosets as the 
fundamental means of describing an XMLP message.  Without going into 
details, the general idea would be to have XMLP specify the contents of 
messages in the form of Element and Attribute Information items. 
Particular bindings, such as the HTTP binding, could then decide on the 
exact representation of the XML on the wire (ordinary well-formed, 
compressed, encrypted, or something else).
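
To make the split between abstract content and wire representation 
concrete, here is a minimal Python sketch.  Everything in it is 
hypothetical: ElementItem is only a simplified stand-in for an Element 
Information item (namespaces and most Infoset properties are omitted), and 
bind_as_text / bind_as_compressed are invented names for two possible 
bindings, not anything defined by XMLP or the Infoset draft.

```python
import zlib
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class ElementItem:
    """Hypothetical, simplified stand-in for an Element Information item."""
    local_name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> value
    children: list = field(default_factory=list)    # ElementItem or str


def bind_as_text(item):
    """One possible binding: ordinary well-formed XML text."""
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in item.attributes.items()
    )
    body = "".join(
        bind_as_text(child) if isinstance(child, ElementItem) else escape(child)
        for child in item.children
    )
    return f"<{item.local_name}{attrs}>{body}</{item.local_name}>"


def bind_as_compressed(item):
    """Another possible binding: the same content, zlib-compressed bytes."""
    return zlib.compress(bind_as_text(item).encode("utf-8"))


# The message is specified once, as items; each binding picks its own form.
message = ElementItem(
    "Envelope", {"version": "1"}, [ElementItem("Body", children=["hello"])]
)
text_form = bind_as_text(message)
compressed_form = bind_as_compressed(message)
```

The message is described only once, as items; the two functions differ 
solely in what they put on the wire.  For brevity the compressed binding 
here goes through the text form, but nothing in the idea requires that.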

Anyway, on an XMLP WG teleconference today I was asked to post this note 
to the dist-app list.  The purpose of this note is not primarily to start 
a discussion of the merits of the above proposal:  indeed, the WG is still 
considering whether and in what form to make such a proposal.  Rather, I 
have been asked to mention a specific aspect of the Infoset Candidate 
Recommendation draft that is perhaps less than ideal for our purposes.

Specifically, the CR Infoset draft states [1]:

"This document specifies an abstract data set called the XML Information 
Set (Infoset), a description of the information available in a well-formed 
XML document [XML].

"XML 1.0 documents that do not conform to [Namespaces], though technically 
well-formed, are not considered to have meaningful information sets as 
defined by this specification. That is, this specification does not define 
an information set for documents that have element or attribute names 
containing colons that are used in other ways than as prescribed by 
[Namespaces]. There is no requirement for a XML document to be valid in 
order to have an information set."

The potential problem is that this text, though arguably a bit ambiguous, 
strongly suggests that well-formed documents come first, and that Infosets 
must then be derived from them.  There are important use cases, such as in 
XMLP, in which XML Infosets should describe documents that may not 
initially (or ever) exist in the form of a sequence of Unicode characters 
with "<...>" syntax.  As a trivial example, consider an empty DOM to which 
a program adds element and attribute nodes.  Surely there is an Infoset, 
but no "<...>".  Indeed, this is likely to be the common case for the 
sender of an XMLP message.  If the binding sends an encrypted form, there 
need not at any time be a "<...>" form or a conventional parser at the 
receiver; the SOAP processor might go directly from encrypted to 
SAX/DOM/etc. at the receiving end.
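
The empty-DOM case can be sketched in a few lines of Python using the 
standard xml.dom.minidom module.  This is only an illustration: the 
element and attribute names are made up, and information_items is a 
hypothetical helper that reports just a small subset of what the Infoset 
draft defines for each item.

```python
from xml.dom.minidom import getDOMImplementation

# Start from a document that never existed as "<...>" text.
impl = getDOMImplementation()
doc = impl.createDocument(None, "Envelope", None)  # illustrative names
root = doc.documentElement

# The program adds element and attribute nodes directly.
body = doc.createElement("Body")
body.setAttribute("id", "b1")
root.appendChild(body)


def information_items(element):
    """Walk the tree and yield (kind, name, value) tuples; nothing is
    serialized or parsed along the way."""
    yield ("element", element.tagName, None)
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        yield ("attribute", attr.name, attr.value)
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield from information_items(child)


items = list(information_items(root))
```

At no point does the program produce or parse a character sequence, yet 
the element and attribute items can be enumerated.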

The specific question on the table is:  should the XMLP WG send a comment 
to the Infoset (core) group recommending a change to the introduction of 
the XML Infoset Candidate Recommendation?

Note: to avoid cross-posting, this is being mailed only to dist-app.  This 
is intended to start discussion within the community working on XMLP to 
decide whether a formal (or informal) approach to the Infoset group should 
be made.  Obviously, discussion from members of that group (or others) is 
welcome in response to this note.

[1] http://www.w3.org/XML/Group/2000/03/WD-infoset-20000331.html#intro 

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------

Received on Thursday, 21 June 2001 09:41:46 UTC