Infosets for XMLP vs. Intro to Infoset CR Draft from Noah_Mendelsohn@lotus.com on 2001-06-20 (xml-dist-app@w3.org from June 2001)

From: <Noah_Mendelsohn@lotus.com>
Date: Wed, 20 Jun 2001 19:12:07 -0400
To: xml-dist-app@w3.org
Cc: jcowan@reutershealth.com, pgrosso@arbortext.com
Message-ID: <OF4B806FDA.CD148A30-ON85256A71.006FD0DA@lotus.com>
Quick background for those not in the Protocols WG:  there has been some
semi-formal discussion in the Protocols WG as to whether it might be
appropriate to adopt, in one of various possible ways, XML Infosets as the
fundamental means of describing an XMLP message.  Without going into
details, the general idea would be to have XMLP specify the contents of
messages in the form of Element and Attribute Information items.
Particular bindings, such as the HTTP binding, could then decide on the
exact representation of the XML on the wire (ordinary well-formed text,
compressed, encrypted, or something else).

Anyway, on an XMLP WG teleconference today I was asked to post this note to
the dist-app list.  The purpose of this note is not primarily to start a
discussion of the merits of the above proposal:  indeed, the WG is still
considering whether and in what form to make such a proposal.  Rather, I
have been asked to mention a specific aspect of the Infoset Candidate
Recommendation draft that is perhaps less than ideal for our purposes.

Specifically, the CR Infoset draft states [1]:

"This document specifies an abstract data set called the XML Information
Set (Infoset), a description of the information available in a well-formed
XML document [XML].

"XML 1.0 documents that do not conform to [Namespaces], though technically
well-formed, are not considered to have meaningful information sets as
defined by this specification. That is, this specification does not define
an information set for documents that have element or attribute names
containing colons that are used in other ways than as prescribed by
[Namespaces]. There is no requirement for a XML document to be valid in
order to have an information set."

The potential problem is that this text, though arguably a bit ambiguous,
strongly suggests that well-formed documents come first, and then Infosets
must be derived from them.  There are important use cases, such as in XMLP,
in which XML Infosets should describe documents that may not initially (or
ever) exist in the form of a sequence of Unicode characters with "<...>"
syntax.  As a trivial example, consider an empty DOM to which a program
adds element and attribute nodes.  Surely there is an Infoset, but no
"<...>".  Indeed, this is likely to be the common case for the sender of an
XMLP message.  If the binding sends an encrypted form, there need not at
any time be a "<...>" form or a conventional parser at the receiver;  the
SOAP processor might go directly from encrypted to SAX/DOM/etc. at the
receiving end.
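
To make the empty-DOM example concrete, here is a minimal sketch in Python
using xml.dom.minidom from the standard library.  The namespace URI, the
element and attribute names, and the helper names build_message and
information_items are all hypothetical, chosen only for illustration;
nothing here is taken from an XMLP draft.  The tree is built purely through
DOM calls, and the element and attribute items are then read back without
any "<...>" text being produced or parsed.

```python
from xml.dom.minidom import getDOMImplementation

# Hypothetical namespace, used only for this illustration.
ENV_NS = "http://example.org/envelope"


def build_message():
    """Build a small tree through DOM calls only; no XML text is parsed."""
    impl = getDOMImplementation()
    doc = impl.createDocument(ENV_NS, "env:Envelope", None)
    body = doc.createElementNS(ENV_NS, "env:Body")
    body.setAttribute("id", "msg-1")
    doc.documentElement.appendChild(body)
    return doc


def information_items(node):
    """List element and attribute items by walking the in-memory tree."""
    items = []
    if node.nodeType == node.ELEMENT_NODE:
        items.append(("element", node.namespaceURI, node.tagName))
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            items.append(("attribute", attr.name, attr.value))
    for child in node.childNodes:
        items.extend(information_items(child))
    return items


if __name__ == "__main__":
    for item in information_items(build_message()):
        print(item)
```

A binding could serialize such a tree as ordinary well-formed text, or in
some other form, only at the point where it goes on the wire.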

The specific question on the table is:  should the XMLP WG send a comment
to the Infoset (core) group recommending a change to the introduction of
the XML Infoset Candidate Recommendation?

Note: to avoid cross-posting, this is being mailed only to dist-app.  This
is intended to start discussion within the community working on XMLP to
decide whether a formal (or informal) approach to the Infoset group should
be made.  Obviously, discussion from members of that group (or others) is
welcome in response to this note.

[1] http://www.w3.org/XML/Group/2000/03/WD-infoset-20000331.html#intro

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
Lotus Development Corp.                            Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------
Received on Wednesday, 20 June 2001 19:17:04 UTC