The XML Query Data Model and PASWA from noah_mendelsohn@us.ibm.com on 2003-06-09 (xml-dist-app@w3.org from June 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 9 Jun 2003 19:16:27 -0400
To: xml-dist-app@w3.org
Cc: fallside@us.ibm.com
Message-ID: <OFF92ED69A.FEAEECEE-ON85256D40.006F87E9@lotus.com>
PASWA [1] is an interesting proposal for several reasons, but especially
because it brings a degree of binary modeling and processing to XML (SOAP
in particular), while retaining XML's fundamentally "character oriented"
nature.  It's occurred to me that there are some similarities between the
PASWA model and the XQuery 1.0 and XPath 2.0 Data Model [2], the latter now
being in last call.  The query model does a first class job of extending
the Infoset (actually re-expressing it) to include a character/value
duality.  I am not suggesting at this point that PASWA should definitely be
based on the Query model, but I am suggesting that there are enough
intriguing hints that it's worth some serious consideration.

Why consider the PASWA and XQuery models together?
==================================================

There are several reasons that suggest we should consider the relationship
between PASWA and the XQuery model, including:

* I believe the W3C has a general principle of taking the trouble to make
its recommendations work together.  If the PASWA model and the query model
are in the same space, we should at least evaluate the possibilities.

* Doing this right might improve our ability to support, for example,
returning a query result through SOAP.

* I have found that thinking about the Query model has highlighted some
aspects of PASWA that seem a bit vague, and that might be worth clarifying
(see below).

Downsides:  even considering the XQuery model will take some time and
energy, and there may be various details and edge cases that are
problematic.  As noted above, the right answer may in the end be to do
nothing, in which case the time spent will be in part a distraction from
moving ahead with PASWA.

What is the XQuery Data Model?
==============================

The best way to find out is to read the spec at [2].  With the caveat that
I am not on the query WG and not an expert in the data model, I will
attempt a summary that should be good enough to make the rest of this note
comprehensible.  There are several reasons that XPath and XQuery need a
model that's richer than XML itself.  In particular, these query
technologies deal when possible with typed XML data, and the XQuery model
can capture much of what's in an XML Schema PSVI.  Thus, every element and
attribute in the XQuery model can have a named type [3].

Furthermore, each element (of simple type) or attribute node can present
its contents in two forms:  as a string [4] or as a value [5].  Now comes
the part that I find so similar to PASWA:  a node can start out as a string
from which a value can be derived (e.g. by schema validation), or you can
start with a value and get a string.  It's the latter case that sounds like
PASWA:  you start with the value, and generate characters only when you
need them.  Query does this, for example, when creating nodes with computed
values.  Let's say I have the following query element constructor fragment:

      <person>
         <name>Bob Smith</name>
         <ageNextYear>{$i/ageThisYear + 1}</ageNextYear>
      </person>

Note that the ageNextYear field is computed.  Just like a PASWA binary
object, it starts out as a value (the result of an arithmetic operation).
The data model says that the value itself can be accessed (probably an
integer, on which more arithmetic could be done) or a string form can be
retrieved.  The conversion is specified on a per-type basis, but is
typically the XML schema canonical form.
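
To make the string/value duality concrete, here is a minimal Python sketch.
It is not taken from the Data Model spec or from PASWA:  the Node class and
the CONVERTERS table are hypothetical names, and only two types are covered.
It shows a node that can be created from either form and derives the other
one only on demand.

```python
import base64

# Hypothetical per-type conversion table: type name -> (parse, serialize).
# The serializers are meant to produce the canonical lexical form.
CONVERTERS = {
    "xs:integer": (int, str),
    "xs:base64Binary": (
        lambda s: base64.b64decode(s, validate=True),
        lambda b: base64.b64encode(b).decode("ascii"),
    ),
}


class Node:
    """Illustrative element node holding a string form, a typed value, or both."""

    def __init__(self, name, type_name, string_value=None, typed_value=None):
        if string_value is None and typed_value is None:
            raise ValueError("need a string form or a typed value")
        self.name = name
        self.type_name = type_name
        self._string = string_value
        self._typed = typed_value

    @property
    def string_value(self):
        # Generate characters only when somebody asks for them.
        if self._string is None:
            _, serialize = CONVERTERS[self.type_name]
            self._string = serialize(self._typed)
        return self._string

    @property
    def typed_value(self):
        # Derive the value from the characters (a stand-in for validation).
        if self._typed is None:
            parse, _ = CONVERTERS[self.type_name]
            self._typed = parse(self._string)
        return self._typed


# Mirrors the constructor fragment above: ageNextYear starts life as a value.
age_this_year = Node("ageThisYear", "xs:integer", string_value="41")
age_next_year = Node("ageNextYear", "xs:integer",
                     typed_value=age_this_year.typed_value + 1)
```

In this sketch age_next_year never has a string form until string_value is
read, which is the property that makes the model look so much like PASWA.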

It should be noted that the XML query model handles much more than XML
documents, so there would be some questions to settle in reconciling it
with PASWA.  The query model can handle sequences of nodes, free-standing
nodes such as attributes, etc.  In any case, I think that the proposed
PASWA model is quite close to a particular application of the Query data
model.  Indeed, in the proposals below I deal with this by stating that we
only care about Data Models that correspond to legal SOAP Envelope
Infosets.

Some things I learned thinking about the two models
===================================================

Considering PASWA and the Query model together reminds me that we need to
be very careful and explicit about the contract between communicating PASWA
nodes.  In particular, we presume that a sending node has (in the
interesting case) local knowledge that the character children of some
particular element are in fact the canonical form of a schema type such
as base64Binary.  Some questions:

* Is it inherent in PASWA that every binding must communicate that typing
information to the next hop?  Thinking about the query model reminded me of
this question, as the model has an explicit dm:type accessor.  Our options
seem to be:
- Yes in all cases:  this makes it easy for successive hops to preserve the
optimization, but it's not immediately obvious how an older binding would
pass along this hint.
- No, not ever:  Works with any binding, but seems to require each
receiving node to guess and then verify using unspecified means which
elements are subject to optimization.
- Binding dependent:  lets smart bindings do the right thing, but has
inconsistent semantics when traversing new and old bindings.

* Should the PASWA optimization be limited to one or two specific types
such as base64Binary?
- Yes: this meets the most immediate need.
- No: there will be lots of cases in which values such as integers and
floats will be available at the sending node, and a PASWA-like technique
could enable the efficient transmission of all of them.  If we go far
enough, we could carry query results along with full typing.
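
The "guess and then verify" option above depends on a node being able to
tell whether some character content really is the canonical form of
base64Binary.  As a rough illustration, here is one way such a check might
look in Python.  The function name is hypothetical, and I am assuming that
"canonical" means base64 with no whitespace and standard padding;  the check
simply decodes strictly and confirms that re-encoding reproduces the input.

```python
import base64
import binascii


def is_canonical_base64(text: str) -> bool:
    """Return True if text round-trips unchanged through decode and encode."""
    try:
        # validate=True rejects whitespace and other non-alphabet characters.
        raw = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    # Re-encoding yields exactly one spelling for the octets; anything
    # else is a non-canonical spelling of the same value.
    return base64.b64encode(raw).decode("ascii") == text
```

A receiver that has no type information would have to run something like
this on every candidate element, which is part of why the "No, not ever"
option looks expensive.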

Formulating PASWA in terms of the XML Query Model
=================================================

I haven't gotten very far with this, but my preliminary noodling suggests the
formulations might be quite clean, indeed perhaps cleaner than the original
PASWA.  Here are two very rough variants, corresponding to the cases where
we do and don't want to send type information along with the message (a
choice we have to make anyway).  These proposals deal only with the actual
message model and processing.  All the other MIME typing and module
proposal stuff can be lifted more or less intact from PASWA.

Variant #1:  No transmitted type information
--------------------------------------------

This feature provides for:

* modeling of SOAP messages in terms of the XQuery 1.0 and XPath 2.0 data
model (hereinafter the "Data Model").
* the implementation of bindings that use type information from the model
to optimize the transmission of the SOAP message.

According to this proposal, SOAP messages are modeled according to the Data
Model.  As provided in the specification for the Data Model there is
corresponding to each such data model instance an XML Infoset (see
[6,7,8,9] and similar sections).  Per the usual rules for SOAP, the SOAP
Envelope consists of that Infoset, and all such Infosets MUST conform to
the requirements of the SOAP 1.2 Recommendation;  any Data Model instance
which corresponds to a non-conforming Infoset is not supported by this
feature.

Bindings MAY be specified to use additional information from the Data Model
to optimize the transmission of SOAP messages.  For example, if the dm:type
of an element node is determined to be base64Binary, if its dm:string-value
is known to be in canonical form, and if the dm:typed-value of that element
is available efficiently, then a binding can be constructed to send (an
efficient representation of) the typed-value.  Although the binding may,
for its own purposes, transmit information such as the dm:type, such
information is in general restricted to the binding.  The received SOAP
message consists of the Infoset corresponding to the transmitted data
model, as augmented by any features other than this one, and thus contains
the string-value of the transmitted item.  Similar optimizations are
possible with types other than base64Binary, and may be implemented at the
discretion of the particular binding.
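
A toy sketch of how a Variant #1 binding might behave, in Python.  Everything
here is hypothetical:  the send and receive functions, the "bin"/"txt"
markers and the list-of-tuples wire format are all invented for illustration
and are not part of PASWA or SOAP.  The point is only that the type is used
privately by the binding, and that the receiver ends up with nothing but the
string-value.

```python
import base64


def send(elements):
    """Hypothetical sender.

    Each element is a dict with a 'name' and either a 'text' entry or a
    'type' plus 'value' pair taken from the sender's data model instance.
    """
    wire = []
    for el in elements:
        if el.get("type") == "xs:base64Binary" and isinstance(el.get("value"), bytes):
            # Binding-private marker: the raw octets travel unencoded.
            wire.append((el["name"], "bin", el["value"]))
        else:
            wire.append((el["name"], "txt", el["text"]))
    return wire


def receive(wire):
    """Hypothetical receiver: rebuilds only the Infoset view of the message."""
    infoset = []
    for name, kind, payload in wire:
        if kind == "bin":
            # Regenerate the character children; the type is not surfaced.
            infoset.append((name, base64.b64encode(payload).decode("ascii")))
        else:
            infoset.append((name, payload))
    return infoset
```

Note that receive returns only names and character data:  an application at
the receiving node cannot tell from the result which elements were
optimized.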

Variant #2:  With type information
----------------------------------

This feature provides for:

* modeling of SOAP messages in terms of the XQuery 1.0 and XPath 2.0 data
model (hereinafter the "Data Model").
* the transmission of non-SOAP information from the Data Model,
specifically including the dm:type (from which the association between
dm:typed-values and dm:string-values can invariably be determined).
* the implementation of bindings that use type information from the model
to optimize the transmission of the SOAP message.

According to this proposal, SOAP messages are modeled according to the Data
Model.  As provided in the specification for the Data Model there is
corresponding to each such data model instance an XML Infoset (see
[6,7,8,9] and similar sections).  Per the usual rules for SOAP, the SOAP
Envelope consists of that Infoset, and all such Infosets MUST conform to
the requirements of the SOAP 1.2 Recommendation.  This feature mandates the
transmission of additional information from the data model.  Specifically,
the dm:type of each element and attribute MUST be transmitted from node to
node.  This feature further provides that when a typed element or attribute
is relayed intact by a SOAP intermediary, the dm:type MUST be relayed (we
need rules here if the 2nd hop node doesn't support the feature.)  (Note
that bindings can optimize the common case where the nodes are not, in
fact, typed.  Since the data model is just an abstraction, we can assume
that nodes will not claim to have typed any elements or attributes for
which the overhead of sending the type information is prohibitive.)

Bindings MAY be specified to use additional information from the data model
to optimize the transmission of SOAP messages.  For example, if the dm:type
of an element node is determined to be base64Binary, if its dm:string-value
is known to be in canonical form, and if the dm:typed-value of that element
is available efficiently, then a binding can be constructed to send (an
efficient representation of) the typed-value.  Similar optimizations are
possible with types other than base64Binary, and may be implemented at the
discretion of the particular binding.
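
For contrast, here is an equally rough Python sketch of Variant #2, in which
the dm:type always travels with the item and an intermediary relays it
intact.  The function names and the dict-based wire item are hypothetical,
and the use of str() for non-binary values is a simplification that is only
reasonable for integers.

```python
import base64


def encode_item(name, type_name, typed_value):
    """Hypothetical wire item that always carries the dm:type with the payload."""
    if type_name == "xs:base64Binary":
        # Optimized case: send the octets rather than their characters.
        return {"name": name, "type": type_name, "octets": typed_value}
    # Simplification: str() stands in for the type's canonical serializer.
    return {"name": name, "type": type_name, "text": str(typed_value)}


def decode_item(item):
    """Return (name, dm:type, string-value) as seen by the receiving node."""
    if "octets" in item:
        text = base64.b64encode(item["octets"]).decode("ascii")
    else:
        text = item["text"]
    return item["name"], item["type"], text


def relay(item):
    """An intermediary forwarding an element intact must keep its type."""
    return dict(item)
```

Unlike the Variant #1 sketch, the receiving node here sees the type as well
as the string-value, so it can keep the optimization going on the next hop.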


Conclusion
==========

That's roughly it.  I'm intrigued by how simple the presentations appear to
be.  I'm sure they are very rough around the edges, perhaps even seriously
buggy, but I think there's some room for optimism that it might all work
out.  If nothing else, I think that by adopting the terminology of the data
model we might have the option to make our presentation clearer and more
precise.  There is the hope, at least, that we can actually lay the
groundwork for useful synergy between XML Query, XSL, XPath and SOAP.  I'm
curious what you all think.

Noah


[1] http://www.gotdotnet.com/team/jeffsch/paswa/paswa61.html
[2] http://www.w3.org/TR/xpath-datamodel/
[3] http://www.w3.org/TR/xpath-datamodel/#dm-type
[4] http://www.w3.org/TR/xpath-datamodel/#dm-string-value
[5] http://www.w3.org/TR/xpath-datamodel/#dm-typed-value
[6] http://www.w3.org/TR/xpath-datamodel/#DocumentNodeDM2IS
[7] http://www.w3.org/TR/xpath-datamodel/#ElementNodeDM2IS
[8] http://www.w3.org/TR/xpath-datamodel/#AttributeNodeDM2IS
[9] http://www.w3.org/TR/xpath-datamodel/#NamespaceNodeDM2IS

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Monday, 9 June 2003 19:17:16 UTC