- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 9 Jun 2003 19:16:27 -0400
- To: xml-dist-app@w3.org
- Cc: fallside@us.ibm.com
PASWA [1] is an interesting proposal for several reasons, but especially because it brings a degree of binary modeling and processing to XML (SOAP in particular), while retaining XML's fundamentally "character oriented" nature. It has occurred to me that there are some similarities between the PASWA model and the XQuery 1.0 and XPath 2.0 Data Model [2], the latter now being in last call. The query model does a first class job of extending the Infoset (actually re-expressing it) to include a character/value duality. I am not suggesting at this point that PASWA should definitely be based on the Query model, but I am suggesting that there are enough intriguing hints that it's worth some serious consideration.

Why consider the PASWA and XQuery models together?
==================================================

There are several reasons that suggest we should consider the relationship between PASWA and the XQuery model, including:

* I believe the W3C has a general principle of taking the trouble to make its recommendations work together. If the PASWA model and the query model are in the same space, we should at least evaluate the possibilities.

* Doing this right might improve our ability to support, for example, returning a query result through SOAP.

* I have found that thinking about the Query model has highlighted some aspects of PASWA that seem a bit vague, and that might be worth clarifying (see below).

Downsides: even considering the XQuery model will take some time and energy, and there may be various details and edge cases that are problematic. As noted above, the right answer may in the end be to do nothing, in which case the time spent will be in part a distraction from moving ahead with PASWA.

What is the XQuery Data Model?
==============================

The best way to find out is to read the spec at [2].
With the caveat that I am not on the query WG and not an expert in the data model, I will attempt a summary that should be good enough to make the rest of this note comprehensible.

There are several reasons that XPath and XQuery need a model that's richer than XML itself. In particular, these query technologies deal when possible with typed XML data, and the XQuery model can capture much of what's in an XML Schema PSVI. Thus, every element and attribute in the XQuery model can have a named type [3]. Furthermore, each element (of simple type) or attribute node can present its contents in two forms: as a string [4] or as a value [5].

Now comes the part that I find so similar to PASWA: a node can start out as a string from which a value can be derived (e.g. by schema validation), or you can start with a value and get a string. It's the latter case that sounds like PASWA: you start with the value, and generate characters only when you need them. Query does this, for example, when creating nodes with computed values. Let's say I have the following query element constructor fragment:

  <person>
    <name>Bob Smith</name>
    <ageNextYear>{$i/ageThisYear + 1}</ageNextYear>
  </person>

Note that the ageNextYear field is computed. Just like a PASWA binary object, it starts out as a value (the result of an arithmetic operation). The data model says that the value itself can be accessed (probably an integer, on which more arithmetic could be done) or a string form can be retrieved. The conversion is specified on a per-type basis, but is typically the XML Schema canonical form.

It should be noted that the XML query model handles much more than XML documents, so there would be some questions to settle in reconciling it with PASWA. The query model can handle sequences of nodes, free-standing nodes such as attributes, etc. In any case, I think that the proposed PASWA model is quite close to a particular application of the Query data model.
Indeed, in the proposals below I deal with this by stating that we only care about Data Models that correspond to legal SOAP Envelope Infosets.

Some things I learned thinking about the two models
===================================================

Considering PASWA and the Query model together reminds me that we need to be very careful and explicit about the contract between communicating PASWA nodes. In particular, we presume that a sending node has (in the interesting case) local knowledge that the character children of some particular element are in fact the canonical form of a schema type such as base64Binary. Some questions:

* Is it inherent in PASWA that every binding must communicate that typing information to the next hop? Thinking about the query model reminded me of this question, as the model has an explicit dm:type accessor. Our options seem to be:

  - Yes in all cases: this makes it easy for successive hops to preserve the optimization, but it's not immediately obvious how an older binding would pass along this hint.

  - No, not ever: works with any binding, but seems to require each receiving node to guess, and then verify using unspecified means, which elements are subject to optimization.

  - Binding dependent: lets smart bindings do the right thing, but has inconsistent semantics when traversing new and old bindings.

* Should the PASWA optimization be limited to one or two specific types such as base64Binary?

  - Yes: this meets the most immediate need.

  - No: there will be lots of cases in which values such as integers and floats will be available at the sending node, and a PASWA-like technique could enable the efficient transmission of all of them. If we go far enough, we could carry query results along with full typing.
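
To make the contract question above a little more concrete, here is a minimal Python sketch of the kind of check a sending node might perform before optimizing an element. Everything in it is an assumption made for illustration: the ElementNode class, its field names, and the two helper functions are hypothetical and come from neither PASWA nor the Data Model spec, and "canonical" is taken to mean simply that the characters survive a decode and re-encode round trip unchanged.

```python
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementNode:
    """Hypothetical stand-in for a Data Model element node of simple type."""
    name: str
    type_name: Optional[str] = None        # plays the role of dm:type
    typed_value: Optional[bytes] = None    # plays the role of dm:typed-value
    cached_string: Optional[str] = None    # characters, if the node started as text

    def string_value(self) -> str:
        """Plays the role of dm:string-value; generated from the value on demand."""
        if self.cached_string is None:
            if self.typed_value is None:
                return ""
            self.cached_string = base64.b64encode(self.typed_value).decode("ascii")
        return self.cached_string


def is_canonical_base64(text: str) -> bool:
    """Assumed test: the text round-trips unchanged through decode and encode."""
    try:
        raw = base64.b64decode(text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return base64.b64encode(raw).decode("ascii") == text


def can_send_as_binary(node: ElementNode) -> bool:
    """True when the sender knows the type, has the value, and the characters
    are canonical, so a receiver can rebuild exactly the same string."""
    return (
        node.type_name == "base64Binary"
        and node.typed_value is not None
        and is_canonical_base64(node.string_value())
    )
```

A node built from a value passes the check by construction, while a node whose characters contain embedded whitespace does not, because a receiver regenerating the characters from the bytes would not reproduce the original Infoset. An untyped node fails as well, which is the situation a receiving node faces under the "No, not ever" option.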

Formulating PASWA in terms of the XML Query Model
=================================================

I haven't gotten very far with this, but my preliminary noodling suggests the formulations might be quite clean, indeed perhaps cleaner than the original PASWA. Here are two very rough variants, corresponding to the cases where we do and don't want to send type information along with the message (a choice we have to make anyway). These proposals deal only with the actual message model and processing. All the other MIME typing and module proposal stuff can be lifted more or less intact from PASWA.

Variant #1: No transmitted type information
--------------------------------------------

This feature provides for:

* modeling of SOAP messages in terms of the XQuery 1.0 and XPath 2.0 data model (hereinafter the "Data Model").

* the implementation of bindings that use type information from the model to optimize the transmission of the SOAP message.

According to this proposal, SOAP messages are modeled according to the Data Model. As provided in the specification for the Data Model, there is an XML Infoset corresponding to each such Data Model instance (see [6,7,8,9] and similar sections). Per the usual rules for SOAP, the SOAP Envelope consists of that Infoset, and all such Infosets MUST conform to the requirements of the SOAP 1.2 Recommendation; any Data Model instance which corresponds to a non-conforming Infoset is not supported by this feature.

Bindings MAY be specified to use additional information from the Data Model to optimize the transmission of SOAP messages. For example, if the dm:type of an element node is determined to be base64Binary, if its dm:string-value is known to be in canonical form, and if the dm:typed-value of that element is available efficiently, then a binding can be constructed to send (an efficient representation of) the typed-value.
Although the binding may, for its own purposes, transmit information such as the dm:type, such information is in general restricted to the binding. The received SOAP message consists of the Infoset corresponding to the transmitted Data Model, as augmented by any features other than this one, and thus contains the string-value of the transmitted item.

Similar optimizations are possible with types other than base64Binary, and may be implemented at the discretion of the particular binding.

Variant #2: With type information
----------------------------------

This feature provides for:

* modeling of SOAP messages in terms of the XQuery 1.0 and XPath 2.0 data model (hereinafter the "Data Model").

* the transmission of non-SOAP information from the Data Model, specifically including the dm:type (from which the association between dm:typed-values and dm:string-values can invariably be determined).

* the implementation of bindings that use type information from the model to optimize the transmission of the SOAP message.

According to this proposal, SOAP messages are modeled according to the Data Model. As provided in the specification for the Data Model, there is an XML Infoset corresponding to each such Data Model instance (see [6,7,8,9] and similar sections). Per the usual rules for SOAP, the SOAP Envelope consists of that Infoset, and all such Infosets MUST conform to the requirements of the SOAP 1.2 Recommendation.

This feature mandates the transmission of additional information from the Data Model. Specifically, the dm:type of each element and attribute MUST be transmitted from node to node. This feature further provides that when a typed element or attribute is relayed intact by a SOAP intermediary, the dm:type MUST be relayed (we need rules here if the 2nd hop node doesn't support the feature).

(Note that bindings can optimize the common case where the nodes are not, in fact, typed.
Since the data model is just an abstraction, we can assume that nodes will not claim to have typed any elements or attributes for which the overhead of sending the type information is prohibitive.)

Bindings MAY be specified to use additional information from the Data Model to optimize the transmission of SOAP messages. For example, if the dm:type of an element node is determined to be base64Binary, if its dm:string-value is known to be in canonical form, and if the dm:typed-value of that element is available efficiently, then a binding can be constructed to send (an efficient representation of) the typed-value.

Similar optimizations are possible with types other than base64Binary, and may be implemented at the discretion of the particular binding.

Conclusion
==========

That's roughly it. I'm intrigued by how simple the presentations appear to be. I'm sure they are very rough around the edges, perhaps even seriously buggy, but I think there's some room for optimism that it might all work out. If nothing else, I think that by adopting the terminology of the data model we might have the option to make our presentation clearer and more precise. There is the hope, at least, that we can actually lay the groundwork for useful synergy between XML Query, XSL, XPath and SOAP.

I'm curious what you all think?

Noah

[1] http://www.gotdotnet.com/team/jeffsch/paswa/paswa61.html
[2] http://www.w3.org/TR/xpath-datamodel/
[3] http://www.w3.org/TR/xpath-datamodel/#dm-type
[4] http://www.w3.org/TR/xpath-datamodel/#dm-string-value
[5] http://www.w3.org/TR/xpath-datamodel/#dm-typed-value
[6] http://www.w3.org/TR/xpath-datamodel/#DocumentNodeDM2IS
[7] http://www.w3.org/TR/xpath-datamodel/#ElementNodeDM2IS
[8] http://www.w3.org/TR/xpath-datamodel/#AttributeNodeDM2IS
[9] http://www.w3.org/TR/xpath-datamodel/#NamespaceNodeDM2IS

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Monday, 9 June 2003 19:17:16 UTC