RE: Data model task force recommends adoption of data model formulation from noah_mendelsohn@us.ibm.com on 2003-09-15 (xml-dist-app@w3.org from September 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Mon, 15 Sep 2003 15:43:42 -0400
To: "Martin Gudgin" <mgudgin@microsoft.com>
Cc: xml-dist-app@w3.org
Message-ID: <OF3708B222.7FB1041A-ON85256DA2.0069C83D@lotus.com>
Gudge:  thanks for your comments.  First, I think it's important to note 
that the differences between the formulations are questions of 
presentation, not function.  I believe that the original and the DM are 
equivalent in the systems that they let you build, the way in which you 
build them, the bindings you can support, etc.  Whenever there's a 
statement "you can do this with formulation I", it should also be true for 
D, and vice versa.  It's in that sense a matter of taste, in my opinion, 
which provides the more useful framework for discussing the function and 
for relating it to other specifications with which MTOM may be used.

Martin Gudgin writes:

> 
> I've read the MTOM data model draft and have the following comments:
> 
> General
> 
> 1. While I appreciate the goodness behind referencing other W3C
> specifications, I don't think referencing the most complex spec that
> will do the job is a good thing. Specifically, I think that the Infoset
> spec is adequate for describing MTOM and that the XPath 2.0/XQuery 1.0
> Data Model spec is overkill.

FWIW, I don't think the tradeoff is that clear one way or the other.  I 
don't find the Data Model spec to be particularly more complex than 
Infoset.  I find it well-presented, etc.  The principal addition relative 
to Infoset is to deal with typing information, which is what we use as the 
basis for our optimizations in any case.  To me, the principal advantage 
of the Infoset approach is that SOAP happens to use it.  The advantage of 
DM is its use of evolving W3C mechanisms for dealing with typed Infosets. 
 I respect your judgement call that the Infoset formulation seems more 
appealing to you. 
 
> 2. The current draft would lead me to believe that I am required to
> construct an XPath 2.0/XQuery 1.0 Data Model for my SOAP envelope (
> albeit one with several properties set to 'default' values ). I do not
> think this is a good thing ( see point 1 ).

I'm not sure what this means.  Does the SOAP Rec require you to 
"construct" an Infoset?  I think the answer is no, as it is for DM and 
MTOM.  Both Infoset and DM tell you information that you must have 
available to proceed.  For the most part, the information required for use 
with SOAP is identical, regardless of whether you describe it in DM terms 
or Infoset terms.  You need the name of the root element, you need to know 
its children, etc. You need this not in some particular DM or Infoset 
form, but in whatever form your implementation happens to use for 
processing and eventually putting on the wire.  I believe I could take an 
existing, deployed SOAP 1.2 implementation and describe it in DM terms, 
though in all cases the dm:type would be indeterminate and the 
dm:typed-value would be a string with the same chars as the 
dm:string-value.

The models just provide a vocabulary for discussing the specifications and 
the implementations.  As far as I know, both the MTOM and DM formulations 
require you to do exactly the same thing in your implementation. Neither 
requires you to "construct" a model, except insofar as they require you to 
have certain information.  That information is the same in both cases, 
IMO.
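
As a rough illustration of the point above, here is a minimal Python sketch of one in-memory element described in DM terms. The Element class, its field names, and the sample data are hypothetical and are not taken from the DM spec or from any SOAP implementation; None is used here to stand in for an indeterminate dm:type.

```python
import base64
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Element:
    """Hypothetical stand-in for whatever an implementation already holds."""
    name: str
    text: str = ""
    children: List["Element"] = field(default_factory=list)
    type_name: Optional[str] = None  # None models an indeterminate dm:type

    def string_value(self) -> str:
        # Simplification: own text first, then descendants' text
        # (mixed content is not modeled in this sketch).
        return self.text + "".join(c.string_value() for c in self.children)

    def typed_value(self) -> Union[str, bytes]:
        if self.type_name is None:
            # Untyped case: same characters as the string value.
            return self.string_value()
        if self.type_name == "xs:base64Binary":
            # Typed case: the binary value behind the lexical form.
            return base64.b64decode("".join(self.string_value().split()))
        raise NotImplementedError(self.type_name)


untyped = Element("m:photo", text="AQID")
typed = Element("m:photo", text="AQID", type_name="xs:base64Binary")
```

In this sketch untyped.typed_value() is the string "AQID", while typed.typed_value() is the three bytes 01 02 03. The information held is the same either way; only the vocabulary used to describe it differs.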
 
> 3. If I were to come at this document with no prior knowledge of
> where the spec came from, I could easily go away with the impression
> that I have to have base64 strings around on the sender side before
> transmission. I think we need language that makes it very clear to
> people that they can start and finish with the binary form.

If that's a problem, we should fix it, as we should if the equivalent 
misunderstanding were found in the Infoset formulation.
 
> Specific:
> 
> 4. The Introduction cites several reasons for wanting to describe
> things in terms of the XPath 2.0/XQuery 1.0 Data Model:
> 
>    a) facilitating transmission of optimized queries
>    b) foundation for digital signature canonicalizations based on
>       the XPath 2.0/XQuery 1.0 Data Model
>    c) foundation for optimization of a fully-typed XPath 2.0/XQuery
>       1.0 Data Model
> 
>    Regarding a) I fail to understand why describing MTOM in terms
> of the XPath 2.0/XQuery 1.0 Data Model is necessary in order for MTOM to
> transmit optimized queries. That feels a bit like saying that given a
> method that takes a string as a parameter has a French name then French
> language strings cannot be passed to that method.

Not necessary.  It just makes the discussion easier.  Using Infoset wasn't 
necessary for SOAP either...we could have just described the info you 
needed.  It would have made it a bit more inconvenient to explain how some 
other specifications worked with SOAP, but it would otherwise have worked 
fine.  Same here.
 
>    Regarding b), I think it is unlikely that such canonicalizations
> will be developed anytime soon. And even if they were, I don't see why
> an Infoset description would be any less able to take advantage of such
> canonicalizations.

Well, there have been proposals independent of DM to do signatures that 
handle binary MTOM more efficiently.  Don't know if it will happen and 
when, but if it does the DM formulation offers a bit of a base on which to 
consider sharing such c14ns with the query world.
 
>    Regarding c), I think this is a very dangerous direction to go
> in. The original intent of PASWA was to provide an approach to
> serialization of raw binary data that still allows SOAP to see an XML
> Infoset. It was not intended to be the first step down the road to a
> binary version of XML. Given that the W3C is investigating binary XML in
> other fora, I do not think desire for transmission of a fully type
> optimized Infoset is a valid reason for describing MTOM in terms of the
> XPath 2.0/XQuery 1.0 Data Model.

I can see this either way.  I think that there will at times be a need to 
send full query results.  That's definitely beyond the use cases and 
requirements for MTOM, and I wouldn't justify the DM approach primarily on 
that basis.  I do think it is an attractive byproduct of the DM approach, 
and unlike you I would not shy away from it.

>  5. The draft states in several places that base64 encoded strings
> must be in the canonical form. What is the reason for this requirement (
> I realise this comment may not be directly related to the XPath
> 2.0/XQuery 1.0 Data Model )

Because base64 as specified by schema allows variability in whitespace. If 
you send only the binary on the wire, then you need some convention for 
which characters you reconstruct at the receiver.  I had presumed that we 
would want to focus on the canonical lexical representation as described 
in the schema errata linked from the DM formulation of MTOM draft.  If we 
prefer, we could use a non-canonical form (e.g. no whitespace), but I 
think we have to pick exactly one lexical form for each value.  If a sender 
string is in some other base64 form, I think we can't optimize it.  As you 
say, this is completely independent of DM formulation issues.
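
To make the whitespace point concrete, here is a small Python sketch. It assumes, purely for illustration, that the single agreed lexical form is the whitespace-free encoding; the canonical form defined in the schema errata may differ, and the function names are hypothetical.

```python
import base64


def to_value(lexical: str) -> bytes:
    """Map any base64 lexical form to its binary value, ignoring whitespace."""
    stripped = "".join(lexical.split())
    return base64.b64decode(stripped, validate=True)


def agreed_form(value: bytes) -> str:
    """The one lexical form the receiver reconstructs (assumed: no whitespace)."""
    return base64.b64encode(value).decode("ascii")


def can_optimize(lexical: str) -> bool:
    """True only if sending the bare binary lets the receiver rebuild
    exactly the characters the sender started with."""
    return agreed_form(to_value(lexical)) == lexical


compact = agreed_form(b"Hello, MTOM!")
wrapped = compact[:8] + "\n" + compact[8:]  # same value, different characters
```

Both strings decode to the same twelve bytes, but only compact survives the trip unchanged: can_optimize(compact) is true and can_optimize(wrapped) is false, so the wrapped form would have to be sent as characters.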
 
> 6. I do not find the descriptions in Sections 3.2.1 and 3.2.2 to be
> particularly helpful. I think that the number of people who will
> understand the data model spec is less than the number who will/already
> do understand the Infoset spec. 

Will that be true in a year or two?  I think that Query will be a big 
deal, and to the extent that either is understood (limited :-) ) both will 
be reasonably widely understood.  Frankly, I think that W3C needs to bring 
all these models together at one point, but that's beyond the scope of 
current work.

> Describing things using Infoset terms is
> consistent with SOAP 1.2 

Absolutely.

> and results in no loss of descriptive power for
> our purposes. ( This is largely the same as point 1 )

As noted in the Intro, I think it's a question of style, not function.  If 
you want to explain how a query result will get sent through MTOM, the 
answer in the DM formulation is:  "it's lossy, but otherwise it just 
works".  If you want to explain what data is to be transmitted by an 
optimizing binding, in the DM formulation you say:  "transmitting the value 
of dm:typed-value is sufficient to allow reconstruction of the 
dm:string-value."  In the case of Infoset, you need to say that the 
binding can use a representation that conveys the value from the value 
space corresponding to the lexical form of the character children.  Both 
formulations work. 

Again: I think it's a matter of taste.  Obviously, at this point at least, 
you are unconvinced that the DM formulation is more appealing.  Fair 
enough, that's why we should be having this discussion.  If the WG agrees 
to Infoset, life will go on, and all the same systems can be built. Whether 
we've missed an opportunity to make our specs cleaner and more synergistic 
is a question on which reasonable people can disagree (and for that 
matter, I can too.)
 
> Regards
> 
> Martin Gudgin
> 
>

And to you!

Noah
 
> 
> 
> > -----Original Message-----
> > From: xml-dist-app-request@w3.org 
> > [mailto:xml-dist-app-request@w3.org] On Behalf Of 
> > noah_mendelsohn@us.ibm.com
> > Sent: 13 September 2003 15:49
> > To: xml-dist-app@w3.org
> > Subject: Data model task force recommends adoption of data 
> > model formulation
> > 
> > 
> > 
> > 
> > 
> > 
> > On Friday Sept. 12th, the data model task force held a 
> > teleconference during which we considered the draft 
> > reformulation [1]* of MTOM based on the new XPath XQuery data 
> > model [2].  During the call, the task force unanimously 
> > agreed on the following recommendations to the XML Protocols
> > Workgroup:
> > 
> > * The draft at [1] is ready for consideration by the entire 
> > XML Protocols Workgroup.
> > 
> > * The DM formulation, as presented in [1], should be adopted 
> > as the basis for future work on MTOM (though this is an 
> > initial draft and will require cleanup and editing).
> > 
> > It should be noted that only three members of the task force 
> > were present for the call on the 12th, and while the above is 
> > their unanimous agreement, previous calls have had broader attendance.
> > 
> > The task force also considered concerns raised by Ugo Corda 
> > at [3], and decided that the response at [4] represents the 
> > consensus of the task force.  In quick outline, Ugo's concern 
> > is that, in the interests of sticking to the established 
> > scope of the existing MTOM design, and specifically in 
> > allowing MTOM messages to be sent through the existing SOAP 
> > HTTP binding, the data model formulation is presented as 
> > lossy.  Although type information from the data model is used 
> > as a hint by bindings to optimize SOAP transmission, such 
> > type information is not in general transmitted.  I believe 
> > Ugo's concern is that if the data model is used at all, it 
> > should be transmitted faithfully.  This concern presents a 
> > Catch-22 for those interested in the data model formulation: 
> > the XMLP WG has already agreed, at least tentatively, that 
> > regardless of how the specification is modeled, type 
> > information is not necessarily to be transmitted.  The task 
> > force believes that on balance, the benefits of using 
> > terminology that is on its way to Recommendation status, and 
> > indeed doing so in a way that might provide a basis for future 
> > specifications that would indeed transmit the full data model 
> > faithfully, outweigh any negatives resulting from the lossy 
> > use of the model in MTOM at this time.
> > 
> > Thus, we recommend consideration and adoption of the draft at 
> > [1] as the basis for future work on MTOM.
> > 
> > At least one set of details remains to be resolved if the DM 
> > formulation is to be used: the current draft does not discuss 
> > all of the accessors provided by the data model.  For 
> > example, element nodes [5] provide a base-uri [6], which in 
> > principle can vary for each element.  Future versions of the 
> > draft would need to explain that, like type information, such 
> > base URI and similar information is not transmitted.  This 
> > limitation is consistent with the general philosophy that 
> > MTOM will transform the input data model to a different (but 
> > predictably different) output data model at the receiver.  In 
> > general, the transmission will exactly preserve certain 
> > information, will lose other information such as base URI and 
> > type, and will not add or synthesize other information, 
> > except as directly follows from the losses (e.g. typed values 
> > change in the obvious way when type information is lost.)
> > 
> > Thank you very much.
> > 
> > Noah
> > 
> > * The draft linked from [1] is incorrectly formatted as a 
> > full WD.  The current copies at [7,8] correctly show editors' 
> > copy status, but are otherwise identical.  I referred to [1] 
> > in the note above, as it has the original submission text to the WG.
> > 
> > [1] http://lists.w3.org/Archives/Public/xml-dist-app/2003Aug/0014.html
> > [2] http://www.w3.org/TR/xpath-datamodel/
> > [3] http://lists.w3.org/Archives/Public/xml-dist-app/2003Aug/0018.html
> > [4] http://lists.w3.org/Archives/Public/xml-dist-app/2003Sep/0007.html
> > [5] http://www.w3.org/TR/xpath-datamodel/#ElementNode
> > [6] http://www.w3.org/TR/xpath-datamodel/#dm-base-uri
> > [7] http://www.w3.org/2000/xp/Group/3/06/Attachments/OptimizationMechanismDM.xml
> > [8] http://www.w3.org/2000/xp/Group/3/06/Attachments/OptimizationMechanismDM.html
> > 
> > 
> > 
> > 
> > ------------------------------------------------------------------
> > Noah Mendelsohn                              Voice: 1-617-693-4036
> > IBM Corporation                                Fax: 1-617-693-8676
> > One Rogers Street
> > Cambridge, MA 02142
> > ------------------------------------------------------------------
> > 
> > 
> > 
> 


------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Monday, 15 September 2003 15:44:07 UTC