Re: Proposal for multi-reference support in MTOM

I continue to be bothered by two of the concerns that I raised on the 
telcon. 

* Concern #1:  The SOAP Recommendation says [1]: 

"As described in 5. SOAP Message Construct, each SOAP message is specified 
as an XML infoset that consists of a document information item with 
exactly one child: the SOAP Envelope element information item. Therefore, 
the minimum responsibility of a binding in transmitting a message is to 
specify the means by which the SOAP message infoset is transferred to and 
reconstituted by the binding at the receiving SOAP node and to specify the 
manner in which the transmission of the envelope is effected using the 
facilities of the underlying protocol."

If I understand the proposal correctly, the bindings supporting the 
proposal below would tend to violate this rule when confronted with 
content along the lines of:

<env:Envelope xmlns:env="..." xmlns:mtom="...">
   <env:Body>
     <app:Stuff xmlns:app="...">
       <app:Thing1 mtom:ContentID="someURI">
         some base64 text
       </app:Thing1>
       <app:Thing2 mtom:ContentID="someURI">
         some OTHER base64 text
       </app:Thing2>
     </app:Stuff>
   </env:Body>
</env:Envelope>

Granted, content of this sort specifically violates the intended use of 
the attribute.  Maybe we were wise to not allow for bindings to break the 
usual rules in circumstances like this, or maybe we were shortsighted. The 
fact is that I don't think the current recommendation licenses a binding 
to change the content of an Infoset in transmission. 

Now, in the interest of full disclosure, I can see a bit of a 
counterargument, though I find it somewhat unpleasant:  you might claim 
that the binding implements some sort of feature that, like encryption by 
an active intermediary, changes the Infoset.  Still, I think we've more 
clearly licensed active intermediaries than active links, and my current 
reading is that the SOAP recommendation currently requires that content 
like that shown above be transmitted with full fidelity, or else that a 
binding reflect some sort of binding-specific error (and even then, I 
would argue that it's a pretty poorly spec'd binding that throws errors 
when confronted with certain perfectly good SOAP infosets).
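For illustration only, here is a minimal Python sketch, not part of either proposal, of the check a binding would need before it could promise full fidelity under the multi-reference extension: every element that shares an mtom:ContentID value must carry identical content. The namespace URIs are hypothetical placeholders for the ones elided above, and the base64 values stand in for "some base64 text".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical namespace URIs; the message above elides the real ones.
ENV_NS = "urn:example:soap-envelope"
MTOM_NS = "urn:example:mtom"
APP_NS = "urn:example:app"
CONTENT_ID = f"{{{MTOM_NS}}}ContentID"

def conflicting_content_ids(envelope_xml: str) -> set[str]:
    """Return ContentID values shared by elements whose content differs.

    Under the proposed extension each such group becomes a single MIME part,
    so all but one of the differing contents would be lost in transmission.
    """
    contents = defaultdict(set)
    for elem in ET.fromstring(envelope_xml).iter():
        cid = elem.get(CONTENT_ID)
        if cid is not None:
            # Strip all whitespace; line breaks in base64 text are not significant.
            contents[cid].add("".join((elem.text or "").split()))
    return {cid for cid, texts in contents.items() if len(texts) > 1}

example = f"""
<env:Envelope xmlns:env="{ENV_NS}" xmlns:mtom="{MTOM_NS}">
  <env:Body>
    <app:Stuff xmlns:app="{APP_NS}">
      <app:Thing1 mtom:ContentID="someURI">c29tZSBkYXRh</app:Thing1>
      <app:Thing2 mtom:ContentID="someURI">b3RoZXIgZGF0YQ==</app:Thing2>
    </app:Stuff>
  </env:Body>
</env:Envelope>
"""

print(conflicting_content_ids(example))  # {'someURI'}
```

A binding could refuse such input, but, as argued above, failing on a perfectly good infoset is itself a poor outcome.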

* Concern #2: keeping IDs distinct at intermediaries

As I also mentioned on the phone, an intermediary wishing to optimize 
content would have to verify that any ID used for new content was distinct 
from those already in use.  This seems to put a burden on intermediaries 
to more carefully parse headers not targeted to them than might otherwise 
be necessary.  In the body, there could also be issues with streaming, as 
one does not know which IDs have been used until the end of the envelope 
is seen.  I heard Marc suggest the use of GUIDs, and in certain 
environments GUIDs are practical.  Still, given the need for access to the 
moral equivalent of an Ethernet MAC address as a seed for the GUID, I 
think that use of GUIDs is at best a compromise.
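A minimal Python sketch of the bookkeeping described above, assuming placeholder namespace URIs and a hypothetical cid:partN naming scheme. Before it can mint a new ID, the intermediary has to scan the whole envelope, including headers not targeted at it, which is what makes streaming awkward.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the elided mtom namespace.
MTOM_NS = "urn:example:mtom"

def ids_in_use(envelope_xml: str) -> set[str]:
    """Collect every mtom:ContentID value and mtom:Include/@href in the envelope.

    The whole envelope must be read first, including headers not targeted
    at this intermediary, so the answer is unavailable while streaming.
    """
    used: set[str] = set()
    for elem in ET.fromstring(envelope_xml).iter():
        cid = elem.get(f"{{{MTOM_NS}}}ContentID")
        if cid is not None:
            used.add(cid)
        href = elem.get("href")
        if elem.tag == f"{{{MTOM_NS}}}Include" and href is not None:
            used.add(href)
    return used

def mint_fresh_id(used: set[str], prefix: str = "cid:part") -> str:
    """Return the first ID of the hypothetical form <prefix>N not in `used`."""
    for n in itertools.count(1):
        candidate = f"{prefix}{n}"
        if candidate not in used:
            return candidate

envelope = f"""<env:Envelope xmlns:env="urn:example:env" xmlns:mtom="{MTOM_NS}">
  <env:Header>
    <x:Other xmlns:x="urn:example:x" mtom:ContentID="cid:part1"/>
  </env:Header>
  <env:Body/>
</env:Envelope>"""

print(mint_fresh_id(ids_in_use(envelope)))  # cid:part2
```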

Overall:  it seems wrong to me to have bindings that depend for 
correctness on the integrity of content in the Infoset.  MTOM optimizes 
based on the content of the Infoset, but so far it never fails to send the right 
thing.  At worst you'll make bad decisions about what to optimize.   That 
seems to me to be one of the key attractions of MTOM, and this proposal 
seems to come close to changing that.

In the end, I remain ambivalent about this proposal.  The disadvantages 
include those listed above.  If we could convince ourselves that 1-to-1 is 
OK at the binding level, then all these issues go away, along with 
reference counting or any other messy alternative.   I suppose it comes 
down to our use cases:  is it really important for us to include the same 
large image or the like in two places in the same SOAP envelope infoset? 
It's not clear to me that's necessary.  I lean somewhat toward what I take 
to be Gudge's position:  if you need to share a template or some such, 
then either live with the overhead of duplicating it, or more likely, use 
some sort of explicit ID/REF mechanism within the envelope to share a 
copy. 

Noah

[1] http://www.w3.org/TR/soap12-part1#bindfw

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Marc Hadley <Marc.Hadley@Sun.COM>
Sent by: xml-dist-app-request@w3.org
11/12/2003 03:47 PM

 
        To:     "Xml-Dist-App@W3. Org" <xml-dist-app@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Proposal for multi-reference support in MTOM


Here's a proposal for an extension to the current MTOM formulation to 
offer better support for multiple inclusion of the same data. The 
proposed extension has the following properties:

- Preserves MTOM semantics of attachment inclusion in the SOAP message
  infoset
- Supports existing 'Include' and 'Representation' semantics; use of the
  extension is optional
- Supports multiple inclusion of attachments without replication of
  data in the serialized form
- Multiply included data is replicated in the message infoset, so
  signatures over elements containing such data cover the attachment
  data itself rather than a reference to the data, as would be the case
  when using a Representation approach.


Infoset Form
============

This section shows via an example the infoset of a message after the 
binding has performed the MTOM deserialization (described later). XML 
1.0 is used as the most convenient syntax to express the infoset but 
this should be considered a purely abstract model of the message 
content.

<env:Envelope xmlns:env="..." xmlns:mtom="...">
   <env:Body>
     <app:Stuff xmlns:app="...">
       <app:Thing1 mtom:ContentID="someURI">
         some base64 text
       </app:Thing1>
       <app:Thing2 mtom:ContentID="someURI">
         some base64 text
       </app:Thing2>
       <app:Thing3>
         some base64 text
       </app:Thing3>
     </app:Stuff>
   </env:Body>
</env:Envelope>

Note that the same base64 data is included as the content of the Thing1 
and Thing2 EIIs; this is indicated by the value of the mtom:ContentID 
attribute being the same for both. Thing3 has no mtom:ContentID 
indicating that the optional multi-reference extension is not being 
used for the content of this EII.


Optimized (MIME) Wire Form
==========================

This section shows via an example the serialized form of a message 
using MIME-based MTOM.

Content-type: multipart/related; boundary="someBoundaryString"

--someBoundaryString
Content-Type: application/soap+xml

<env:Envelope xmlns:env="..." xmlns:mtom="...">
   <env:Body>
     <app:Stuff xmlns:app="...">
       <app:Thing1 mtom:ContentID="someURI">
         <mtom:Include href="someURI"/>
         <!-- depending on how mtom:ContentID is defined, the
              Include/@href may be redundant -->
       </app:Thing1>
       <app:Thing2 mtom:ContentID="someURI">
         <mtom:Include href="someURI"/>
       </app:Thing2>
       <app:Thing3>
         <mtom:Include href="someOtherURI"/>
       </app:Thing3>
     </app:Stuff>
   </env:Body>
</env:Envelope>

--someBoundaryString
Content-Type: image/png
Content-ID: someURI

binary picture data

--someBoundaryString
Content-Type: image/png
Content-ID: someOtherURI

binary picture data

--someBoundaryString--
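As a sketch of how such a package might be assembled, the following uses Python's standard email.mime classes. The PNG media type and the bare (unbracketed) Content-ID values follow the example above; the function name build_wire_form is hypothetical. Each distinct Content-ID is emitted once, however many mtom:Include elements refer to it.

```python
import email
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart

def build_wire_form(envelope_xml: str, parts: dict[str, bytes]) -> bytes:
    """Package an optimized envelope and its attachments as multipart/related."""
    package = MIMEMultipart("related")
    # Root part: the SOAP envelope with its mtom:Include elements.
    # Note: MIMEApplication base64-encodes its payload by default.
    package.attach(MIMEApplication(envelope_xml.encode("utf-8"), _subtype="soap+xml"))
    for content_id, data in parts.items():
        # One MIME part per distinct Content-ID; assumes PNG data as in the example.
        image = MIMEImage(data, _subtype="png")
        image["Content-ID"] = content_id
        package.attach(image)
    return package.as_bytes()

wire = build_wire_form("<Envelope/>", {"someURI": b"png bytes", "someOtherURI": b"more png bytes"})
parsed = email.message_from_bytes(wire)
print([part["Content-ID"] for part in parsed.get_payload()[1:]])  # ['someURI', 'someOtherURI']
```

A production binding would likely send the envelope part with an 8bit or binary transfer encoding rather than the base64 default used here.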


Schema Types
============

<complexType name="OptimizationCandidate">
   <simpleContent>
     <extension base="xsd:base64Binary">
       <attribute name="ContentID" type="xsd:anyURI"/>
       <attribute name="MediaType" type="xsd:string"/>
       <!-- other attributes we define -->
     </extension>
   </simpleContent>
</complexType>

Terminology
===========

The following terminology is used in the description of the 
serialization and deserialization algorithms:

Optimization candidate:
   EII of type xsd:base64Binary or mtom:OptimizationCandidate.

Matching MIME part:
   MIME part whose Content-ID and/or Content-Location headers match an
   OptimizationCandidate/@ContentID (exact matching criteria TBD).

Content:
   base64Binary character information item (CII) children of an
   optimization candidate (excludes its attribute information items, AIIs)


Infoset to Wire Serialization
=============================

For each optimization candidate in the SOAP message:
     - if no matching MIME part exists, create a matching MIME part
       from the optimization candidate's decoded content and AIIs
     - replace the content of the optimization candidate with a child
       mtom:Include EII

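The serialization steps above can be sketched in Python with xml.etree.ElementTree. This is a minimal sketch under stated assumptions: the caller passes the optimization candidates explicitly (schema-type detection is out of scope), the namespace URI and the cid:generatedN ID scheme are hypothetical, and MIME headers derived from a candidate's AIIs (such as MediaType) are omitted.

```python
import base64
import itertools
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the elided mtom namespace.
MTOM_NS = "urn:example:mtom"
CONTENT_ID = f"{{{MTOM_NS}}}ContentID"
INCLUDE = f"{{{MTOM_NS}}}Include"

def serialize(candidates: list[ET.Element]) -> dict[str, bytes]:
    """Rewrite candidates in place; return a Content-ID -> part bytes mapping."""
    parts: dict[str, bytes] = {}
    # Hypothetical ID scheme for candidates that carry no mtom:ContentID.
    fresh_ids = (f"cid:generated{n}" for n in itertools.count(1))
    for elem in candidates:
        cid = elem.get(CONTENT_ID)
        if cid is None:
            # Checks only IDs assigned so far in this pass; a later explicit
            # ContentID could still collide (compare Concern #2).
            cid = next(i for i in fresh_ids if i not in parts)
        if cid not in parts:
            # No matching MIME part yet: create one from the decoded content.
            parts[cid] = base64.b64decode("".join((elem.text or "").split()))
        # If a part already existed, this element's own content is discarded
        # unchecked, which is the fidelity risk raised in Concern #1.
        elem.text = None
        ET.SubElement(elem, INCLUDE, {"href": cid})
    return parts

doc = ET.fromstring(
    f'<Stuff xmlns:mtom="{MTOM_NS}">'
    '<Thing1 mtom:ContentID="someURI">c29tZSBkYXRh</Thing1>'
    '<Thing2 mtom:ContentID="someURI">c29tZSBkYXRh</Thing2>'
    '<Thing3>b3RoZXIgZGF0YQ==</Thing3>'
    '</Stuff>'
)
parts = serialize(list(doc))
print(parts)  # {'someURI': b'some data', 'cid:generated1': b'other data'}
```

Thing1 and Thing2 collapse into one part here. Had their contents differed, the second would have been silently dropped.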

Wire to Infoset Deserialization
===============================

For each mtom:Include EII:
     - replace the mtom:Include EII with the base64-encoded attachment
       content
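A matching sketch of the deserialization step, under the same assumptions (a placeholder namespace URI, and each mtom:Include being its parent's only content). Because every Include that names a part gets its own copy of the data, the reconstructed infoset repeats the content wherever it is referenced, as the proposal intends.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the elided mtom namespace.
MTOM_NS = "urn:example:mtom"
INCLUDE = f"{{{MTOM_NS}}}Include"

def deserialize(root: ET.Element, parts: dict[str, bytes]) -> None:
    """Replace every mtom:Include under root with base64 text of its part."""
    # Collect (parent, include) pairs first so the tree is not mutated mid-walk.
    targets = [(parent, child) for parent in root.iter()
               for child in parent if child.tag == INCLUDE]
    for parent, include in targets:
        data = parts[include.get("href")]
        # Assumes the Include is its parent's only content, as in the examples.
        parent.text = base64.b64encode(data).decode("ascii")
        parent.remove(include)

doc = ET.fromstring(
    f'<Stuff xmlns:mtom="{MTOM_NS}">'
    '<Thing1 mtom:ContentID="someURI"><mtom:Include href="someURI"/></Thing1>'
    '<Thing2 mtom:ContentID="someURI"><mtom:Include href="someURI"/></Thing2>'
    '</Stuff>'
)
deserialize(doc, {"someURI": b"some data"})
print([thing.text for thing in doc])  # ['c29tZSBkYXRh', 'c29tZSBkYXRh']
```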

--
Marc Hadley <marc.hadley@sun.com>
Web Technologies and Standards, Sun Microsystems.

Received on Thursday, 13 November 2003 17:29:26 UTC