Re: Proposal for multi-reference support in MTOM

To allay your first concern, I'd be open to recommending that the
binding check for differences during serialization. Computing a
low-cost hash during serialization wouldn't add much overhead, I
suspect.
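A minimal Python sketch of such a check, assuming the binding sees each (ContentID, decoded content) pair as it serializes. It keeps one digest per ID and reports any ID that turns up with different content. `find_conflicting_ids` is a hypothetical helper, and BLAKE2b with a 16-byte digest is an arbitrary choice of cheap hash.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def find_conflicting_ids(candidates: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return the ContentIDs that appear with more than one distinct content."""
    first_digest: Dict[str, bytes] = {}
    conflicts: List[str] = []
    for content_id, data in candidates:
        # Hash the decoded content instead of keeping it in memory.
        digest = hashlib.blake2b(data, digest_size=16).digest()
        seen = first_digest.setdefault(content_id, digest)
        if seen != digest and content_id not in conflicts:
            conflicts.append(content_id)
    return conflicts


# Same ID, same bytes: safe to share one MIME part.
print(find_conflicting_ids([("someURI", b"AAA"), ("someURI", b"AAA")]))  # []
# Same ID, different bytes: sharing one part would change the infoset.
print(find_conflicting_ids([("someURI", b"AAA"), ("someURI", b"BBB")]))  # ['someURI']
```

On a conflict, a binding could either fall back to sending each copy as its own part or report an error; the sketch leaves that choice open.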

I agree that UUIDs/GUIDs may not be usable in every environment, but
note that RFC 2111 states that "the Content-ID of a MIME body part is
required to be globally unique", so the problem exists independently
of MTOM usage. Simply reusing the Content-ID of the part as the
attribute value would suffice for MTOM multi-reference support.
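As an illustration, assuming the attribute carries the part's Content-ID in the form of a cid: URL (RFC 2392, which obsoletes RFC 2111), the mapping in both directions might look like this in Python. The helper names and the example.org domain are hypothetical; `uuid.uuid4()` draws on random bits rather than a MAC address.

```python
import uuid
from urllib.parse import quote, unquote


def new_content_id(domain: str = "example.org") -> str:
    """Mint a Content-ID header value such as <uuid@example.org>."""
    # uuid4() is built from random bits, not from a MAC address.
    return f"<{uuid.uuid4()}@{domain}>"


def content_id_to_attr(header_value: str) -> str:
    """Content-ID header value -> cid: URL usable as the attribute value."""
    addr = header_value.strip()
    if addr.startswith("<") and addr.endswith(">"):
        addr = addr[1:-1]
    # Percent-encode characters that are not allowed in a URL.
    return "cid:" + quote(addr, safe="@")


def attr_to_content_id(attr_value: str) -> str:
    """cid: URL -> Content-ID header value (the inverse mapping)."""
    if not attr_value.startswith("cid:"):
        raise ValueError("not a cid: URL")
    return "<" + unquote(attr_value[len("cid:"):]) + ">"


header = new_content_id()
attr = content_id_to_attr(header)
assert attr_to_content_id(attr) == header
```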

Regards,
Marc.

On 13 Nov 2003, at 17:27, noah_mendelsohn@us.ibm.com wrote:

>
> I continue to be bothered by two of the concerns that I raised on the
> telcon.
>
> * Concern #1:  The SOAP Recommendation says [1]:
>
> "As described in 5. SOAP Message Construct, each SOAP message is
> specified as an XML infoset that consists of a document information
> item with exactly one child: the SOAP Envelope element information
> item. Therefore, the minimum responsibility of a binding in
> transmitting a message is to specify the means by which the SOAP
> message infoset is transferred to and reconstituted by the binding at
> the receiving SOAP node and to specify the manner in which the
> transmission of the envelope is effected using the facilities of the
> underlying protocol."
>
> If I understand the proposal correctly, the bindings supporting the
> proposal below would tend to violate this rule when confronted with
> content along the lines of:
>
> <env:Envelope xmlns:env="..." xmlns:mtom="...">
>    <env:Body>
>      <app:Stuff xmlns:app="...">
>        <app:Thing1 mtom:ContentID="someURI">
>          some base64 text
>        </app:Thing1>
>        <app:Thing2 mtom:ContentID="someURI">
>          some OTHER base64 text
>        </app:Thing2>
>      </app:Stuff>
>    </env:Body>
> </env:Envelope>
>
> Granted, content of this sort specifically violates the intended use of
> the attribute.  Maybe we were wise to not allow for bindings to break
> the usual rules in circumstances like this, or maybe we were
> shortsighted.  The fact is that I don't think the current
> Recommendation licenses a binding to change the content of an Infoset
> in transmission.
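To make the concern concrete, here is a Python sketch of a deliberately naive binding (hypothetical, not the proposed algorithm verbatim) that shares one part per mtom:ContentID value. Applied to content like the example above, Thing2 comes back carrying Thing1's data. The namespace URIs are placeholders because the post elides them, and the base64 strings are arbitrary.

```python
import xml.etree.ElementTree as ET

MTOM_NS = "urn:example:mtom"  # placeholder; the post elides the real URI
APP_NS = "urn:example:app"    # placeholder
CONTENT_ID = f"{{{MTOM_NS}}}ContentID"

# Same ContentID, different content, as in the example above.
ENVELOPE = f"""<Stuff xmlns="{APP_NS}" xmlns:mtom="{MTOM_NS}">
  <Thing1 mtom:ContentID="someURI">QUFB</Thing1>
  <Thing2 mtom:ContentID="someURI">QkJC</Thing2>
</Stuff>"""


def naive_round_trip(xml_text: str) -> ET.Element:
    root = ET.fromstring(xml_text)
    parts = {}
    # "Serialize": the first element seen with a given ID supplies the part.
    for el in root.iter():
        cid = el.get(CONTENT_ID)
        if cid is not None:
            parts.setdefault(cid, el.text)
    # "Deserialize": every element with that ID receives the shared part.
    for el in root.iter():
        cid = el.get(CONTENT_ID)
        if cid is not None:
            el.text = parts[cid]
    return root


thing2 = naive_round_trip(ENVELOPE).find(f"{{{APP_NS}}}Thing2")
print(thing2.text)  # QUFB, not QkJC: Thing2's own content was lost
```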
>
> Now, in the interest of full disclosure, I can see a bit of a
> counterargument, though I find it somewhat unpleasant:  you might claim
> that the binding implements some sort of feature that, like encryption
> by an active intermediary, changes the Infoset.  Still, I think we've
> more clearly licensed active intermediaries than active links, and my
> reading is that the SOAP Recommendation currently requires that content
> like that shown above be transmitted with full fidelity, or else that a
> binding reflect some sort of binding-specific error (and even then, I
> would argue that it's a pretty poorly spec'd binding that throws errors
> when confronted with certain perfectly good SOAP infosets).
>
> * Concern #2: keeping IDs distinct at intermediaries
>
> As I also mentioned on the phone, an intermediary wishing to optimize
> content would have to verify that any ID used for new content was
> distinct from those already in use.  This seems to put a burden on
> intermediaries to parse headers not targeted to them more carefully
> than might otherwise be necessary.  In the body, there could also be
> issues with streaming, as one does not know which IDs have been used
> until the end of the envelope is seen.  I heard Marc suggest the use of
> GUIDs, and in certain environments GUIDs are practical.  Still, given
> the need for access to the moral equivalent of an Ethernet MAC address
> as a seed for the GUID, I think that use of GUIDs is at best a
> compromise.
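A rough Python sketch of the check an intermediary would need before minting a new ID, assuming IDs live in an mtom:ContentID attribute (placeholder namespace) and a hypothetical cid:partN naming scheme. Because `ids_in_use` walks the whole tree, headers included, it cannot give a reliable answer until the end of the envelope has been read, which is the streaming problem described above.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Set

MTOM_NS = "urn:example:mtom"  # placeholder namespace URI
CONTENT_ID = f"{{{MTOM_NS}}}ContentID"


def ids_in_use(envelope: ET.Element) -> Set[str]:
    """Every mtom:ContentID value anywhere in the envelope, headers included."""
    return {el.get(CONTENT_ID) for el in envelope.iter()
            if el.get(CONTENT_ID) is not None}


def fresh_id(envelope: ET.Element, prefix: str = "cid:part") -> str:
    """Smallest prefixN not already used in the envelope (hypothetical scheme)."""
    used = ids_in_use(envelope)
    for n in itertools.count(1):
        candidate = f"{prefix}{n}"
        if candidate not in used:
            return candidate
    raise AssertionError("unreachable")  # itertools.count never ends


envelope = ET.fromstring(
    f'<Envelope xmlns:mtom="{MTOM_NS}">'
    '<Header><Sig mtom:ContentID="cid:part1"/></Header>'
    '<Body><Img mtom:ContentID="cid:part3"/></Body>'
    "</Envelope>"
)
print(fresh_id(envelope))  # cid:part2
```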
>
> Overall:  it seems wrong to me to have bindings that depend for
> correctness on the integrity of content in the Infoset.  MTOM optimizes
> based on the content of the Infoset, but so far it never fails to send
> the right thing.  At worst you'll make bad decisions about what to
> optimize.  That seems to me to be one of the key attractions of MTOM,
> and this proposal seems to come close to changing that.
>
> In the end, I remain ambivalent about this proposal.  The disadvantages
> include those listed above.  If we could convince ourselves that 1-to-1
> is OK at the binding level, then all these issues go away, along with
> reference counting or any other messy alternative.  I suppose it comes
> down to our use cases:  is it really important for us to include the
> same large image or the like in two places in the same SOAP envelope
> infoset?  It's not clear to me that's necessary.  I lean somewhat
> toward what I take to be Gudge's position:  if you need to share a
> template or some such, then either live with the overhead of
> duplicating it, or, more likely, use some sort of explicit ID/REF
> mechanism within the envelope to share a copy.
>
> Noah
>
> [1] http://www.w3.org/TR/soap12-part1#bindfw
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
> Marc Hadley <Marc.Hadley@Sun.COM>
> Sent by: xml-dist-app-request@w3.org
> 11/12/2003 03:47 PM
>
>
>         To:     "Xml-Dist-App@W3. Org" <xml-dist-app@w3.org>
>         cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
>         Subject:        Proposal for multi-reference support in MTOM
>
>
> Here's a proposal for an extension to the current MTOM formulation to
> offer better support for multiple inclusion of the same data. The
> proposed extension has the following properties:
>
> - Preserves MTOM semantics of attachment inclusion in the SOAP message
> infoset
> - Supports existing 'Include' and 'Representation' semantics; use of
> the extension is optional
> - Supports multiple inclusion of attachments without replication of
> data in the serialized form
> - Multiply included data is replicated in the message infoset, so
> signatures over elements containing such data include the attachment
> data rather than a reference to the data, as would be the case when
> using a Representation approach.
>
>
> Infoset Form
> ============
>
> This section shows via an example the infoset of a message after the
> binding has performed the MTOM deserialization (described later). XML
> 1.0 is used as the most convenient syntax to express the infoset, but
> this should be considered a purely abstract model of the message
> content.
>
> <env:Envelope xmlns:env="..." xmlns:mtom="...">
>    <env:Body>
>      <app:Stuff xmlns:app="...">
>        <app:Thing1 mtom:ContentID="someURI">
>          some base64 text
>        </app:Thing1>
>        <app:Thing2 mtom:ContentID="someURI">
>          some base64 text
>        </app:Thing2>
>        <app:Thing3>
>          some base64 text
>        </app:Thing3>
>      </app:Stuff>
>    </env:Body>
> </env:Envelope>
>
> Note that the same base64 data is included as the content of the Thing1
> and Thing2 EIIs; this is indicated by the value of the mtom:ContentID
> attribute being the same for both. Thing3 has no mtom:ContentID,
> indicating that the optional multi-reference extension is not being
> used for the content of this EII.
>
>
> Optimized (MIME) Wire Form
> ==========================
>
> This section shows via an example the serialized form of a message
> using MIME-based MTOM.
>
> Content-type: multipart/related; boundary="someBoundaryString"
>
> --someBoundaryString
> Content-Type: application/soap+xml
>
> <env:Envelope xmlns:env="..." xmlns:mtom="...">
>    <env:Body>
>      <app:Stuff xmlns:app="...">
>        <app:Thing1 mtom:ContentID="someURI">
>          <mtom:Include href="someURI"/>
>          <!-- depending on how mtom:ContentID is defined, the
>               Include/@href may be redundant -->
>        </app:Thing1>
>        <app:Thing2 mtom:ContentID="someURI">
>          <mtom:Include href="someURI"/>
>        </app:Thing2>
>        <app:Thing3>
>          <mtom:Include href="someOtherURI"/>
>        </app:Thing3>
>      </app:Stuff>
>    </env:Body>
> </env:Envelope>
>
> --someBoundaryString
> Content-Type: image/png
> Content-ID: someURI
>
> binary picture data
>
> --someBoundaryString
> Content-Type: image/png
> Content-ID: someOtherURI
>
> binary picture data
>
> --someBoundaryString--
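For comparison, a package with the same shape can be assembled with Python's standard email package. This is a sketch: keying the parts by Content-ID means shared data is emitted only once, and, unlike the raw binary shown above, the parts here are base64-encoded because that is what `email.encoders.encode_base64` produces. `build_package` is a hypothetical name.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from typing import Dict


def build_package(envelope_xml: str, parts: Dict[str, bytes]) -> MIMEMultipart:
    """multipart/related package: SOAP envelope first, then one part per ID."""
    package = MIMEMultipart("related")
    root = MIMEBase("application", "soap+xml")
    root.set_payload(envelope_xml)
    package.attach(root)
    # A dict holds one entry per Content-ID, so data referenced from both
    # Thing1 and Thing2 is emitted only once.
    for content_id, data in parts.items():
        part = MIMEBase("image", "png")
        part.set_payload(data)
        encoders.encode_base64(part)  # sets Content-Transfer-Encoding: base64
        part["Content-ID"] = content_id
        package.attach(part)
    return package


pkg = build_package(
    "<env:Envelope>...</env:Envelope>",
    {"someURI": b"binary picture data", "someOtherURI": b"other picture data"},
)
print(pkg.as_string())
```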
>
>
> Schema Types
> ============
>
> <complexType name="OptimizationCandidate">
>    <simpleContent>
>      <extension base="xsd:base64Binary">
>        <attribute name="ContentID" type="xsd:anyURI"/>
>        <attribute name="MediaType" type="xsd:string"/>
>        <!-- other attributes we define -->
>      </extension>
>    </simpleContent>
> </complexType>
>
> Terminology
> ===========
>
> The following terminology is used in the description of the
> serialization and deserialization algorithms:
>
> Optimization candidate:
>    EII of type xsd:base64Binary or mtom:OptimizationCandidate.
>
> Matching MIME part:
>    MIME part whose Content-ID and/or Content-Location headers (TBD:
> specify exact matching criteria) match an
> OptimizationCandidate/@ContentID.
>
> Content:
>    base64Binary child CIIs of an optimization candidate (excludes AII
> children)
>
>
> Infoset to Wire Serialization
> =============================
>
> For each optimization candidate in the SOAP message
>      - if no matching MIME part exists then create a matching MIME part
> from the optimization candidate's decoded content and AIIs
>      - replace the content of the optimization candidate with a child
> mtom:Include EII
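A minimal Python sketch of these serialization steps over an ElementTree infoset. Assumptions: candidates are recognized only by the mtom:ContentID attribute (so a Thing3-style element with no ID, which would need a generated one, is not handled), AIIs such as MediaType are not copied into part headers, and the mtom namespace URI is a placeholder.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict

MTOM_NS = "urn:example:mtom"  # placeholder namespace URI
CONTENT_ID = f"{{{MTOM_NS}}}ContentID"
INCLUDE = f"{{{MTOM_NS}}}Include"


def serialize(root: ET.Element) -> Dict[str, bytes]:
    """Rewrite candidates in place; return ContentID -> decoded part data."""
    parts: Dict[str, bytes] = {}
    # Snapshot first so the Include elements added below are not revisited.
    candidates = [el for el in root.iter() if el.get(CONTENT_ID) is not None]
    for el in candidates:
        cid = el.get(CONTENT_ID)
        if cid not in parts:
            # No matching MIME part yet: create one from the decoded content.
            parts[cid] = base64.b64decode("".join((el.text or "").split()))
        # Replace the base64 content with a child mtom:Include element.
        el.text = None
        ET.SubElement(el, INCLUDE, href=cid)
    return parts


root = ET.fromstring(
    f'<Stuff xmlns:mtom="{MTOM_NS}">'
    '<Thing1 mtom:ContentID="someURI">QUFB</Thing1>'
    '<Thing2 mtom:ContentID="someURI">QUFB</Thing2>'
    "</Stuff>"
)
print(serialize(root))  # {'someURI': b'AAA'}
print(ET.tostring(root, encoding="unicode"))
```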
>
>
> Wire to Infoset Deserialization
> ===============================
>
> For each mtom:Include EII
>      - replace the mtom:Include EII with base64 encoded attachment
> content
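The corresponding deserialization step, sketched in Python under the same assumptions as a serialization sketch would need (placeholder namespace, ElementTree infoset). Matching an Include to its part by exact string equality of href and Content-ID stands in for the matching criteria the proposal leaves TBD.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict

MTOM_NS = "urn:example:mtom"  # placeholder namespace URI
INCLUDE = f"{{{MTOM_NS}}}Include"


def deserialize(root: ET.Element, parts: Dict[str, bytes]) -> None:
    """Replace each mtom:Include with the base64 text of its MIME part."""
    # ElementTree has no parent pointers, so collect (parent, child) pairs
    # before mutating the tree.
    pairs = [(parent, child) for parent in root.iter() for child in parent
             if child.tag == INCLUDE]
    for parent, include in pairs:
        # Exact string match stands in for the TBD matching criteria.
        data = parts[include.get("href")]
        parent.remove(include)
        parent.text = base64.b64encode(data).decode("ascii")


root = ET.fromstring(
    f'<Stuff xmlns:mtom="{MTOM_NS}">'
    '<Thing1><mtom:Include href="someURI"/></Thing1>'
    '<Thing2><mtom:Include href="someURI"/></Thing2>'
    "</Stuff>"
)
deserialize(root, {"someURI": b"AAA"})
print(root.find("Thing1").text, root.find("Thing2").text)  # QUFB QUFB
```

Because both Thing1 and Thing2 point at the same part, the infoset ends up with the data replicated in each element, which is the property the proposal relies on for signatures.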
>
> --
> Marc Hadley <marc.hadley@sun.com>
> Web Technologies and Standards, Sun Microsystems.

Received on Thursday, 13 November 2003 20:01:30 UTC