
Re: soap:body and media types (fwd)

From: <noah_mendelsohn@us.ibm.com>
Date: Wed, 27 Apr 2005 16:49:39 -0400
To: Mark Baker <distobj@acm.org>
Cc: xml-dist-app@w3.org
Message-ID: <OFE230C16B.00B13B49-ON85256FF0.00716B32-85256FF0.00726942@lotus.com>


This is very helpful, thank you.  I'm tempted to ask why SOAP is any more 
broken or at risk than XML itself? 

In the case of XML, we have a media type "application/xml".  That tells 
you to expect perhaps an XML declaration, perhaps an internal subset, and 
then a root element the nature of which is completely unknown.   How are 
these disambiguated in practice?  How do I recognize a purchase order from 
an invoice?  Answer:  if the root is namespace qualified you infer from 
the QName.  You might reasonably say that I could also register 
application/purchaseOrder+xml, and that would indeed add additional out of 
stream information, but surely the creation of such a media type is 
optional.  There is surely no rule that a new media type is to be 
registered for each XML root element QName.
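
To make the QName inference concrete, here is a minimal Python sketch. The namespace URIs, element names and the KNOWN_ROOTS table are hypothetical, invented purely for illustration; nothing here is prescribed by any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces and element names, for illustration only.
KNOWN_ROOTS = {
    ("http://example.com/po", "purchaseOrder"): "purchase order",
    ("http://example.com/invoice", "invoice"): "invoice",
}

def root_qname(xml_text):
    """Return (namespace URI, local name) of the document's root element."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        # ElementTree reports qualified names as "{uri}local".
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # root element is not namespace qualified

def classify(xml_text):
    """Infer the document kind from the root QName alone."""
    return KNOWN_ROOTS.get(root_qname(xml_text), "unknown")

PO = '<po:purchaseOrder xmlns:po="http://example.com/po"/>'
INV = '<inv:invoice xmlns:inv="http://example.com/invoice"/>'
```

Both sample documents would arrive labelled application/xml; only the root QName tells the receiver which one it has.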

Now consider SOAP documents, such as those exchanged by the SOAP HTTP 
binding.  The binding serializes the Infoset to a stream of type 
application/soap+xml, which is a specialization of application/xml.  The 
specialization tells you some additional things, such as that the root 
element will be soap:Envelope.  Crucially, it also tells you that there will 
be a body containing an XML element.  I fail to see how you know any more 
or less about the body element than you generally do about the root of an 
application/xml stream.  In both cases, you know that it's an XML element 
and that you need to recognize the QName to infer its type.

SOAP looks to me no more or less broken than XML itself.   In both cases, 
you need to look at a QName to see what's going on.  For 
application/xml, it's the QName of the root; for application/soap+xml, 
it's the QName of the body child element.
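
The same inference, one level down, might look like the following Python sketch. It assumes a SOAP 1.2 envelope (namespace http://www.w3.org/2003/05/soap-envelope); the purchase order namespace and the helper name body_child_qname are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace

def body_child_qname(envelope_text):
    """Return the QName ("{uri}local") of the first child of the SOAP Body."""
    root = ET.fromstring(envelope_text)
    if root.tag != f"{{{SOAP_NS}}}Envelope":
        raise ValueError("not a SOAP 1.2 envelope")
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        return None  # missing or empty Body: nothing to dispatch on
    return body[0].tag

# Hypothetical payload namespace, for illustration only.
ENVELOPE = (
    f'<env:Envelope xmlns:env="{SOAP_NS}">'
    '<env:Body><po:purchaseOrder xmlns:po="http://example.com/po"/></env:Body>'
    '</env:Envelope>'
)
```

The media type gets the receiver as far as the Body; from there the dispatch is on a QName, just as it is for the root of an application/xml stream.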

By the way, if media types were enhanced to get rid of the 1-level + sign 
kludge, then you might do something like registering: 
application/purchaseorder+soap+xml, the same option you have at the root 
level today.

Also, I think one can make the case that for each application/*+xml media 
type there should be a dual for the typing of Infosets.  The media type 
covers the serialized form, and the dual types the corresponding infoset. 
Whether the duals are given separate names, or whether the use of the 
media type name is licensed for both, I'm not sure I care.  I do think that 
the two types are different, as one is a stream and the other in general isn't.

Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142

Mark Baker <distobj@acm.org>
04/27/2005 03:34 PM

        To:     noah_mendelsohn@us.ibm.com
        cc:     xml-dist-app@w3.org
        Subject:        Re: soap:body and media types (fwd)

Hi Noah,

On Mon, Apr 25, 2005 at 04:09:03PM -0400, noah_mendelsohn@us.ibm.com wrote:
> Mark Baker writes:
> > Something Noah Mendelsohn said at the technical
> > plenary week about SOAP & media types, made me
> > realize that the SOAP envelope currently has a
> > problem; that it cannot communicate the media type
> > of the document encapsulated within the SOAP body.
> There are some subtleties here, I think, some of which were obliquely 
> touched upon at the TAG F2F in Boston.   As far as I know, media types 
> apply to octet streams.  SOAP envelopes are not in general octet 
> streams, but are instead Infosets.

Fair enough.  I've never really bought the whole Infoset thing, but I
respect that SOAP is defined in those terms.

> The content of the body is an Element Info 
> Item.  Consider, for example, an implementation that uses SOAP for 
> communication between processes on a single machine.  It would be quite 
> reasonable to have a SOAP implementation that communicates using DOM or 
> SAX, without ever serializing to an octet stream.

Depends what you mean by "octet stream", I guess.  I just think of it
as the message payload, and, at least by my definition of "message",
messages are still exchanged in-process, including in your example.  Of
course, the information normally conveyed by a media type need not be
conveyed explicitly in that message, but may instead be established out
of band.  For example, imagine that we copy RFC 2616, except we mandate
that the Content-Type header must always be text/html.  If we then
register port 55555 with IANA, and associate it with this new spec, then
we know that all messages on port 55555 are declaring that their
payloads have text/html semantics.

So the media type needn't be a part of every exchange.  But in the
absence of any out of band information, I think it's needed, otherwise
you have a loss of self-description and resulting ambiguity.  I'm not
sure about the intricacies of EIIs, but if you mean effectively an
Infoset with a single EII, then I could well imagine exchange
scenarios involving no out of band information and therefore the need
for a data semantic indication mechanism like a media type.

> It is true that SOAP 
> envelopes as serialized by the normal HTTP binding are octet streams, 
> typically of media type application/soap+xml.  As I recall you are not a 
> particular fan of protocol independence, Mark, but SOAP has it, and SOAP 
> envelopes are Infosets.

SOAP "has" protocol independence in that it supports multiple underlying
protocols, and I'm very supportive of that.  FWIW, I'm just not
supportive of the kind of protocol independence where developers are
isolated from the semantics of underlying application protocols;
something SOAP doesn't, and shouldn't, say much about.

Hmm, I'm not sure that was relevant to my point, but oh well.

> Thus, I think there are at least two questions implicitly raised by your 
> note:
> 1. Is it appropriate to apply a media type to something other than an 
> octet stream, e.g. to an element information item?  I have considered 
> raising this as a TAG issue, but it seems to me that it is not in any 
> case appropriately a decision for the XMLP WG.
> 2. I suspect the answer at the moment is "no", but let's assume for sake 
> of discussion it's actually "yes":  then we can ask whether the subtrees 
> carried within SOAP bodies in particular should be media typed?  Note 
> that, in part due to limitations of XML itself, these are not in general 
> XML documents.  They cannot have their own XML declarations, internal 
> subsets, etc.  They are XML fragments, or more specifically element info 
> items.  Furthermore, it's not clear to me that there is an obligation to 
> carry the media type even if there were one. 

As above, I think there should be an obligation to use a media type when
there's no out of band mechanism to accomplish the task.  To not do so
would make it impossible to distinguish, for example, between a SOAP
message carrying an XHTML document, and one carrying a shortform XSLT
stylesheet, since they'd be bytewise identical.
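
A minimal Python sketch of that ambiguity follows. It relies on the XSLT 1.0 rule that a literal result element carrying an xsl:version attribute can stand as a whole stylesheet; the sample markup itself is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Illustrative payload: readable as an XHTML page that happens to carry a
# foreign attribute, or as a simplified XSLT stylesheet whose output is
# that page.
DOC = (
    f'<html xmlns="{XHTML_NS}" '
    f'xmlns:xsl="{XSLT_NS}" xsl:version="1.0">'
    '<head><title>Report</title></head>'
    '<body><p>Hello</p></body>'
    '</html>'
)

root = ET.fromstring(DOC)

# The QName is the same under either reading.
root_tag = root.tag

# The version attribute permits the stylesheet reading, but it does not
# say which reading the sender intended.
xsl_version = root.get(f"{{{XSLT_NS}}}version")
```

Dispatching on the QName alone yields XHTML's html element in both cases, so the sender's intent has to come from somewhere else, such as a media type.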


> I think the main architectural question is #1.  If that is resolved in 
> favor of typing infoset subtrees, then it would be straightforward to 
> define a SOAP header that would be usable to carry the type.

I think both are important.  But even if the answer to #1 was "No", it
still seems to me that XML SOAP messages transferred via application
protocols which provide no out of band indication of the semantics of
the data (e.g. SMTP, HTTP, but not FTP), should use a media type.

Hmm, I see I repeated my main point a couple of times.  Sorry, that's
what I get for writing a message over several sessions! 8-/

Mark Baker.   Ottawa, Ontario, CANADA.        http://www.markbaker.ca
Received on Wednesday, 27 April 2005 20:49:49 UTC
