Re: soap:body and media types (fwd) from Mark Baker on 2005-05-11 (xml-dist-app@w3.org from May 2005)

From: Mark Baker <distobj@acm.org>
Date: Wed, 11 May 2005 10:43:14 -0400
To: noah_mendelsohn@us.ibm.com
Cc: xml-dist-app@w3.org
Message-ID: <20050511144314.GE3412@markbaker.ca>

Hi Noah.  Thanks for asking the hard questions. 8-)  Sorry for the
delay, I was on vacation.

I think the disagreement here might be that you seem to be making an
assumption which I'm not, and which isn't, AFAICT, licensed by any
specification in the stack.  That's not to say that such an assumption
can't be used to build successful XML based systems (indeed, it's
already in use), only that I think it makes doing so more difficult.

The assumption I'm speaking of sometimes goes by the name "namespace
dispatching", and it involves using the namespace of the root element
of an XML document to determine the specification by which the contents
of the document are interpreted.  Consider this document:

<Person xmlns="http://example.org/foofoo/">
  <name>Mark Smith</name>
  <age>55</age>
</Person>

Using namespace dispatching, one would interpret the semantics of
that document per those prescribed in the specification associated with
the namespace identified by the URI "http://example.org/foofoo/".

The other approach is traditional media type dispatch.  Using it, the
message need only be labeled with the (say) "application/person+xml"
media type for the recipient to know which specification to use to
interpret the document.
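
To make the contrast concrete, here is a minimal sketch of the two
dispatch styles in Python.  The handler functions and the two lookup
tables are hypothetical names invented for illustration; only the
namespace URI and the "application/person+xml" label come from the
example above, and "application/rdf+xml" is the registered RDF/XML
media type.

```python
import xml.etree.ElementTree as ET

# The sample document from above, as received bytes.
PERSON_DOC = b"""<Person xmlns="http://example.org/foofoo/">
  <name>Mark Smith</name>
  <age>55</age>
</Person>"""


def interpret_as_person(body):
    """Hypothetical handler: read the document per the foofoo spec."""
    ns = {"p": "http://example.org/foofoo/"}
    root = ET.fromstring(body)
    return {
        "name": root.findtext("p:name", namespaces=ns),
        "age": int(root.findtext("p:age", namespaces=ns)),
    }


def interpret_as_rdf(body):
    """Hypothetical handler: stand-in for passing the bytes to an RDF/XML parser."""
    return "treated as RDF/XML"


# Hypothetical registries, one per dispatch style.
NAMESPACE_HANDLERS = {"http://example.org/foofoo/": interpret_as_person}
MEDIA_TYPE_HANDLERS = {
    "application/person+xml": interpret_as_person,
    "application/rdf+xml": interpret_as_rdf,
}


def dispatch_by_namespace(body):
    """Intrinsic: parse the body and pick a handler from the root namespace."""
    root = ET.fromstring(body)
    # ElementTree spells a qualified tag as "{namespace-uri}localname".
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    return NAMESPACE_HANDLERS[ns](body)


def dispatch_by_media_type(content_type, body):
    """Extrinsic: pick a handler from the label, ignoring media type parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_HANDLERS[media_type](body)
```

With namespace dispatch the same bytes always reach the same handler.
With media type dispatch the sender chooses: labeling PERSON_DOC as
"application/rdf+xml" routes it to the RDF handler without touching
the document.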

I claimed above that there's no specification which prescribes namespace
dispatch.  That claim was based on a paragraph of RFC 3023 that I helped
edit, FWIW 8-);

  An XML document labeled as text/xml or application/xml might contain
  namespace declarations, stylesheet-linking processing instructions
  (PIs), schema information, or other declarations that might be used to
  suggest how the document is to be processed. For example, a document
  might have the XHTML namespace and a reference to a CSS stylesheet.
  Such a document might be handled by applications that would use this
  information to dispatch the document for appropriate processing.

Emphasis on "might".

There are several problems I see with namespace dispatching which are
a direct result of it being an intrinsic (aka unlayered) mechanism
rather than an extrinsic one like media types.  Amongst these problems
is that it prevents the sample document above from being interpreted
using other semantics, such as RDF (in fact, it is a valid RDF/XML
document).  Another example is the shortform XSLT stylesheet notation
using literal result elements:

  http://www.w3.org/TR/xslt#result-element-stylesheet

It also creates visibility and security problems, since, for example, an
HTML document may include JavaScript which can rewrite its root element
after passing a firewall, where the new root element can prescribe
processing semantics which the firewall might be configured to disallow
(by banning certain media types).

There are also evolvability problems, as the namespace is now being
asked to play the role of a coarser grained data element, the media type.

Plus, there are performance problems, as the media type is readily
available in plain text form, early in the message, while the namespace,
being in the body of the message, might be compressed, encrypted, or
otherwise transformed, delaying the time at which the processing
application can be activated.
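
A rough sketch of that ordering difference, assuming a gzip-compressed
body and message headers held in a plain dict (a simplification; real
header lookups are case-insensitive).  The function names are
hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def media_type_of(headers):
    """The media type is available from the headers alone."""
    return headers.get("Content-Type", "").split(";", 1)[0].strip().lower()


def root_namespace_of(headers, raw_body):
    """The namespace needs the body decoded and parsed first."""
    body = raw_body
    if headers.get("Content-Encoding", "").strip().lower() == "gzip":
        body = gzip.decompress(body)
    root = ET.fromstring(body)
    # ElementTree spells a qualified tag as "{namespace-uri}localname".
    return root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None


doc = b'<Person xmlns="http://example.org/foofoo/"><name>Mark Smith</name></Person>'
headers = {"Content-Type": "application/person+xml", "Content-Encoding": "gzip"}
raw_body = gzip.compress(doc)
```

Here media_type_of(headers) can select a processor before a single
byte of raw_body has been read, while root_namespace_of() cannot
answer until the whole transformation has been undone.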

And FWIW, if I haven't pointed you to them already, you might be
interested in these two links, the first of which is a presentation I
gave to the Compound Document Formats WG earlier this year on the
subject as it relates to comp docs.  The second is a partial
examination of XML based dispatching behaviour in common Web browsers.

http://www.markbaker.ca/Talks/2004-media-types-and-compdocs/slide1-0.html
http://www.markbaker.ca/2004/01/XmlNamespaceDispatchTest/

Cheers,

Mark.

On Wed, Apr 27, 2005 at 04:49:39PM -0400, noah_mendelsohn@us.ibm.com wrote:
> Mark,
> 
> This is very helpful, thank you.  I'm tempted to ask why SOAP is any more 
> broken or at risk than XML itself? 
> 
> In the case of XML, we have a media type "application/xml".  That tells 
> you to expect perhaps an XML declaration, perhaps an internal subset, and 
> then a root element the nature of which is completely unknown.   How are 
> these disambiguated in practice?  How do I recognize a purchase order from 
> an invoice?  Answer:  if the root is namespace qualified you infer from 
> the QName.  You might reasonably say that I could also register 
> application/purchaseOrder+xml, and that would indeed add additional out of 
> stream information, but surely the creation of such a media type is 
> optional.  There is surely no rule that a new media type is to be 
> registered for each XML root element QName.
> 
> Now consider SOAP documents, such as those exchanged by the SOAP HTTP 
> binding.  The binding serializes the Infoset to a stream of type 
> application/soap+xml, which is a specialization of application/xml.  The 
> specialization tells you some additional things, such as that the root 
> type will be soap:envelope.  Crucially, it also tells you that there will 
> be a body containing an XML element.  I fail to see how you know any more 
> or less about the body element than you generally do about the root of an 
> application/xml stream.  In both cases, you know that it's an XML element 
> and that you need to recognize the QName to infer its type.
> 
> SOAP looks to me no more or less broken than XML itself.   In both cases, 
> you need to look at the QName to see what's going on.  For 
> application/xml, it's the root QName, for application/soap+xml it's the 
> body child element.
> 
> By the way, if media types were enhanced to get rid of the 1-level + sign 
> kludge, then you might do something like registering: 
> application/purchaseorder+soap+xml, the same option you have at the root 
> level today.
> 
> Also, I think one can make the case that for each application/*+xml media 
> type there should be a dual for the typing of Infosets.  The media type 
> covers the serialized form, and the dual types the corresponding infoset. 
> Whether the duals are given separate names, or whether the use of the 
> media type name is licensed for both I'm not sure I care.  I do think that 
> the two types are different, as one is streams and one in general isn't.

-- 
Mark Baker.  Ottawa, Ontario, CANADA.          http://www.markbaker.ca
Coactus; Web-inspired integration strategies   http://www.coactus.com
Received on Wednesday, 11 May 2005 14:43:05 UTC