Re: soap:body and media types - adjunct from Mark Baker on 2005-05-31 (xml-dist-app@w3.org from May 2005)

From: Mark Baker <distobj@acm.org>
Date: Tue, 31 May 2005 11:52:46 -0400
To: noah_mendelsohn@us.ibm.com
Cc: xml-dist-app@w3.org
Message-ID: <20050531155246.GG3412@markbaker.ca>

Here's the second message ...

On Thu, May 19, 2005 at 09:00:38AM -0400, noah_mendelsohn@us.ibm.com wrote:
> True, and I'd encourage this insofar as it applies, but media types 
> currently don't scale.  You can say this is application/soap+xml, but not 
> as far as I know application/purchaseOrder+soap+xml.  They certainly don't 
> scale to mixin semantics right now: 
> application/nonRepudiable&cacheable&purchaseOrder+soap+xml (where in this 
> example, the nonRepudiable and cacheable are not in a subtype relation. 
> They can be freely mixed and matched.)

I think media types scale just fine.  They just need to be used in the
context of a framework which provides mixin capabilities, as XML by
itself doesn't do that.  Some richer frameworks built on top of XML do
though.  For example, RDF/XML permits me to describe both purchase
orders and people using a single XML based media type.  Ditto for XTM,
the Topic Maps in XML effort.  And what we're building in the CDF WG
will do something similar.

> > I claimed above that there's no specification
> > which prescribes namespace dispatch.
> 
> There is in at least the case of SOAP headers.  Quoting from the 
> Recommendation [1]:
> 
> "A SOAP header block is said to be understood by a SOAP node if the 
> software at that SOAP node has been written to fully conform to and 
> implement the semantics specified for the XML expanded name of the 
> outer-most element information item of that header block."

Good point.

> The SOAP body is different,

Right.  It's the body I'm concerned about.

> and is parallel to the case of unwrapped 
> application/xml I think [2]:
> 
> "An ultimate SOAP receiver MUST correctly process the immediate children of 
> the SOAP body (see 5.3 SOAP Body). However, with the exception of SOAP 
> faults (see 5.4 SOAP Fault), Part 1 of this specification (this document) 
> mandates no particular structure or interpretation of these elements, and 
> provides no standard means for specifying the processing to be done."
> 
> So, in the case of faults, the root QName determines the interpretation 
> because the spec says it does.

Mostly agreed.  I'll avoid the "When is a fault a fault"[1] question for
purposes of expediency. 8-)

> For other bodies, there is no such 
> assumption.  As with complete XML documents, you know it's XML, but beyond 
> that the means used to decide on the significance are beyond the scope of 
> the XML or SOAP specs respectively.  Other specs may provide such 
> interpretations (e.g. the XHTML spec), and many of them do key on the root 
> QName.

IMO, that is the role currently played by media types, or more
specifically their use in other specifications such as HTTP.

> > There are several problems I see with namespace
> > dispatching which are a direct result of it being
> > an intrinsic rather than an extrinsic mechanism
> > (aka unlayered) like media types.  Amongst these
> > problems is that it prevents that sample document
> > above from being interpreted using other
> > semantics, such as RDF (in fact, it is a valid
> > RDF/XML document).
> 
> I don't think I ever said that there should be only one legal way to 
> process any given document.

My bad.  What I meant to say above is that the message is unable to
convey certain kinds of semantics.  Consider this SOAP message:

POST some-uri HTTP/1.1
Host: some-host
Content-Type: application/soap+xml
[blank line]
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"> 
<env:Body>
  <Person xmlns="http://personml.example.org/person/">
    <name>Mark Smith</name>
    <age>55</age>
  </Person>
</env:Body>
</env:Envelope>

Since there's no media type for the "Person" sub-document there,
there's no way to distinguish between the several possible
interpretations of the meaning of that message:

- that it's some plain text
- that it's some XML
- that it's some PersonML
- that it's some RDF/XML

So not only does SOAP not say which interpretation is licensed (i.e.
it doesn't require "namespace dispatching"), it doesn't provide any
mechanism for doing so (such as my implicitly suggested "env:mediaType"
attribute on env:Body).
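
To make the ambiguity concrete, here is a minimal Python sketch of what a
receiver can actually learn from that envelope.  The function name
body_hints is purely illustrative, and the env:mediaType attribute it looks
for is the hypothetical one I suggested above; it is not defined by SOAP 1.2.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"

MESSAGE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Body>
  <Person xmlns="http://personml.example.org/person/">
    <name>Mark Smith</name>
    <age>55</age>
  </Person>
</env:Body>
</env:Envelope>"""


def body_hints(envelope_xml):
    """Return (namespace, local name, declared media type) for the body.

    The first two values come from the QName of the first child of
    env:Body.  The third comes from a HYPOTHETICAL env:mediaType
    attribute, which SOAP 1.2 does not define; it is None when absent.
    """
    root = ET.fromstring(envelope_xml)
    body = root.find(f"{{{SOAP_NS}}}Body")
    child = body[0]
    # ElementTree spells a qualified name as "{namespace}local".
    namespace, _, local = child.tag[1:].partition("}")
    declared = body.get(f"{{{SOAP_NS}}}mediaType")
    return namespace, local, declared


# As sent, only the QName is available; nothing licenses an interpretation.
print(body_hints(MESSAGE))

# With the hypothetical attribute, the sender can state the intended one.
ANNOTATED = MESSAGE.replace(
    "<env:Body>", '<env:Body env:mediaType="application/rdf+xml">'
)
print(body_hints(ANNOTATED))
```

In the first call the third value is None, so a receiver choosing between
XML, PersonML and RDF/XML is guessing.  In the second, the choice is stated
by the sender rather than inferred from the namespace.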

> I do think it is reasonable to write 
> specifications for particular processors (SOAP processors, browsers, etc.) 
> that say:  "this processor will for its purposes key on the QName of the 
> (root) element to determine the processing to be done."  Absolutely it 
> should be possible for a different application to determine its mode of 
> processing on all manner of other available information including that 
> contained within the document (PIs?), outside the document (the media 
> type, the encoding, the length of the file, the date received), etc.
> 
> Just as the root QName is not the right answer in all cases, neither is 
> the media type.  I do think that the interpretation of a document should 
> seldom if ever be in conflict with that suggested by the media type or by 
> the specifications for its content (e.g. the XHTML spec).  Thus, to 
> process an image/jpeg as an XML file is surely an error.

> To decide that a 
> given application/xml file is in particular a purchase order based on the 
> root element seems to me not an error, especially if someone has written a 
> specification saying that this is the proper interpretation of that QName. 

Which specification though?  It seems to me that it has to be RFC 3023,
which is purposefully ambiguous about such an interpretation.

> By all means where possible one should invent and use a more specialized 
> media type such as application/po+xml, but there are rather severe limits 
> to what can be captured in media types.  Furthermore, the facts that 
> QNames contain URIs and media types do not, and that QNames can be created 
> in a distributed manner, almost ensures that there will be cases where 
> QNames offer needed power that media types do not.
> 
> > Plus, there's performance problems, as the media
> > type is readily available in plain text form,
> > early in the message, while the namespace, being
> > in the body of the message, might be compressed,
> > encrypted, or otherwise transformed, delaying the
> > time at which the processing application can be
> > activated.
> 
> I think this is a red herring.  If you want to pull a fine grained 
> document type out into something like an HTTP header you can.

Sure, you could do that, and I agree it would address the issue (while
creating new ones, though).  But it would require new standards and an
accompanying roll out of those semantics.  Until that happens, I think
my performance problem claim is valid.
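
A rough Python sketch of the difference, assuming a gzip-compressed body
(the Content-Encoding header, the function names and the sample request are
all illustrative assumptions, not taken from any spec or product):

```python
import gzip
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"

ENVELOPE = (
    b'<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
    b"<env:Body>"
    b'<Person xmlns="http://personml.example.org/person/">'
    b"<name>Mark Smith</name><age>55</age>"
    b"</Person>"
    b"</env:Body>"
    b"</env:Envelope>"
)


def build_request(body):
    """Build a sample HTTP request whose body is gzip-compressed."""
    compressed = gzip.compress(body)
    head = (
        "POST /some-uri HTTP/1.1\r\n"
        "Host: some-host\r\n"
        "Content-Type: application/soap+xml\r\n"
        "Content-Encoding: gzip\r\n"
        f"Content-Length: {len(compressed)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + compressed


def media_type_from_head(raw):
    """Read the media type from the plain-text headers; the body is untouched."""
    head, _, _ = raw.partition(b"\r\n\r\n")
    for line in head.decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            return value.split(";")[0].strip().lower()
    return None


def namespace_from_body(raw):
    """Find the body child's namespace; needs decompression and an XML parse."""
    _, _, body = raw.partition(b"\r\n\r\n")
    root = ET.fromstring(gzip.decompress(body))
    child = root.find(f"{{{SOAP_NS}}}Body")[0]
    return child.tag[1:].partition("}")[0]


request = build_request(ENVELOPE)
print(media_type_from_head(request))  # available before any body processing
print(namespace_from_body(request))   # available only after undoing the encoding
```

The first lookup needs only the header block; the second cannot start until
the whole body has been decoded and parsed, which is the delay I'm referring
to.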

 [1] http://lists.w3.org/Archives/Public/xml-dist-app/2002Mar/0007.html

Mark.
-- 
Mark Baker.  Ottawa, Ontario, CANADA.          http://www.markbaker.ca
Coactus; Web-inspired integration strategies   http://www.coactus.com
Received on Tuesday, 31 May 2005 15:52:18 UTC