Re: Media types from Mark Baker on 2002-01-17 (www-tag@w3.org from January 2002)

From: Mark Baker <distobj@acm.org>
Date: Thu, 17 Jan 2002 13:12:18 -0500 (EST)
To: noah_mendelsohn@us.ibm.com (Noah Mendelsohn)
Cc: www-tag@w3.org (www-tag), xml-dist-app@w3.org (xml-dist-app)
Message-Id: <200201171812.NAA16512@markbaker.ca>
Sorry for the delay, Noah.

>  It's sort of like saying:  "this is XML" as 
> opposed to "this is XHTML".

That depends on how "sort of like" you mean 8-).  I don't believe it's
close enough to warrant your conclusion below, because the SOAP example
uses a containment relationship, whereas no such structure exists in
a single XHTML document that can also be said to be XML.

>  Both are useful, but it seems appropriate to 
> have a MIME type for XHTML, as well as for XML (and I presume we use 
> xhtml+xml).   By the same reasoning, it's potentially useful to have MIME 
> types for specific applications of SOAP.
> 
> Indeed one could argue that with SOAP's current definition of body 
> processing, there is no generally applicable means of inspecting the 
> document or body and determining its purpose (in the sense of being a 
> Purchase Order).

It is my understanding that SOAPAction addresses this.
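For context, SOAPAction is an HTTP request header (SOAP 1.1) whose value, a quoted URI, signals the intent of a message without the receiver having to inspect the body. A minimal sketch of dispatching on it, assuming the headers have already been parsed into a dict; the action URIs and handlers are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical action URIs; a real service would define its own.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "urn:example:submitPurchaseOrder": lambda body: "purchase order accepted",
    "urn:example:openAccount": lambda body: "account opened",
}


def dispatch_on_soapaction(headers: Dict[str, str], body: bytes) -> str:
    """Choose a handler from the SOAPAction header instead of the body."""
    # HTTP header names are case-insensitive, so match the name loosely.
    action = next(
        (value for name, value in headers.items() if name.lower() == "soapaction"),
        "",
    )
    # SOAP 1.1 sends the value as a quoted string, e.g. "urn:example:openAccount".
    action = action.strip().strip('"')
    handler = HANDLERS.get(action)
    if handler is None:
        raise LookupError(f"no handler for SOAPAction {action!r}")
    return handler(body)
```

With this sketch, dispatch_on_soapaction({"SOAPAction": '"urn:example:openAccount"'}, b"") returns "account opened", whatever the body contains.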

>  That's because in the new 1.2 draft [1] we allow an 
> arbitrary number of body element children, and we don't say whether in 
> general they represent separate similar units of work (e.g. multiple 
> purchase orders), one unit of work (a purchase order + supporting data), 
> one unit of work modified by another (purchase order + improved purchase 
> order enhancements), multiple unrelated (purchase order + open new 
> account), etc.  So not even a general hierarchical name (e.g. 
> purchaseOrder+soap+xml) would handle all cases, but it would help for 
> many. 

This is an excellent point.  Permitting an arbitrary number of children
of "body" without defining the relationship between these children is a
problem.  Body isn't being a very good container.

But I'm not sure the best solution is to create a compound name that
dispatches a monolithic processor.  It seems to me that the best
solution would be to make sure each layer defines clear containment
semantics.  This might be a bit idealistic in *all* cases, but should
we not attempt to make it the default choice?
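As a rough illustration of layered containment instead of one compound name dispatching a monolithic processor: each layer handles only its own element and hands what it contains to whichever layer owns that namespace. A minimal sketch, assuming hypothetical namespaces (urn:example:env, urn:example:po) and an assumed envelope rule that every Body child is an independent unit of work; that rule is exactly what the 1.2 draft leaves undefined, so this is not SOAP's actual processing model.

```python
import xml.etree.ElementTree as ET

ENV = "urn:example:env"  # hypothetical envelope namespace
PO = "urn:example:po"    # hypothetical purchase-order namespace


def namespace_of(elem):
    """Return the namespace URI from ElementTree's '{ns}local' tag form."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def process(elem, registry):
    """Hand an element to whichever layer owns its namespace."""
    handler = registry.get(namespace_of(elem))
    if handler is None:
        raise LookupError(f"no layer registered for {elem.tag}")
    return handler(elem, registry)


def envelope_layer(elem, registry):
    # Assumed containment rule: every Body child is an independent unit of
    # work, passed on to its own layer. The envelope knows nothing about POs.
    body = elem.find(f"{{{ENV}}}Body")
    if body is None:
        raise ValueError("envelope has no Body")
    return [process(child, registry) for child in body]


def po_layer(elem, registry):
    return f"purchase order {elem.get('id')}"


REGISTRY = {ENV: envelope_layer, PO: po_layer}

doc = ET.fromstring(
    f'<e:Envelope xmlns:e="{ENV}" xmlns:p="{PO}"><e:Body>'
    '<p:order id="1"/><p:order id="2"/></e:Body></e:Envelope>'
)
print(process(doc, REGISTRY))  # → ['purchase order 1', 'purchase order 2']
```

Nothing here needs a "purchaseOrder+soap" name: the envelope layer's containment rule is what lets the purchase-order layer be found.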

> Overall:  I'm questioning whether it's necessary or practical to impose 
> any fixed relationship between the internal structure of documents in 
> general, or even of hierarchical formats such as XML, and the corresponding 
> MIME type names.  In the case of XML vs particular XML vocabularies, RFC 
> 3023 [2] makes clear that there is no such requirement:
> 
>         "XML generic processing is not always 
>          appropriate for XML-based media types. 
>          For example, authors of some such media 
>          types may wish that the types remain 
>          entirely opaque except to applications 
>          that are specifically designed to deal 
>          with that media type.  By NOT following
>          the naming convention '+xml', such media 
>          types can avoid XML-generic processing."
> 
> Surely the same latitude should be available when using SOAP?  I think 
> it's the sender that knows the intention of the document, regardless of 
> its structure.

I agree.  That's a good reason why we shouldn't require that all SOAP
messages be described with the application/soap+xml type.  That is,
unless we shore up body containment semantics to say that what is
contained must be prepared to be processed as generic XML.  Is this
unreasonable?
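The opt-out works because a generic dispatcher only sees the media type string. A rough sketch of the test such a dispatcher might apply under the RFC 3023 naming convention quoted above; the function name is hypothetical, and it recognises only application/xml and text/xml among the base types RFC 3023 registers.

```python
def is_generic_xml(media_type: str) -> bool:
    """Rough RFC 3023-style test: may a generic XML processor take this?"""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    mtype = media_type.split(";", 1)[0].strip().lower()
    if mtype in ("application/xml", "text/xml"):
        return True
    top, _, subtype = mtype.partition("/")
    # Types following the '+xml' convention opt in to generic processing;
    # anything else (e.g. a vocabulary registered without it) stays opaque.
    return bool(top) and subtype.endswith("+xml")
```

Under this check application/soap+xml is handed to generic XML machinery, while a SOAP vocabulary registered under a name without the suffix is not.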

>    If I label something with a MIME type, it should be 
> because I believe the document conforms to the specification for that MIME 
> type, which might or might not key primarily on the outermost element 
> (admittedly, it is likely to at least involve the outermost element). 

Except for XSLT. 8-)

> Making it easy to key on the root element or to involve the 
> root element in a hierarchical name is a good thing, because it's a common 
> idiom.

Right.

>  Requiring that MIME types be based only or primarily on the root 
> element (or any other single construct in the document), seems more 
> questionable. 

I don't think that's what's at issue here.  Media types can, and will
continue to be used this way.  As I just responded to Stuart, I think
the value is in trying to see how generic we can be for those that
want to use generic behaviour.

> I therefore propose: 
> 
> a) users be free to propose new MIME types of any structure for particular 
> sorts of SOAP documents (i.e. no requirement to use soap+xml or ...+xml). 
> This is the analog of the freedom accorded to those creating MIME types 
> for XML vocabularies.

OK, but I'd suggest that there be a good reason for not using generic
behaviour.  That is, new media types SHOULD NOT be used unless the generic
behaviour attributable to whatever generic media type we use (I've
mentioned the XSLT problem with */xml) is not appropriate.

> b) a recommendation to use soap+xml in the common case where the only 
> intention is to convey the "SOAPness" of the document.

We also have to decide what portion of this generic behaviour applies
to "+xml" types.  That is, can we expect the same behaviour from a SOAP
message described as application/soap+xml as we can from a SOAP message
described with application/xml?

> c) maybe a suggestion that in cases where there is a particular use of 
> SOAP, or else uses that can be well modelled hierarchically, that a 
> convention such as purchaseOrder+soap+xml.... be used.  I don't see this 
> prohibited by RFC 3023, but this convention goes beyond SOAP, and so 
> should be debated first by those responsible for the MIME type RFCs.

See my answer to a).
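If a hierarchical convention like purchaseOrder+soap+xml were adopted, a receiver could still fall back to generic behaviour by trying the most specific name first and stripping one prefix at a time. A minimal sketch, assuming that hypothetical convention (it is not defined by RFC 3023) and hypothetical handler names; case-folding of media types is ignored for brevity.

```python
from typing import Dict, List


def candidate_types(media_type: str) -> List[str]:
    """Most-specific-first fallback chain for a hypothetical a+b+xml name."""
    mtype = media_type.split(";", 1)[0].strip()
    top, _, subtype = mtype.partition("/")
    parts = subtype.split("+")
    # purchaseOrder+soap+xml -> purchaseOrder+soap+xml, soap+xml, xml
    return [f"{top}/{'+'.join(parts[i:])}" for i in range(len(parts))]


def dispatch(media_type: str, handlers: Dict[str, str]) -> str:
    """Return the first handler registered along the fallback chain."""
    for candidate in candidate_types(media_type):
        if candidate in handlers:
            return handlers[candidate]
    raise LookupError(f"no handler for {media_type}")


handlers = {
    "application/soap+xml": "generic SOAP processor",
    "application/xml": "generic XML processor",
}
print(dispatch("application/purchaseOrder+soap+xml", handlers))
# → generic SOAP processor
```

A receiver that knows nothing about purchase orders still gets SOAP-generic behaviour, which is the case where a custom type adds nothing.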

> >> <not xmlns="foo">
> >>  <banana xmlns="bar"/>
> >> </not>
> 
> >> Is that a banana?
> 
> Well, it really depends on the specification that describes the document 
> as a whole.  If "foo:not" is defined to be a more or less transparent, 
> semantics-free envelope construct, then I would say this is (or might well 
> be) a banana.

Right.

>  Surely your intention was that the spec for "foo:not" in 
> fact conveys the semantic: "I am negating the definition of what I 
> contain".  So, even there, it's an interesting question whether this is 
> best described as a "not" document, or a "not banana" document. 

I've forgotten where I was going with that. 8-(  Perhaps it was that
there's no point using "banana" to refer to it at all unless you know
what the containment semantics of all the containing elements are.

If you do know, then you are free to make up a custom media type, but
why bother if generic dispatch rules from a generic media type yield
the same behaviour?
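To make the banana point concrete: whether the document "is a banana" can only be decided by walking down through containers whose semantics are known. A minimal sketch, assuming hypothetical semantics for a transparent foo:wrapper and a negating foo:not, with banana written as an empty element so that it parses; an unknown container yields no answer at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical containment semantics for the "foo" vocabulary, keyed by
# (namespace, local name). Each maps a description of the single child
# to a description of the container.
CONTAINERS = {
    ("foo", "wrapper"): lambda inner: inner,       # transparent envelope
    ("foo", "not"): lambda inner: f"not {inner}",  # negates its content
}


def split_tag(tag):
    """Split ElementTree's '{ns}local' form into (ns, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def what_is_it(elem):
    """Describe a document, or return None if a container is unknown."""
    key = split_tag(elem.tag)
    children = list(elem)
    if key in CONTAINERS:
        if len(children) != 1:
            return None  # this sketch only defines single-child containment
        inner = what_is_it(children[0])
        return None if inner is None else CONTAINERS[key](inner)
    if children:
        return None  # unknown element wrapping content: can't say what it is
    return key[1]


print(what_is_it(ET.fromstring('<not xmlns="foo"><banana xmlns="bar"/></not>')))
# → not banana
```

Wrapping the same banana in foo:wrapper yields "banana", and wrapping it in an element with no known semantics yields None.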

MB
-- 
Mark Baker, Chief Science Officer, Planetfred, Inc.
Ottawa, Ontario, CANADA.      mbaker@planetfred.com
http://www.markbaker.ca   http://www.planetfred.com
Received on Thursday, 17 January 2002 13:11:10 UTC