Re: Media types from Noah Mendelsohn on 2002-01-15 (www-tag@w3.org from January 2002)

From: Noah Mendelsohn <noah_mendelsohn@us.ibm.com>
Date: Tue, 15 Jan 2002 14:53:18 -0500
To: Mark Baker <distobj@acm.org>
Cc: "www-tag" <www-tag@w3.org>, "xml-dist-app" <xml-dist-app@w3.org>
Message-ID: <OFEA4C4305.22EE8E88-ON85256B42.0062D653@pok.ibm.com>

Mark Baker writes:

>> If you buy TimBL's argument that the container
>> specifies the meaning of containment (as I do), then
>> the root container specifies the root meaning and
>> that's the only place you can start in trying to
>> construct the "whole meaning".

I buy this sometimes, but not always.  To some extent it's a matter of
degree:  even if the root is where you start, it may not tell you enough of
the whole meaning to be interesting.   See below.

>> To use your example, only because we know what the SOAP
>> envelope means, do we know that the body should be
>> processed as a purchase order (and that's assuming that
>> there are no unknown mustUnderstand declarations).

Yes, in XML the semantics typically follows the lexical hierarchy, and so
the outermost element gives you your "first stop" in figuring out what the
document means.  The problem is that, in the case of SOAP, the
specification is essentially delegating much of the designation of meaning
to a more deeply nested part of the document:  The <SOAP> element basically
just says, "I'll tell you something about a generic processing model that
can be used with this document, but I won't really tell you what this
document is."  It's sort of like saying:  "this is XML" as opposed to "this
is XHTML".  Both are useful, but it seems appropriate to have a MIME type
for XHTML, as well as for XML (and I presume we use xhtml+xml).   By the
same reasoning, it's potentially useful to have MIME types for specific
applications of SOAP.

Indeed one could argue that with SOAP's current definition of body
processing, there is no generally applicable means of inspecting the
document or body and determining its purpose (in the sense of being a
Purchase Order).  That's because in the new 1.2 draft [1] we allow an
arbitrary number of body element children, and we don't say whether in
general they represent separate similar units of work (e.g. multiple
purchase orders), one unit of work (a purchase order + supporting data),
one unit of work modified by another (purchase order + improved purchase
order enhancements), multiple unrelated units (purchase order + open new account),
etc.  So not even a general hierarchical name (e.g. purchaseOrder+soap+xml)
would handle all cases, but it would help for many.
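
To illustrate the point about body children, here is a minimal Python
sketch that lists the element names directly under a SOAP Body.  It rests
on assumptions: the envelope namespace is the one from the later SOAP 1.2
Recommendation, and the urn:example:* vocabularies and their element names
are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the final SOAP 1.2 Recommendation.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"

# Hypothetical message with two body children from made-up vocabularies.
MESSAGE = f"""
<env:Envelope xmlns:env="{SOAP_NS}">
  <env:Body>
    <po:purchaseOrder xmlns:po="urn:example:po"/>
    <acct:openAccount xmlns:acct="urn:example:accounts"/>
  </env:Body>
</env:Envelope>
""".strip()


def body_children(xml_text):
    """Return the qualified names of the immediate children of the SOAP Body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        return []
    return [child.tag for child in body]


names = body_children(MESSAGE)
print(names)
```

The code can enumerate the two children, but nothing in the envelope says
whether they are one unit of work, two related ones, or two unrelated
ones.  That information would have to come from somewhere else, such as
the label the sender attaches.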

Overall:  I'm questioning whether it's necessary or practical to impose any
fixed relationship between the internal structure of documents in general, or
even hierarchical formats such as XML,  and the corresponding MIME type
names.  In the case of XML vs particular XML vocabularies, RFC 3023 [2]
makes clear that there is no such requirement:

      "XML generic processing is not always
       appropriate for XML-based media types.
       For example, authors of some such media
       types may wish that the types remain
       entirely opaque except to applications
       that are specifically designed to deal
       with that media type.  By NOT following
       the naming convention '+xml', such media
       types can avoid XML-generic processing."

Surely the same latitude should be available when using SOAP?  I think it's
the sender that knows the intention of the document, regardless of its
structure.    If I label something with a MIME type, it should be because I
believe the document conforms to the specification for that MIME type,
which might or might not key primarily on the outermost element
(admittedly, it is likely to at least involve the outermost element).
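
As a rough sketch of how a receiver might act on the RFC 3023 naming
convention quoted above, the following Python function decides from the
label alone whether generic XML processing is invited.  The function name
and the opaque example type are hypothetical.  Only text/xml and
application/xml are listed as generic types; the other types registered by
RFC 3023 are left out for brevity.

```python
# The two generic XML media types; other RFC 3023 registrations are omitted.
GENERIC_XML_TYPES = {"text/xml", "application/xml"}


def invites_generic_xml_processing(media_type):
    """True if the label is a generic XML type or follows the '+xml' convention."""
    # Drop parameters such as '; charset=utf-8' and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    return essence in GENERIC_XML_TYPES or essence.endswith("+xml")


print(invites_generic_xml_processing("application/soap+xml; charset=utf-8"))
print(invites_generic_xml_processing("application/xhtml+xml"))
# Hypothetical opaque type: the sender chose not to use the '+xml' suffix.
print(invites_generic_xml_processing("application/x-purchase-order"))
```

The decision here depends only on what the sender chose to write in the
label, not on the structure of the document itself.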

With due respect to Tim, why do we have to go further than that?  The fact
that XML is hierarchical is an accident from the point of view of MIME
types, I think, and not all users of XML do their heavy lifting in the root
element.  If I want to invent some "purchaseOrder" (or graphics, or web
page, or whatever) MIME type  that just happens to be wrapped in a
semi-transparent XML envelope like SOAP, is that a bad usage of MIME?    In
short, I think the hierarchical view of an XML document only goes so far
semantically.  Making it easy to key on the root element or to involve the
root element in a hierarchical name is a good thing, because it's a common
idiom.  Requiring that MIME types be based only or primarily on the root
element (or any other single construct in the document) seems more
questionable.

I therefore propose:

a) users be free to propose new MIME types of any structure for particular
sorts of SOAP documents (i.e., no requirement to use soap+xml or ...+xml).
This is the analog of the freedom accorded to those creating MIME types for
XML vocabularies.

b) a recommendation to use soap+xml in the common case where the only
intention is to convey the "SOAPness" of the document.

c) maybe a suggestion that, in cases where there is a particular use of
SOAP, or else uses that can be well modelled hierarchically, a
convention such as purchaseOrder+soap+xml.... be used.  I don't see this
prohibited by RFC 3023, but this convention goes beyond SOAP, and so should
be debated first by those responsible for the MIME type RFCs.
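
For illustration only, a layered name of the kind suggested in (c) could be
taken apart as sketched below in Python.  The multi-level suffix is the
convention floated in this message, not an established one, and both the
"application/" top-level type and the function name are assumptions made
for the example.

```python
def label_layers(media_type):
    """Split a media type into its top-level type and '+'-separated layers."""
    # Ignore any parameters after ';'.
    essence = media_type.split(";", 1)[0].strip()
    top_level, _, subtype = essence.partition("/")
    return top_level, subtype.split("+")


top, layers = label_layers("application/purchaseOrder+soap+xml")
print(top, layers)  # application ['purchaseOrder', 'soap', 'xml']

# A receiver that only understands SOAP can still see that layer.
print("soap" in layers[1:])  # True
```

Reading the layers from right to left goes from the most generic
processing model (XML) to the most specific intent (a purchase order).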

>> <not xmlns="foo">
>>  <banana xmlns="bar">
>> </not>

>> Is that a banana?

Well, it really depends on the specification that describes the document as
a whole.  If "foo:not" is defined to be a more or less transparent,
semantics-free envelope construct, then I would say this is (or might well
be) a banana.  Surely your intention was that the spec for "foo:not" in
fact conveys the semantic: "I am negating the definition of what I
contain".  So, even there, it's an interesting question whether this is
best described as a "not" document, or a "not banana" document.
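
The distinction can be sketched in Python under some assumptions.  The
fragment below is a well-formed variant of the quoted example (the banana
element is closed), and the set of "transparent envelope" names is
hypothetical, standing in for whatever the specification of "foo:not"
actually says.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the quoted fragment (banana is closed here).
DOC = '<not xmlns="foo"><banana xmlns="bar"/></not>'

# Hypothetical: root elements that some specification declares transparent.
TRANSPARENT_ENVELOPES = frozenset({"{foo}not"})


def describe(xml_text, transparent=frozenset()):
    """Name a document by its root, or by its sole child when the root
    is listed as a transparent envelope."""
    root = ET.fromstring(xml_text)
    children = list(root)
    if root.tag in transparent and len(children) == 1:
        return children[0].tag
    return root.tag


print(describe(DOC))                         # {foo}not
print(describe(DOC, TRANSPARENT_ENVELOPES))  # {bar}banana
```

The same bytes get two different answers, and the difference comes
entirely from the specification consulted, not from the hierarchy.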

Having said all that, I should admit that my experience with compound
documents in general is far deeper than my knowledge of MIME types and
their typical use.  Apologies if I am missing something obvious.  Thank
you.

[1] http://www.w3.org/TR/soap12-part1/#structinterpbodies
[2] http://www.ietf.org/rfc/rfc3023.txt

------------------------------------------------------------------------
Noah Mendelsohn                                    Voice: 1-617-693-4036
IBM Corp.                                          Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------------
Received on Tuesday, 15 January 2002 14:56:43 UTC