- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 19 May 2005 09:00:38 -0400
- To: Mark Baker <distobj@acm.org>
- Cc: xml-dist-app@w3.org
Mark Baker writes:

> I think the disagreement here might be that you
> seem to be making an assumption which I'm not, and
> which isn't, AFAICT, licensed by any specification
> in the stack. That's not to say that such an
> assumption can't be used to build successful XML
> based systems (indeed, it's already in use), only
> that I think it makes doing so more difficult.
>
> The assumption I'm speaking of sometimes goes by
> the name "namespace dispatching", and it involves
> using the namespace of the root element of an XML
> document to determine the specification by which
> the contents of the document are interpreted.

Not quite. My assumption is that, particularly in the case of meta-formats such as XML, the media type determines the semantics of the entire family of documents conforming to that media type, and the meaning of any particular document is determined by its content. In the case of XML, saying that a document is application/xml says that interpretation per XML 1.0 is licensed. Beyond that, there may or may not be specifications licensing specific interpretations for particular classes of instance documents. While I did refer to the common case where such specifications key on a root element, nothing in the argument depends on that.

Note that above I carefully said legal interpretations of the document, not instructions for processing. Big difference. The specs say: you may consider this to be a purchase order. That's separate from the specification for a piece of code that might say: "when given a document that is legally interpreted as a purchase order, I will actually (a) purchase something, (b) check inventory, (c) store the order on disk, (d) pretty print it." I've never said that processing rules are inherent in the document.

> The other approach is traditional media type
> dispatch. Using it, the message need only
> indicate the (say) "application/person+xml" media
> type in order to know which specification to use
> to interpret the document.

True, and I'd encourage this insofar as it applies, but media types currently don't scale. You can say this is application/soap+xml, but not, as far as I know, application/purchaseOrder+soap+xml. They certainly don't scale to mixin semantics right now, e.g. application/nonRepudiable&cacheable&purchaseOrder+soap+xml, where nonRepudiable and cacheable are not in a subtype relation and can be freely mixed and matched.

> I claimed above that there's no specification
> which prescribes namespace dispatch.

There is, in at least the case of SOAP headers. Quoting from the Recommendation [1]:

"A SOAP header block is said to be understood by a SOAP node if the software at that SOAP node has been written to fully conform to and implement the semantics specified for the XML expanded name of the outer-most element information item of that header block."

The SOAP body is different, and is parallel to the case of unwrapped application/xml, I think [2]:

"An ultimate SOAP receiver MUST correctly process the immediate children of the SOAP body (see 5.3 SOAP Body). However, with the exception of SOAP faults (see 5.4 SOAP Fault), Part 1 of this specification (this document) mandates no particular structure or interpretation of these elements, and provides no standard means for specifying the processing to be done."

So, in the case of faults, the root QName determines the interpretation because the spec says it does. For other bodies, there is no such assumption. As with complete XML documents, you know it's XML, but beyond that the means used to decide on its significance are outside the scope of the XML and SOAP specs, respectively. Other specs may provide such interpretations (e.g. the XHTML spec), and many of them do key on the root QName.
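As a rough illustration of the SOAP header rule quoted above, here is a minimal Python sketch of a node deciding which header blocks it understands purely from the expanded name of each block's outermost element. It is simplified: it ignores the mustUnderstand attribute, roles, and fault generation, and the set of understood names is a hypothetical stand-in for "the software at that SOAP node".

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace, as defined by the SOAP 1.2 Recommendation.
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"


def header_blocks_not_understood(envelope_xml, understood):
    """Return the expanded names of header blocks this node does not understand.

    `understood` is a (hypothetical) set of expanded names, written in
    ElementTree's "{namespace-uri}localname" form, whose semantics this
    node's software was written to implement.
    """
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return []  # no soap:Header element, so no header blocks at all
    # Each child element of soap:Header is one header block; ElementTree
    # already reports its tag as the expanded name of that element.
    return [block.tag for block in header if block.tag not in understood]
```

For example, with two header blocks in the hypothetical namespaces urn:example:cache and urn:example:sig, a node that only implements the first would report the second as not understood. Note that the body is never consulted: only the header blocks' QNames drive the decision.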
> There are several problems I see with namespace
> dispatching which are a direct result of it being
> an intrinsic rather than an extrinsic mechanism
> (aka unlayered) like media types. Amongst these
> problems is that it prevents that sample document
> above from being interpreted using other
> semantics, such as RDF (in fact, it is a valid
> RDF/XML document).

I don't think I ever said that there should be only one legal way to process any given document. I do think it is reasonable to write specifications for particular processors (SOAP processors, browsers, etc.) that say: "this processor will, for its purposes, key on the QName of the (root) element to determine the processing to be done." Absolutely, it should be possible for a different application to determine its mode of processing from all manner of other available information, including information contained within the document (PIs?) and outside the document (the media type, the encoding, the length of the file, the date received), etc. Just as the root QName is not the right answer in all cases, neither is the media type.

I do think that the interpretation of a document should seldom, if ever, be in conflict with that suggested by the media type or by the specifications for its content (e.g. the XHTML spec). Thus, to process an image/jpeg as an XML file is surely an error. To decide that a given application/xml file is in particular a purchase order based on the root element seems to me not an error, especially if someone has written a specification saying that this is the proper interpretation of that QName. By all means, where possible, one should invent and use a more specialized media type such as application/po+xml, but there are rather severe limits to what can be captured in media types. Furthermore, the facts that QNames contain URIs and media types do not, and that QNames can be created in a distributed manner, almost ensure that there will be cases where QNames offer needed power that media types do not.
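The layering described above (media type first, then a specification keyed on the root QName where one exists) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: both lookup tables, the urn:example:po namespace, and the application/po+xml entry are hypothetical, and a real application could just as well consult PIs, encodings, or other information.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized media types that already imply an interpretation.
MEDIA_TYPE_INTERPRETATIONS = {
    "application/po+xml": "purchase order",
}

# Hypothetical specifications keyed on the root element's expanded name.
ROOT_QNAME_INTERPRETATIONS = {
    "{urn:example:po}purchaseOrder": "purchase order",
    "{http://www.w3.org/1999/xhtml}html": "XHTML document",
}

# Generic XML media types; "+xml" suffixed types are treated as XML too.
XML_MEDIA_TYPES = {"application/xml", "text/xml"}


def interpret(media_type, body):
    """Pick a legal interpretation: media type first, then root QName."""
    if media_type in MEDIA_TYPE_INTERPRETATIONS:
        return MEDIA_TYPE_INTERPRETATIONS[media_type]
    if media_type not in XML_MEDIA_TYPES and not media_type.endswith("+xml"):
        # Not XML at all: processing e.g. image/jpeg as XML would be an error.
        raise ValueError(f"{media_type} is not an XML media type")
    # The media type licenses reading the body as XML 1.0; beyond that,
    # fall back to whatever specification keys on the root element's QName.
    root = ET.fromstring(body)
    return ROOT_QNAME_INTERPRETATIONS.get(root.tag, "generic XML")
```

In this sketch an application/xml document whose root is the hypothetical purchaseOrder element is read as a purchase order, exactly as if it had arrived as application/po+xml, while an image/jpeg body is rejected before any XML parsing is attempted.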
> Plus, there's performance problems, as the media
> type is readily available in plain text form,
> early in the message, while the namespace, being
> in the body of the message, might be compressed,
> encrypted, or otherwise transformed, delaying the
> time at which the processing application can be
> activated.

I think this is a red herring. If you want to pull a fine-grained document type out into something like an HTTP header, you can. You then have the same problems you do with the coarse-grained media type: i.e. you are establishing a consistency dependency between the typing information in the headers and that implied by the format of the document. As with all such things, you suffer from tendencies such as, for example, the document being signed separately from the HTTP headers.

I think this is mostly an optimization. Consistency can be ensured by having trusted code check, when the document is ultimately parsed, that it agrees with the typing in the headers. That issue is the same for coarse- and fine-grained types. You can lie about the media type and cause false dispatching or routing; your check on that is to ensure that trusted code ultimately validates the data against the media type. You can lie about a fine-grained type in an HTTP header; your check is to ensure that trusted code ultimately makes sure the QName of (probably) the root element matches that header.

I'm traveling without net access at the moment, but I'll check out your two references when I get a chance. Thanks!

Noah

[1] http://www.w3.org/TR/soap12-part1/#muprocessing
[2] http://www.w3.org/TR/soap12-part1/#structinterpbodies

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
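The consistency check described in the reply on performance (route early on a type declared outside the document, then have trusted code confirm it against the parsed root element) might look roughly like this Python sketch. The header name X-Root-QName and the handler table are hypothetical illustrations, not a standard HTTP header or any existing API.

```python
import xml.etree.ElementTree as ET


def check_declared_root(declared_qname, body):
    """Trusted check: the declared fine-grained type must match the root QName.

    `declared_qname` uses ElementTree's "{namespace-uri}localname" form.
    """
    root = ET.fromstring(body)
    if root.tag != declared_qname:
        raise ValueError(
            f"declared type {declared_qname!r} does not match root {root.tag!r}"
        )
    return root


def route_then_verify(headers, body, handlers):
    # Early, cheap dispatch on the (hypothetical) header alone, before parsing.
    declared = headers["X-Root-QName"]
    handler = handlers[declared]
    # Before the handler acts on the data, re-check the claim in the header
    # against the document itself, so a false header cannot misroute it.
    root = check_declared_root(declared, body)
    return handler(root)
```

The same pattern applies to a coarse-grained media type: the early routing decision is an optimization, and the later check against the document's content is what actually guards against a lying header.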
Received on Thursday, 19 May 2005 13:01:06 UTC