- From: Mark Baker <distobj@acm.org>
- Date: Wed, 11 May 2005 10:43:14 -0400
- To: noah_mendelsohn@us.ibm.com
- Cc: xml-dist-app@w3.org

Hi Noah. Thanks for asking the hard questions. 8-) Sorry for the delay, I was on vacation.

I think the disagreement here might be that you seem to be making an assumption which I'm not, and which isn't, AFAICT, licensed by any specification in the stack. That's not to say that such an assumption can't be used to build successful XML based systems (indeed, it's already in use), only that I think it makes doing so more difficult.

The assumption I'm speaking of sometimes goes by the name "namespace dispatching", and it involves using the namespace of the root element of an XML document to determine the specification by which the contents of the document are interpreted. Consider this document;

  <Person xmlns="http://example.org/foofoo/">
    <name>Mark Smith</name>
    <age>55</age>
  </Person>

Using namespace dispatching, one would interpret the semantics of that document per those prescribed in the specification associated with the namespace identified by the URI "http://example.org/foofoo/".

The other approach is traditional media type dispatch. Using it, the message need only indicate the (say) "application/person+xml" media type in order to know which specification to use to interpret the document.

I claimed above that there's no specification which prescribes namespace dispatch. That claim was based on a paragraph of RFC 3023 that I helped edit, FWIW 8-);

   An XML document labeled as text/xml or application/xml might contain
   namespace declarations, stylesheet-linking processing instructions
   (PIs), schema information, or other declarations that might be used
   to suggest how the document is to be processed. For example, a
   document might have the XHTML namespace and a reference to a CSS
   stylesheet. Such a document might be handled by applications that
   would use this information to dispatch the document for appropriate
   processing.

Emphasis on "might".
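
To make the contrast concrete, here is a minimal sketch of the two dispatch styles applied to the sample document. It is only an illustration under stated assumptions: the handler tables and the spec names in them are hypothetical, and only the namespace URI and the "application/person+xml" label come from the text above ("application/rdf+xml" is the registered RDF/XML media type).

```python
import xml.etree.ElementTree as ET

# The sample document from the message, as it might arrive on the wire.
DOC = (
    b'<Person xmlns="http://example.org/foofoo/">'
    b"<name>Mark Smith</name><age>55</age>"
    b"</Person>"
)

# Hypothetical lookup tables; the spec names are invented for illustration.
NAMESPACE_HANDLERS = {
    "http://example.org/foofoo/": "foofoo-person-spec",
}
MEDIA_TYPE_HANDLERS = {
    "application/person+xml": "foofoo-person-spec",
    "application/rdf+xml": "rdf-xml-spec",
}


def dispatch_by_namespace(body: bytes) -> str:
    """Intrinsic dispatch: the body must be parsed before a spec is chosen."""
    root = ET.fromstring(body)
    # ElementTree reports qualified names as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    return NAMESPACE_HANDLERS.get(namespace, "unknown")


def dispatch_by_media_type(content_type: str) -> str:
    """Extrinsic dispatch: only the label is read, never the body."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_HANDLERS.get(media_type, "unknown")


if __name__ == "__main__":
    print(dispatch_by_namespace(DOC))
    print(dispatch_by_media_type("application/person+xml"))
    print(dispatch_by_media_type("application/rdf+xml; charset=utf-8"))
```

In this sketch the namespace route always picks the same specification for these bytes, because the choice is baked into the body, whereas the media type route lets the sender change the interpretation by changing only the label.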

There are several problems I see with namespace dispatching which are a direct result of it being an intrinsic rather than an extrinsic mechanism (aka unlayered) like media types.

Amongst these problems is that it prevents that sample document above from being interpreted using other semantics, such as RDF (in fact, it is a valid RDF/XML document). Another example is the shortform XSLT stylesheet notation using literal result elements;

http://www.w3.org/TR/xslt#result-element-stylesheet

It also creates visibility and security problems, since, for example, an HTML document may include Javascript which can rewrite its root element after passing a firewall, where the new root element can prescribe processing semantics which the firewall might be configured to disallow (by banning certain media types).

There are also evolvability problems, as now the namespace is being asked to play the role of a coarser grained data element, the media type.

Plus, there are performance problems, as the media type is readily available in plain text form, early in the message, while the namespace, being in the body of the message, might be compressed, encrypted, or otherwise transformed, delaying the time at which the processing application can be activated.

And FWIW, if I haven't pointed you to them already, you might be interested in these two links, the first of which is a presentation I gave to the Compound Document Formats WG earlier this year on the subject as it relates to comp docs. The second is a partial examination of XML based dispatching behaviour in common Web browsers.

http://www.markbaker.ca/Talks/2004-media-types-and-compdocs/slide1-0.html
http://www.markbaker.ca/2004/01/XmlNamespaceDispatchTest/

Cheers,

Mark.

On Wed, Apr 27, 2005 at 04:49:39PM -0400, noah_mendelsohn@us.ibm.com wrote:
> Mark,
>
> This is very helpful, thank you. I'm tempted to ask why SOAP is any more
> broken or at risk than XML itself?
>
> In the case of XML, we have a media type "application/xml". That tells
> you to expect perhaps an XML declaration, perhaps an internal subset, and
> then a root element the nature of which is completely unknown. How are
> these disambiguated in practice? How do I recognize a purchase order from
> an invoice? Answer: if the root is namespace qualified you infer from
> the QName. You might reasonably say that I could also register
> application/purchaseOrder+xml, and that would indeed add additional out of
> stream information, but surely the creation of such a media type is
> optional. There is surely no rule that a new media type is to be
> registered for each XML root element QName.
>
> Now consider SOAP documents, such as those exchanged by the SOAP HTTP
> binding. The binding serializes the Infoset to a stream of type
> application/soap+xml, which is a specialization of application xml. The
> specialization tells you some additional things, such as that the root
> type will be soap:envelope. Crucially, it also tells you that there will
> be a body containing an XML element. I fail to see how you know any more
> or less about the body element than you generally do about the root of an
> application/xml stream. In both cases, you know that it's an XML element
> and that you need to recognize the QName to infer its type.
>
> SOAP looks to me no more or less broken than XML itself. In both cases,
> you need to look at the QName to see what's going on. For
> application/xml, it's the root QName, for application/soap+xml it's the
> body child element.
>
> By the way, if media types were enhanced to get rid of the 1-level + sign
> kludge, then you might do something like registering:
> application/purchaseorder+soap+xml, the same option you have at the root
> level today.
>
> Also, I think one can make the case that for each application/*+xml media
> type there should be a dual for the typing of Infosets. The media type
> covers the serialized form, and the dual types the corresponding infoset.
> Whether the duals are given separate names, or whether the use of the
> media type name is licensed for both I'm not sure I care. I do think that
> the two types are different, as one is streams and one in general isn't.

--
Mark Baker. Ottawa, Ontario, CANADA. http://www.markbaker.ca
Coactus; Web-inspired integration strategies http://www.coactus.com

Received on Wednesday, 11 May 2005 14:43:05 UTC