- From: Erik Bruchez <ebruchez@orbeon.com>
- Date: Mon, 16 Jan 2006 15:44:47 +0100
- To: public-xml-processing-model-wg@w3.org
Robin Berjon wrote:

> The downside of allowing arbitrary DM sequences of atoms is that you
> now have a pipeline language that can do absolutely anything :)

XDM remains quite in the scope of "XML": it is a well-defined XML data
model fit for XPath and XQuery processing, and it could, why not, be
fit for our conception of an XML processing model/language.

Historically, XML processing languages have typically exchanged
complete XML documents. This includes XPL, so I should maybe be
defending the concept, which has the huge benefit of simplicity: one
output produces one XML document; one input receives one XML document.

However, since this discussion is contemplating the question of
passing:

1. Sequences of documents or nodes

2. Pure text

it seems to me that it makes sense to look at what the latest XML work
has produced that addresses those points, and that is XDM. Rather than
reinventing the wheel, I think it is worth looking at existing
specifications.

> A pipeline for such a model would only be XML in that it might have
> some specific abilities for dealing with XML data, but it would also
> be able to perform entire complex processes that would never see
> anything that can be serialised as an XML document.

There is nothing wrong with making a distinction between XML
processing and XML serialization. XSLT and XQuery do this quite
happily. XSLT is an XML processing language, and not until you get to
outputting a resulting *document* do you consider the questions
related to serialization. Further:

1. If you accept the idea of "sequences" being exchanged between
   components, then in that case as well you will not be able to
   serialize the result as a single XML document anyway.

2. Same thing if you accept the idea of passing "text" between
   components: text certainly does not serialize as XML in general
   (XQuery and RELAX NG compact syntax are cases in point).

So it seems to me that the necessary conclusion, if you accept those
two points, is that you would simply give up the idea that something
traveling between two components is necessarily serializable as a
single, complete XML document. Using items instead of nodes does not
change that.

> Also what happens if you input a sequence of ints to something
> that's an "old school" XML processor? How do you degrade such a
> sequence so that it can usefully be input into a component that only
> has a notion of nodes?

A few comments:

1. You wouldn't degrade: if somebody outputs an "xs:int+" (a sequence
   of xs:int containing at least one item), and you connect that to
   somebody who expects a "document-node()", you would get an error.
   As simple as that :-) You could even do static type-checking in
   many cases.

   Another example: if somebody passes "XQuery" as text (xs:string) to
   an XSLT processor expecting an XML input document
   (document-node()), you would get an error. You cannot, and probably
   would not want to, see any degradation process take place.

   If all the components only exchange XML documents, then the
   situation is simpler. But if you go one step further and talk about
   exchanging sequences of nodes (or items), then the question of
   compatibility between one step and the next becomes a little more
   complex.

   But it is also clear that even with only complete XML documents
   being exchanged between steps, there is a question of compatibility
   (read: typing): for example, a step may expect an XML document
   satisfying a particular XML schema. So in general, you do have a
   question of type checking that can become important. XPL, as
   mentioned earlier, supports inline validation of XML infosets with
   XML Schema and RELAX NG, and that effectively brings some level of
   typing to the XML pipeline language. By going to XDM, we only
   generalize that solution.

2. Having sequences of nodes vs.
   sequences of items passed between processors does not change the
   question of the "degradation" that you are raising, unless I am
   missing something.

> As I've said in another email, I wouldn't want to reject this option
> but it needs to be simple and interoperable with nodes-only
> components. I would suggest that someone (who isn't me :) take an
> action item to detail the impact it would have in various scenarios,
> and perhaps offer a strawman.

I do see several benefits in going the item() way:

1. AFAIK (I could be wrong, though), the only formal specification of
   the term "sequence" in the XML world is XDM, which defines it in
   terms of items. By using the existing work done by XDM, you:

   a. Reuse existing specifications, and therefore avoid confusion
      with a hypothetical new, XProc-specific definition of a
      sequence. Again, I think it is important not to reinvent the
      wheel.

   b. Address the questions of passing: sequences of documents,
      sequences of elements, text-only and sequences of text
      (xs:string), and even heterogeneous sequences.

2. Possibility of doing type-checking, including static
   type-checking.

3. Solves the conceptual question of orphan text nodes (you just pass
   xs:string or other simple XML types).

4. Opens the door to optional, full XML Schema validation.

5. I don't think the solution would really be more complex than an
   XProc-specific solution, and it would have the benefit of being
   already fully specified, which means zero specification work on our
   side for this part, which also means we get to have a shorter spec
   :-)

All the above, of course, is relevant only if we think that passing
sequences of things and/or pure text between components is the way to
go. If we consider passing XML infosets only, then we probably don't
need to bother.

-Erik
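
To make the type-checking point above concrete (an "xs:int+" output
connected to a "document-node()" input), here is a minimal sketch in
Python of how a pipeline might statically reject such a connection. It
is only an illustration under assumptions: the type hierarchy is a
tiny, simplified subset of the XDM one, and the names PARENT, BOUNDS,
SequenceType, parse and can_connect are made up for this example; they
are not part of XDM, XPL, or any XProc draft.

```python
import math
from dataclasses import dataclass

# Hypothetical, heavily simplified item-type hierarchy (child -> parent).
# The real XDM hierarchy is much richer (e.g. xs:int derives from xs:long).
PARENT = {
    "node()": "item()",
    "document-node()": "node()",
    "element()": "node()",
    "text()": "node()",
    "xs:anyAtomicType": "item()",
    "xs:string": "xs:anyAtomicType",
    "xs:int": "xs:anyAtomicType",
}

# Occurrence indicator -> (minimum, maximum) number of items.
BOUNDS = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}


@dataclass(frozen=True)
class SequenceType:
    item: str        # e.g. "xs:int" or "document-node()"
    occurrence: str  # "", "?", "*" or "+"


def parse(text):
    """Split a string such as "xs:int+" into item type and occurrence."""
    text = text.strip()
    occurrence = ""
    if text and text[-1] in "?*+":
        text, occurrence = text[:-1], text[-1]
    if text != "item()" and text not in PARENT:
        raise ValueError(f"unknown item type: {text!r}")
    return SequenceType(text, occurrence)


def is_item_subtype(sub, sup):
    """Walk up the hierarchy from sub and look for sup."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def can_connect(output, input_):
    """True if every sequence the output may produce is accepted by the input."""
    out, inp = parse(output), parse(input_)
    out_lo, out_hi = BOUNDS[out.occurrence]
    in_lo, in_hi = BOUNDS[inp.occurrence]
    return (
        is_item_subtype(out.item, inp.item)
        and in_lo <= out_lo
        and out_hi <= in_hi
    )


if __name__ == "__main__":
    for out, inp in [
        ("xs:int+", "document-node()"),
        ("xs:string", "document-node()"),
        ("document-node()", "document-node()"),
        ("document-node()+", "item()*"),
    ]:
        verdict = "ok" if can_connect(out, inp) else "type error"
        print(f"{out} -> {inp}: {verdict}")
```

Note that a check like this is deliberately conservative: it rejects
"document-node()*" feeding "document-node()" even though a particular
run might happen to produce exactly one document. A real processor
could fall back to a dynamic check in such cases.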
Received on Monday, 16 January 2006 14:45:08 UTC