- From: Erik Bruchez <ebruchez@orbeon.com>
- Date: Mon, 16 Jan 2006 15:44:47 +0100
- To: public-xml-processing-model-wg@w3.org
Robin Berjon wrote:
> The downside of allowing arbitrary DM sequences of atoms is that you
> now have a pipeline language that can do absolutely anything :)
XDM remains well within the scope of "XML": it is a well-defined XML
data model designed for XPath and XQuery processing, and I see no
reason why it could not also fit our conception of an XML processing
model/language.
Historically, XML processing languages have typically exchanged
complete XML documents. This includes XPL, so I should maybe be
defending the concept, which has the huge benefit of simplicity: one
output produces one XML document; one input receives one XML document.
However, since this discussion is contemplating the question of
passing:
1. Sequences of documents or nodes
2. Pure text
It seems to me that it makes sense to look at what the latest XML work
has produced that addresses those points, and that is XDM. Rather than
reinventing the wheel, I think it is worth looking at existing
specifications.
> A pipeline for such a model would only be XML in that it might have
> some specific abilities for dealing with XML data, but it would also
> be able to perform entire complex processes that would never see
> anything that can be serialised as an XML document.
There is nothing wrong with making a distinction between XML
processing and XML serialization. XSLT and XQuery do this quite
happily. XSLT is an XML processing language, and not until you get to
outputting a resulting *document* do you consider the questions
related to serialization.
Further:
1. If you accept the idea of "sequences" being exchanged between
components, then in that case as well you will not be able to
serialize the result as a single XML document anyway.
2. Same thing if you accept the idea of passing "text" between
components: text certainly does not serialize as XML in general
(XQuery and the RELAX NG compact syntax are cases in point).
So it seems to me that the necessary conclusion, if you accept those
two points, is that you would simply give up the idea that something
traveling between two components is necessarily serializable as a
single, complete XML document. Using items instead of nodes does not
change that.
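To make this concrete, here is a minimal Python sketch. The classes
`DocumentNode` and `Element` and the function
`serializable_as_single_document` are hypothetical stand-ins, not part
of XDM or of any XProc/XPL API, and comments and processing
instructions are not modelled. The check accepts only a sequence made
of exactly one document node with exactly one element child, so a
sequence of two documents, a bare string, and a sequence of element
nodes are all rejected, whether we talk about nodes or items.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for two XDM node kinds (not a real XDM API).
@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


@dataclass
class DocumentNode:
    children: list = field(default_factory=list)


def serializable_as_single_document(seq: list) -> bool:
    """True only for a sequence holding exactly one document node whose
    only child is a single element (comments and PIs are not modelled)."""
    if len(seq) != 1 or not isinstance(seq[0], DocumentNode):
        return False
    children = seq[0].children
    return len(children) == 1 and isinstance(children[0], Element)


print(serializable_as_single_document([DocumentNode([Element("doc")])]))  # True
# Two documents, a bare string (here RELAX NG compact text), or two
# element nodes: none of these can be written out as one XML document.
print(serializable_as_single_document([DocumentNode([Element("a")]),
                                       DocumentNode([Element("b")])]))    # False
print(serializable_as_single_document(["element foo { text }"]))          # False
print(serializable_as_single_document([Element("a"), Element("b")]))      # False
```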
> Also what happens if you input a sequence of ints to something
> that's an "old school" XML processor? How do you degrade such a
> sequence so that it can usefully be input into a component that only
> has a notion of nodes?
A few comments:
1. You wouldn't degrade: if somebody outputs a "xs:int+" (sequence of
xs:int containing at least one item), and you connect that to
somebody who expects a "document-node()", you would get an
error. As simple as that :-) You could even do static type-checking
in many cases.
Another example: if somebody passes "XQuery" as text (xs:string) to
an XSLT processor expecting an XML input document
(document-node()), you would get an error. You cannot, and probably
would not want to, see any degradation process take place.
If all the components only exchange XML documents, then the
situation is simpler. But if you go one step further and talk about
exchanging sequences of nodes (or items), then the question of
compatibility between one step and the next becomes a little more
complex.
But it is also clear that even with only complete XML documents
being exchanged between steps, there is a question of compatibility
(read: typing): for example, a step may expect an XML document
satisfying a particular XML schema.
So in general, you do have a question of type checking that can
become important. XPL, as mentioned earlier, supports inline
validation of XML infosets with XML schema and Relax NG, and that
effectively brings some level of typing to the XML pipeline
language. By moving to XDM, we would simply generalize that solution.
2. Having sequences of nodes vs. sequences of items passed between
processors does not change the question of the "degradation" that
you are raising, unless I am missing something.
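A rough sketch of the static check described in point 1, under
simplifying assumptions: sequence types are plain strings using the
XPath 2.0 occurrence indicators (`?`, `*`, `+`), the item-type
hierarchy is a small hand-written subset of XDM's, and the
hypothetical function `compatible` accepts a connection only when
every sequence the output could produce is also acceptable to the
input. None of these names come from an existing XProc or XPL API.

```python
import math

# Hypothetical, simplified subset of the XDM item-type hierarchy:
# each item type maps to its immediate supertype; "item()" is the root.
PARENT = {
    "node()": "item()",
    "document-node()": "node()",
    "element()": "node()",
    "text()": "node()",
    "xs:anyAtomicType": "item()",
    "xs:string": "xs:anyAtomicType",
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
}

# Occurrence indicators as (minimum, maximum) number of items.
OCCURRENCE = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}


def parse(seq_type: str) -> tuple[str, str]:
    """Split e.g. "xs:int+" into ("xs:int", "+")."""
    if seq_type[-1] in "?*+":
        return seq_type[:-1], seq_type[-1]
    return seq_type, ""


def is_subtype(sub, sup) -> bool:
    """Walk up the hierarchy from `sub` looking for `sup`."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def compatible(output_type: str, input_type: str) -> bool:
    """Statically accept a connection only if every sequence the output
    could produce is also acceptable to the input."""
    out_item, out_occ = parse(output_type)
    in_item, in_occ = parse(input_type)
    out_min, out_max = OCCURRENCE[out_occ]
    in_min, in_max = OCCURRENCE[in_occ]
    return (is_subtype(out_item, in_item)
            and out_min >= in_min and out_max <= in_max)


# The two examples from point 1 are both rejected:
print(compatible("xs:int+", "document-node()"))    # False
print(compatible("xs:string", "document-node()"))  # False
# A step declaring "item()*" accepts any output:
print(compatible("document-node()", "item()*"))    # True
```

A real implementation would take the hierarchy from XDM and XML
Schema rather than from a hand-written table; the point is only that,
once inputs and outputs declare sequence types, mismatches like the
two above can be detected before the pipeline runs.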
> As I've said in another email, I wouldn't want to reject this option
> but it needs to be simple and interoperable with nodes-only
> components. I would suggest that someone (who isn't me :) take an
> action item to detail the impact it would have in various scenarios,
> and perhaps offer a strawman.
I do see several benefits in going the item() way:
1. As far as I know (I could be wrong though), the term "sequence" is
formally specified in the XML world only by XDM, where it is defined
in terms of items. By using the existing work done by XDM, you:
a. Reuse existing specifications, and therefore avoid confusion
with a hypothetical new, XProc-specific definition of a
sequence. Again, I think it is important not to reinvent the
wheel.
b. Address the questions of passing: sequences of documents,
sequences of elements, text-only and sequences of text
(xs:string), and even heterogeneous sequences.
2. Possibility of doing type-checking, including static type-checking.
3. Solves the conceptual question of orphan text nodes (you just pass
xs:string or other simple XML types).
4. Opens up the door for optional, full XML schema validation.
5. I don't think the solution would be much more complex than an
XProc-specific solution, and it would have the benefit of being
already fully specified, which means zero specification work on our
side for this part, which also means we get to have a shorter
spec :-)
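Several of the benefits above (heterogeneous sequences, orphan text
passed as xs:string, type checking) can be illustrated with a dynamic
check, again as a hypothetical Python sketch rather than any existing
API: `matches` tests an actual sequence against a sequence type, and
the mapping of Python `str` to xs:string and `int` to xs:integer is an
assumption made for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for XDM node kinds (not a real XDM API).
@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


@dataclass
class DocumentNode:
    children: list = field(default_factory=list)


# Simplified item-type hierarchy: each type maps to its supertype.
PARENT = {
    "node()": "item()",
    "document-node()": "node()",
    "element()": "node()",
    "xs:anyAtomicType": "item()",
    "xs:string": "xs:anyAtomicType",
    "xs:integer": "xs:anyAtomicType",
}
# Occurrence indicators as (minimum, maximum) number of items.
OCCURRENCE = {"": (1, 1), "?": (0, 1), "*": (0, float("inf")), "+": (1, float("inf"))}


def item_type(item) -> str:
    """Assumed mapping from a Python value to an XDM item type."""
    if isinstance(item, DocumentNode):
        return "document-node()"
    if isinstance(item, Element):
        return "element()"
    if isinstance(item, str):
        return "xs:string"
    if isinstance(item, int) and not isinstance(item, bool):
        return "xs:integer"
    raise TypeError(f"not an item: {item!r}")


def is_subtype(sub, sup) -> bool:
    """Walk up the hierarchy from `sub` looking for `sup`."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def matches(seq: list, seq_type: str) -> bool:
    """Dynamic check: does this actual sequence match e.g. "element()+"?"""
    if seq_type[-1] in "?*+":
        wanted, occ = seq_type[:-1], seq_type[-1]
    else:
        wanted, occ = seq_type, ""
    low, high = OCCURRENCE[occ]
    return low <= len(seq) <= high and all(
        is_subtype(item_type(i), wanted) for i in seq)


# A heterogeneous sequence: a document, an element, "orphan" text
# carried as xs:string, and an integer.
mixed = [DocumentNode([Element("a")]), Element("b"), "some text", 42]
print(matches(mixed, "item()*"))            # True
print(matches(mixed, "node()*"))            # False: the string and integer are not nodes
print(matches(["some text"], "xs:string"))  # True
```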
All the above, of course, is relevant only if we think that passing
sequences of things and/or pure text between components is the way to
go. If we consider passing XML infosets
only, then we probably don't need to bother.
-Erik
Received on Monday, 16 January 2006 14:45:08 UTC