Re: Component interfaces from Erik Bruchez on 2006-01-16 (public-xml-processing-model-wg@w3.org from January 2006)

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Mon, 16 Jan 2006 15:44:47 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <43CBB15F.90109@orbeon.com>
Robin Berjon wrote:

 > The downside of allowing arbitrary DM sequences of atoms is that you
 > now have a pipeline language that can do absolutely anything :)

XDM remains well within the scope of "XML": it is a well-defined XML
data model fit for XPath and XQuery processing, and there is no reason
it could not also fit our conception of an XML processing model/language.

Historically, XML processing languages have typically exchanged
complete XML documents. This includes XPL, so perhaps I should be
defending that approach, which has the huge benefit of simplicity: one
output produces one XML document; one input receives one XML document.

However, since this discussion is contemplating the question of
passing:

1. Sequences of documents or nodes
2. Pure text

It seems to me that it makes sense to look at what the latest XML work
has produced that addresses those points, and that is XDM. Rather than
reinventing the wheel, I think it is worth looking at existing
specifications.

 > A pipeline for such a model would only be XML in that it might have
 > some specific abilities for dealing with XML data, but it would also
 > be able to perform entire complex processes that would never see
 > anything that can be serialised as an XML document.

There is nothing wrong with making a distinction between XML
processing and XML serialization. XSLT and XQuery do this quite
happily. XSLT is an XML processing language, and not until you get to
outputting a resulting *document* do you consider the questions
related to serialization.

Further:

1. If you accept the idea of "sequences" being exchanged between
    components, then in that case as well you will not be able to
    serialize the result as a single XML document anyway.

2. Same thing if you accept the idea of passing "text" between
    components: text in general does not serialize as XML (XQuery and
    the Relax NG compact syntax are cases in point).

So it seems to me that the necessary conclusion, if you accept those
two points, is that you simply give up the idea that something
traveling between two components is necessarily serializable as a
single, complete XML document. Using items instead of nodes does not
change that.
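To make the point concrete, here is a minimal Python sketch, assuming
a drastically simplified, hypothetical model of XDM items (the Document
and Element classes and the function name are made up for illustration,
and atomic values are plain Python str/int). Only a sequence of exactly
one document node or element can come out as a single XML document; a
sequence of several documents, or a bare string, cannot:

```python
from dataclasses import dataclass, field

# Hypothetical, heavily simplified stand-ins for XDM nodes.
@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)

@dataclass
class Document:
    root: Element

def serializes_as_single_document(seq):
    """True only if the sequence holds exactly one document node or element.

    Atomic values (plain str/int in this sketch) and multi-item sequences
    have no single-document serialization.
    """
    return len(seq) == 1 and isinstance(seq[0], (Document, Element))

print(serializes_as_single_document([Document(Element("doc"))]))   # True
print(serializes_as_single_document([Document(Element("a")),
                                     Document(Element("b"))]))      # False
print(serializes_as_single_document(["for $x in //a return $x"]))  # False
```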

 > Also what happens if you input a sequence of ints to something
 > that's an "old school" XML processor? How do you degrade such a
 > sequence so that it can usefully be input into a component that only
 > has a notion of nodes?

A few comments:

1. You wouldn't degrade: if somebody outputs a "xs:int+" (sequence of
    xs:int containing at least one item), and you connect that to
    somebody who expects a "document-node()", you would get an
    error. As simple as that :-) You could even do static type-checking
    in many cases.

    Another example: if somebody passes an XQuery query as text
    (xs:string) to an XSLT processor expecting an XML input document
    (document-node()), you would get an error. No degradation can take
    place there, and you probably would not want one to.

    If all the components only exchange XML documents, then the
    situation is simpler. But if you go one step further and talk about
    exchanging sequences of nodes (or items), then the question of
    compatibility between one step and the next becomes a little more
    complex.

    But it is also clear that even with only complete XML documents
    being exchanged between steps, there is a question of compatibility
    (read: typing): for example, a step may expect an XML document
    satisfying a particular XML schema.

    So in general, you do have a question of type checking that can
    become important. XPL, as mentioned earlier, supports inline
    validation of XML infosets with XML schema and Relax NG, and that
    effectively brings some level of typing to the XML pipeline
    language. By going to the XDM, we only generalize that solution.

2. Having sequences of nodes vs. sequences of items passed between
    processors does not change the question of the "degradation" that
    you are raising, unless I am missing something.
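The kind of static check described in point 1 can be sketched as
follows. This is a hedged illustration, not XProc or XQuery semantics:
it assumes port types are written as single XPath 2.0-style sequence
types (an item type plus an optional ?, * or + occurrence indicator),
uses a tiny hand-written slice of the XDM type hierarchy, and the names
PARENT, parse, is_subtype and can_connect are hypothetical. A
connection is accepted only when every sequence the output can produce
is also acceptable to the input:

```python
import math

# Occurrence indicators as (min, max) bounds on sequence length.
OCCURRENCE = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}

# A tiny, partial slice of the XDM type hierarchy: child -> parent.
PARENT = {
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:string": "xs:anyAtomicType",
    "xs:anyAtomicType": "item()",
    "document-node()": "node()",
    "element()": "node()",
    "text()": "node()",
    "node()": "item()",
}

def parse(seq_type):
    """Split e.g. 'xs:int+' into ('xs:int', '+'); no indicator means ''."""
    if seq_type[-1] in "?*+":
        return seq_type[:-1], seq_type[-1]
    return seq_type, ""

def is_subtype(item_type, target):
    """Walk up the hierarchy from item_type looking for target."""
    while item_type is not None:
        if item_type == target:
            return True
        item_type = PARENT.get(item_type)
    return False

def can_connect(output_type, input_type):
    """Static check: every sequence matching output_type matches input_type."""
    out_item, out_occ = parse(output_type)
    in_item, in_occ = parse(input_type)
    out_min, out_max = OCCURRENCE[out_occ]
    in_min, in_max = OCCURRENCE[in_occ]
    return (is_subtype(out_item, in_item)
            and in_min <= out_min and out_max <= in_max)

print(can_connect("xs:int+", "document-node()"))    # False: atoms, not a document
print(can_connect("xs:string", "document-node()"))  # False: text is not a document
print(can_connect("xs:int+", "xs:integer*"))        # True
print(can_connect("element()*", "node()+"))         # False: output may be empty
```

A conservative check like this rejects any connection that could fail
at run time; an engine could instead defer the doubtful cases to a
dynamic check on the actual data.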

 > As I've said in another email, I wouldn't want to reject this option
 > but it needs to be simple and interoperable with nodes-only
 > components. I would suggest that someone (who isn't me :) take an
 > action item to detail the impact it would have in various scenarios,
 > and perhaps offer a strawman.

I do see several benefits in going the item() way:

1. As far as I know (I could be wrong, though), the term "sequence"
    is formally specified in the XML family of specifications only by
    XDM, where it is defined in terms of items. By using the existing
    work done by XDM, you:

    a. Reuse existing specifications, and therefore avoid confusion
       with a hypothetical new, XProc-specific definition of a
       sequence. Again, I think it is important not to reinvent the
       wheel.

    b. Address the questions of passing: sequences of documents,
       sequences of elements, text-only and sequences of text
       (xs:string), and even heterogeneous sequences.

2. Possibility of doing type-checking, including static type-checking.

3. Solves the conceptual question of orphan text nodes (you just pass
    xs:string or other simple XML types).

4. Opens up the door for optional, full XML schema validation.

5. I don't think the solution would be much more complex than an
    XProc-specific solution, and it would have the benefit of being
    already fully specified, which means zero specification work on our
    side for this part, which also means we get to have a shorter
    spec :-)
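The dynamic counterpart of point 2, checking an actual (possibly
heterogeneous) sequence against a declared type at run time, might
look like this. Again this is a sketch under assumptions: the Document
class, the mapping of Python str/int/bool values to xs:string,
xs:integer and xs:boolean, and the function names are all
hypothetical. Note how a piece of text travels as a plain xs:string
(point 3) rather than as an orphan text node:

```python
import math
from dataclasses import dataclass

@dataclass
class Document:
    """Hypothetical stand-in for an XDM document node."""
    root_name: str

OCCURRENCE = {"": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}

def item_types(item):
    """Return the (simplified) set of item types a value matches."""
    if isinstance(item, Document):
        return {"document-node()", "node()", "item()"}
    if isinstance(item, bool):  # checked before int: bool subclasses int
        return {"xs:boolean", "xs:anyAtomicType", "item()"}
    if isinstance(item, int):
        return {"xs:integer", "xs:decimal", "xs:anyAtomicType", "item()"}
    if isinstance(item, str):
        return {"xs:string", "xs:anyAtomicType", "item()"}
    raise TypeError(f"not an XDM item in this sketch: {item!r}")

def matches(seq, seq_type):
    """Dynamic check: does this actual sequence match e.g. 'xs:string+'?"""
    occ = seq_type[-1] if seq_type[-1] in "?*+" else ""
    item_type = seq_type[: len(seq_type) - len(occ)]
    lo, hi = OCCURRENCE[occ]
    return lo <= len(seq) <= hi and all(item_type in item_types(i) for i in seq)

print(matches(["xquery version '1.0';"], "xs:string"))   # True
print(matches([Document("doc"), "note", 42], "item()*"))  # True
print(matches([Document("doc"), "note", 42], "node()*"))  # False
print(matches([], "xs:string+"))                          # False
```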

All the above, of course, is relevant only if we think that passing
sequences of things and/or pure text between components is the way
to go. If we consider passing XML infosets only, then we probably
don't need to bother.

-Erik
Received on Monday, 16 January 2006 14:45:08 UTC