
Re: What is passed between processes?

From: Jeni Tennison <jeni@jenitennison.com>
Date: Thu, 12 Jan 2006 12:29:34 +0000
Message-ID: <1002358868.20060112122934@jenitennison.com>
To: public-xml-processing-model-wg@w3.org

Norm wrote:
> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> | I think that one of the requirements for the XML Processing
> | Model/Language is that it needs to support passing XML documents in
> | the forms of serialised XML, infosets *and* augmented infosets, as
> | appropriate, between processes in a pipeline. Other thoughts?
>
> To the extent possible, I'd like the exact representation passed
> between processes to be an implementation detail. On the one hand, I
> think we'll get a lot of pushback if an implementation that passes
> SAX events between components can't be conformant to our spec. On
> the other, implementations built around XPath2/XSLT2/XQuery are
> obviously going to want to pass XDM instances around and I want
> those to be conformant too.

I think it's vitally important that we make the clear distinction
between the *information* that's passed between components and the
*interface* that's used to pass that information between components.

I am in absolute agreement with Norm and the others who've said that
we shouldn't say anything about the interface that's used to pass
information between components: we certainly don't want to mandate
SAX, DOM, XOM or any other interface.

However, I think that for interoperability between pipeline engines,
we're going to have to address what information gets passed between
components. We are already making some assumptions on this issue:
we're talking about passing around "sequences of nodes", which implies
XDM instances and all the extra typing information they contain.

On the interoperability front, say I define the pipeline:

  1. Schema-validate a document
  2. Transform it with XSLT 2.0

and I try it with two different pipeline engines: one that passes
around PSVIs (via an implementation-specific interface) and one that
passes around infosets (via SAX, say). In the first, the typing
information added during validation is retained and can be used within
the transformation; in the second, it's lost. As a result, the
transformation may well generate two different documents depending on
the pipeline engine that gets used.
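The two-engine scenario can be modelled with a toy sketch (plain Python, not real XML tooling; the helper names and the dict-based "document" are invented for illustration). A validation step augments the document with PSVI-like typing; a transform step behaves differently depending on whether that typing survives the hand-off between steps:

```python
# Toy model of the interoperability scenario: two pipeline engines that
# differ only in what information is passed between components.

def validate(doc):
    """Schema-validate: return an augmented copy carrying PSVI-like typing."""
    augmented = dict(doc)
    augmented["psvi"] = {"price": "xs:decimal", "validity": "valid"}
    return augmented

def strip_to_infoset(doc):
    """What an infoset-only interchange does: the typing added by
    validation is lost before the next component sees the document."""
    return {k: v for k, v in doc.items() if k != "psvi"}

def transform(doc):
    """An XSLT-2.0-like step whose behaviour depends on the *typed* value."""
    types = doc.get("psvi", {})
    if types.get("price") == "xs:decimal":
        return {"price": float(doc["price"])}   # typed: numeric result
    return {"price": doc["price"]}              # untyped: string passes through

doc = {"price": "10.50"}

engine_a = transform(validate(doc))                    # passes PSVIs along
engine_b = transform(strip_to_infoset(validate(doc)))  # passes plain infosets

print(engine_a)  # {'price': 10.5}
print(engine_b)  # {'price': '10.50'}
```

Same pipeline, same input, two different outputs: exactly the divergence described above, caused purely by what information crosses the step boundary.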

In addition, if we support conditional processing then we're going to
have to make clear what kinds of things can be tested for. If we want
to support the pipeline:

  1. Schema-validate a document
  2. *If it's valid*, transform it with XSLT 2.0

then the pipeline language needs to be able to articulate the test "if
it's valid" (querying the values of the [validation attempted] and
[validity] properties in the PSVI) and pipeline engines need to be
passing around PSVIs.
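As a sketch of that requirement (again with invented names; the validation check is a stand-in for a real schema processor), the "if it's valid" test only works if the PSVI properties produced in step 1 actually reach the point where the test is evaluated:

```python
# Toy sketch of conditional processing driven by PSVI properties.

def schema_validate(doc):
    """Attempt validation; record the [validation attempted] and
    [validity] PSVI properties on the result."""
    out = dict(doc)
    ok = doc.get("root") == "order"   # stand-in for real schema validation
    out["psvi"] = {"validation_attempted": "full",
                   "validity": "valid" if ok else "invalid"}
    return out

def is_valid(doc):
    """The pipeline-language test: query the PSVI properties."""
    psvi = doc.get("psvi", {})
    return (psvi.get("validation_attempted") == "full"
            and psvi.get("validity") == "valid")

def transform(doc):
    return {"transformed": True, **doc}

def run_pipeline(doc):
    validated = schema_validate(doc)
    # Step 2 runs only *if it's valid* -- which presupposes that the
    # PSVI from step 1 is passed along to where the test is evaluated.
    return transform(validated) if is_valid(validated) else validated
```

If the engine passed only a vanilla infoset out of step 1, `is_valid` would have nothing to query and the conditional could not be expressed.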

Now, perhaps there are other ways around this issue: for example,
tightly coupling validation and processing (such that you don't see
schema validation as a separate step in a pipeline, but as something
that happens to the input or output of a process) would avoid the
issue of passing around PSVIs.

But I'm absolutely convinced that we have a requirement to somehow
cater for processes that work on serialised XML and augmented infosets
as well as vanilla infosets, and that we need to say something about
what gets passed between components (though not *how* it gets passed)
to explain what level of interoperability people can expect between
pipeline engines.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/
Received on Thursday, 12 January 2006 12:29:54 GMT
