What do we standardize?

I was going to reply to Jeni's message, but realised that I needed to
address a rather wider issue: what are we going to standardize?

It seems natural to me to divide the problem into several parts:

- the pipeline language itself.  I call an implementation of this
  a "pipeline engine";

- a set of standard components, such as XSLT and XInclude;

- a framework for writing additional components that are interoperable
  with other suppliers' pipeline engines and components;

- a component description language that would specify such things as
  the number of inputs and outputs a component has, what parameters it
  takes, and what infoset extensions it needs.

I separated the third and fourth points because I think component
descriptions will be useful even if you only have standard components.
For example, a pipeline consistency checker or a graphical interface
for building pipelines could use them.
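
As an illustration of how a checker might use component descriptions,
here is a minimal sketch in Java (chosen only because system (2) below
is Java-based).  Everything in it is hypothetical: the record fields
stand for the information listed in the fourth point, not for any
proposed syntax, and the class names, the components "xinclude" and
"xslt", their port counts and the "stylesheet" parameter are made up
for the example.  It only handles linear pipelines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical component description: the information a description
// language would need to convey, not a proposed syntax for it.
record ComponentDescription(
        String name,
        int inputs,                     // number of input ports
        int outputs,                    // number of output ports
        Set<String> parameters,         // parameter names it accepts
        Set<String> infosetExtensions   // extensions it needs, e.g. "PSVI"
) { }

// One step of a pipeline: the component it uses and the parameters it sets.
record Step(String component, Map<String, String> parameters) { }

final class ConsistencyChecker {
    // Checks a linear pipeline (each step feeds the next) against the
    // descriptions and returns a list of problems; empty means consistent.
    // infosetExtensions is carried but not checked here, since that would
    // also require knowing what each component produces.
    static List<String> check(List<Step> steps,
                              Map<String, ComponentDescription> descriptions) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            Step step = steps.get(i);
            ComponentDescription d = descriptions.get(step.component());
            if (d == null) {
                problems.add("unknown component: " + step.component());
                continue;
            }
            for (String p : step.parameters().keySet()) {
                if (!d.parameters().contains(p)) {
                    problems.add(d.name() + ": unexpected parameter " + p);
                }
            }
            // Every step except the first is fed by its predecessor...
            if (i > 0 && d.inputs() != 1) {
                problems.add(d.name() + ": needs exactly one input here");
            }
            // ...and every step except the last feeds its successor.
            if (i < steps.size() - 1 && d.outputs() != 1) {
                problems.add(d.name() + ": needs exactly one output here");
            }
        }
        return problems;
    }
}
```

A graphical pipeline editor could run the same kind of check each time
the user connects two steps, without knowing anything about how the
components are implemented.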

I think we will find it easiest to begin by standardizing only the
first two, and in any case there should be a level of conformance for
systems that provide only the first two.  At that level, what flows
between components need not be specified, since it is internal to the
implementation.

Once we want to be able to write components that are interoperable
between pipeline engines, we need to specify both what flows between
the components and how the components interface with the engine.
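
To make the two things we would then have to specify more concrete,
here is a minimal Java sketch under assumed names: Infoset, Component,
ComponentException, Configured and LinearEngine are all hypothetical.
The Infoset record stands in for what flows between components, and
the Component interface stands in for how a component interfaces with
the engine.  It only passes complete infosets; there is no streaming.

```java
import java.util.List;
import java.util.Map;

// What flows between components (hypothetical stand-in): the serialized
// document plus named infoset extensions such as "PSVI". A real design
// would expose the infoset itself rather than a string.
record Infoset(String xml, Map<String, Object> extensions) { }

// Hypothetical exception a component throws to signal failure to the engine.
class ComponentException extends Exception {
    ComponentException(String message) { super(message); }
}

// How a component interfaces with the engine (hypothetical): the engine
// starts it with its parameters, then hands it a complete infoset.
interface Component {
    void start(Map<String, String> parameters) throws ComponentException;
    Infoset process(Infoset input) throws ComponentException;
}

// A component paired with the parameters the pipeline gives it.
record Configured(Component component, Map<String, String> parameters) { }

// A toy engine that runs a linear chain, passing each output to the next
// component and letting any ComponentException propagate to the caller.
final class LinearEngine {
    static Infoset run(List<Configured> chain, Infoset input)
            throws ComponentException {
        Infoset current = input;
        for (Configured step : chain) {
            step.component().start(step.parameters());
            current = step.component().process(current);
        }
        return current;
    }
}
```

Any supplier's component that implements such an interface could be
dropped into any engine that drives it, which is exactly what the
first two levels of conformance would not need to guarantee.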

Here are two very different possible pipeline systems, for both of
which I would like to be able to claim some level of conformance:

(1) A system in which components are standalone programs that read and
    write plain XML files, and which compiles pipelines into
    Unix-style shell scripts.  Obviously this system can only handle
    vanilla infosets, though we could (later) define standard
    serializations of extended infosets such as the PSVI.

(2) A system in which components are Java classes conforming to an
    interface.  The interface would have methods to start the component,
    somehow read and write infosets (either complete or streaming),
    and signal and receive exceptions.  The infosets would allow
    arbitrary extensions, some of which would be standardized.
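
For system (1), the compilation step might look roughly like the
following Java sketch.  It assumes a linear pipeline of hypothetical
command-line programs that read XML on standard input and write XML on
standard output; ProgramStep, ShellCompiler and the step1.xml,
step2.xml naming of intermediate files are illustrative only.

```java
import java.util.List;

// A hypothetical step for system (1): a standalone program that reads
// plain XML on stdin and writes plain XML on stdout.
record ProgramStep(String program, List<String> args) { }

final class ShellCompiler {
    // Quotes a word for POSIX sh: wrap it in single quotes and turn each
    // embedded single quote into '\'' (close, escaped quote, reopen).
    static String quote(String word) {
        return "'" + word.replace("'", "'\\''") + "'";
    }

    // Compiles a linear pipeline into a sh script that passes plain XML
    // files between steps; "set -e" stops the script at the first failure.
    static String compile(List<ProgramStep> steps, String input, String output) {
        if (steps.isEmpty()) {
            throw new IllegalArgumentException("pipeline has no steps");
        }
        StringBuilder script = new StringBuilder("#!/bin/sh\nset -e\n");
        String current = input;
        for (int i = 0; i < steps.size(); i++) {
            ProgramStep s = steps.get(i);
            // Intermediate results go to step1.xml, step2.xml, ...
            String next = (i == steps.size() - 1)
                    ? output
                    : "step" + (i + 1) + ".xml";
            script.append(quote(s.program()));
            for (String arg : s.args()) {
                script.append(' ').append(quote(arg));
            }
            script.append(" < ").append(quote(current))
                  .append(" > ").append(quote(next)).append('\n');
            current = next;
        }
        return script.toString();
    }
}
```

Using files rather than "|" pipes keeps each step's output available
for inspection and lets "set -e" catch a failure in any step; with a
plain shell pipe, normally only the last command's exit status is seen.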

-- Richard

Received on Tuesday, 10 January 2006 17:14:32 UTC