Re: Pipeline proposal from Richard Tobin on 2006-03-28 (public-xml-processing-model-wg@w3.org from March 2006)

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Tue, 28 Mar 2006 17:54:46 +0100 (BST)
To: Jeni Tennison <jeni@jenitennison.com>, public-xml-processing-model-wg@w3.org
Message-Id: <20060328165446.1138C5C650F@macintosh.inf.ed.ac.uk>

> Why do you say "Each port must be connected to exactly one other port."? 
> I see that you've suggested 'sink' and 'duplicator' components to 
> support occasions when you want to ignore some output or connect some 
> output to more than one input, but you wouldn't need these components if 
> you could simply connect each port to zero or more than one other port. 
> What's the rationale behind this constraint?

I just considered it to be the simplest version, and there is perhaps
an advantage to requiring the programmer to be explicit about unconnected
and multiply connected ports.  But it is in no way essential.
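
To make the "exactly one other port" rule concrete, here is a minimal
sketch of how a compiler might check it.  The port names and the
representation of connections as pairs are assumptions made for
illustration only; nothing in the proposal fixes them.

```python
from collections import Counter

def check_connections(ports, connections):
    """Return the ports that are not connected to exactly one other port.

    ports: iterable of port identifiers (hypothetical string names).
    connections: iterable of (port_a, port_b) pairs.
    """
    counts = Counter()
    for a, b in connections:
        counts[a] += 1
        counts[b] += 1
    return sorted(p for p in ports if counts[p] != 1)

# Hypothetical three-component pipeline with one output left dangling.
ports = ["parse.out", "xslt.in", "xslt.out", "xslt.messages", "serialize.in"]
connections = [("parse.out", "xslt.in"), ("xslt.out", "serialize.in")]
print(check_connections(ports, connections))   # ['xslt.messages']

# Being explicit: route the unused output to a 'sink' component.
ports.append("sink.in")
connections.append(("xslt.messages", "sink.in"))
print(check_connections(ports, connections))   # []
```

Relaxing the rule to "zero or more" would simply remove this check; the
sink and duplicator components exist only to satisfy it.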

> There seems to be a growing consensus within the group, reflected here, 
> that inputs and outputs accept/produce *documents* and can't 
> accept/produce document fragments (e.g. elements).

Yes, we seemed to have consensus on that among the people who were
present at the Tech Plenary.  I think we should at least start from
that position and defer anything more as a future possibility.

> You talk about declaring the cardinality of the inputs/outputs a 
> component accepts/produces. Another thought, perhaps for a future 
> version, would be to provide more detail about what's expected/generated 
> by a particular component: perhaps giving the name of the document 
> element or even an entire schema that the documents adhere to. I'd like 
> to see us allow for that kind of extension if we don't support it in 
> this version.

That has some attractions.  It raises the question of what such
declarations mean: would they be constraints to be checked, or
assertions that could be assumed true for optimisation purposes?  If
something like my proposal were the bottom layer of a more
sophisticated system, then the compiler might insert validation
components to check the assertions.
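
A rough sketch of the two readings, assuming the declaration is just the
expected document element name.  The function names, the
check_declarations switch and the toy component are all hypothetical;
this is not part of the proposal.

```python
import xml.etree.ElementTree as ET

def make_validator(expected_root):
    """Build a hypothetical validation component for a declared root name."""
    def validate(doc):
        if doc.tag != expected_root:
            raise ValueError(f"expected <{expected_root}>, got <{doc.tag}>")
        return doc
    return validate

def compile_pipeline(steps, check_declarations=True):
    """steps: list of (component, declared_output_root or None).

    check_declarations=True treats declarations as constraints and inserts
    a validator after each declaring step.  False treats them as assertions
    that are assumed true, so nothing is inserted.
    """
    compiled = []
    for component, declared_root in steps:
        compiled.append(component)
        if check_declarations and declared_root is not None:
            compiled.append(make_validator(declared_root))
    return compiled

def run(compiled, doc):
    """Feed one document through the compiled components in order."""
    for component in compiled:
        doc = component(doc)
    return doc

def wrap_in_report(doc):
    """Toy component: wraps its input document in a <report> element."""
    out = ET.Element("report")
    out.append(doc)
    return out

doc = ET.fromstring("<data/>")
checked = compile_pipeline([(wrap_in_report, "report")])
print(len(checked), run(checked, doc).tag)   # 2 report
```

With a wrong declaration such as "summary", the checked pipeline raises
ValueError, while the unchecked one silently passes the document on.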

> I'm concerned about how we define parameters. In particular, I'm worried 
> about the XSLT case where one of the parameters for the component is a 
> set of QName/value pairs (the XSLT parameters). I guess that in this 
> proposal, you'd do that with a parameter called 'parameters' whose value 
> was a formatted string such as '{uri1}local1=value1; 
> {uri2}local2=value2'

That was what I was thinking of for this minimal version.
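
For what it is worth, a component could unpack such a string along
these lines.  This is only a sketch: the exact syntax is an assumption
based on the example above, and no escaping is defined, so a value or
namespace URI containing ';' would not survive.

```python
import re

# Optional {namespace-uri}, then a local name, '=', and the value.
_PAIR = re.compile(r"^(?:\{([^}]*)\})?([^={}]+)=(.*)$")

def parse_parameters(text):
    """Parse '{uri1}local1=value1; {uri2}local2=value2' into a dict
    keyed by (namespace_uri, local_name)."""
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        m = _PAIR.match(item)
        if m is None:
            raise ValueError(f"malformed parameter: {item!r}")
        uri, local, value = m.groups()
        params[(uri or "", local.strip())] = value
    return params

print(parse_parameters("{uri1}local1=value1; {uri2}local2=value2"))
# {('uri1', 'local1'): 'value1', ('uri2', 'local2'): 'value2'}
```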

> You say: "Except as described in conditionals, all components in a 
> pipeline are run (in particular, they do not get run only if input 
> arrives or output is requested)." I'm not sure of your intention here. 
> I'm worried that this constraint prevents implementations from caching 
> and reusing intermediate documents (if they can detect that the 
> information that led to the generation of those documents hasn't 
> changed). Perhaps we need to look at the question of whether components 
> can have side-effects to work out whether this is important or not.

The intention was indeed to prevent implementations from doing such
things, at least by default.  I think the compiler should run
everything unless it can prove that it will make no difference.  It
would be possible to add a declaration to component definitions
asserting that a component has no side effects (and relies on no
hidden input), so that the compiler could safely skip or cache it.  And it would
probably be useful to add it to component instances too: the generic
XSLT 1.0 component can't make such an assertion because it may fetch
arbitrary documents with the document() function, but a particular use
of it might be able to guarantee that this will not happen.
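
As an illustration only, the declaration and its per-instance override
might look like this.  The class names, the side_effect_free flag and
the caching runner are all hypothetical, and documents are plain
strings to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ComponentDef:
    name: str
    run: Callable[[str], str]
    side_effect_free: bool = False      # conservative default: always run

@dataclass
class Instance:
    definition: ComponentDef
    side_effect_free: Optional[bool] = None   # None means inherit from definition

    def is_pure(self):
        if self.side_effect_free is not None:
            return self.side_effect_free
        return self.definition.side_effect_free

class Runner:
    """Runs every component unless it is declared free of side effects."""
    def __init__(self):
        self.cache = {}
        self.runs = 0

    def run(self, instance, doc):
        key = (id(instance), doc)
        if instance.is_pure() and key in self.cache:
            return self.cache[key]      # safe to reuse the earlier result
        self.runs += 1
        result = instance.definition.run(doc)
        if instance.is_pure():
            self.cache[key] = result
        return result

# The generic XSLT component cannot promise anything (document() calls)...
xslt = ComponentDef("xslt", run=lambda doc: doc.upper())
generic_use = Instance(xslt)
# ...but one particular use of it might.
known_safe_use = Instance(xslt, side_effect_free=True)

runner = Runner()
for inst in (generic_use, generic_use, known_safe_use, known_safe_use):
    runner.run(inst, "<a/>")
print(runner.runs)   # 3
```

The generic instance runs both times; the instance that carries the
assertion runs once and is served from the cache the second time.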

-- Richard
Received on Tuesday, 28 March 2006 16:54:50 UTC