Re: p:pipeline from Rui Lopes on 2006-07-20 (public-xml-processing-model-wg@w3.org from July 2006)

From: Rui Lopes <rlopes@di.fc.ul.pt>
Date: Thu, 20 Jul 2006 23:27:44 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <44C00360.20705@di.fc.ul.pt>

Norman Walsh wrote:
> True. But requiring all pipelines to be in external files strikes me
> as analagous to requiring all xsl:templates to be in separate files.

I've been thinking lately about the issue of including and reusing 
pipelines. In this e-mail I'll be using the expressions "main pipeline" 
and "called pipeline" to avoid potentially ambiguous interpretations. I 
believe that we should be aware of three scenarios for reusing pipelines:

1) inline specification in the "main pipeline" document: useful when we 
don't want to repeat a sequence of steps/processing logic inside our 
"main pipeline" document. This is something like defining a function in 
some programming language;


2) include a "called pipeline" into a "main pipeline": useful when 
creating more complex processing applications, whether focusing on 
modularization for defining "called pipeline" libraries, or just to ease 
management and maintenance of bigger pipeline-based projects. This is 
similar to using a C preprocessor or an XSLT include directive;


3) use an external "called pipeline": useful when some pipeline logic is 
executed elsewhere, whether for resource-intensive computations or using 
a pipeline service provided by someone else. This is akin to calling a 
web-service.


Having said this, we may classify these three scenarios in two ways:

a) whitebox "called pipeline" usage: the "called pipeline" is executed 
within the context of the "main pipeline", i.e. all pipeline logic 
(steps, choices, etc.) is included and available at compile time. This 
requires having the "called pipeline" components available in the 
pipeline implementation (e.g. potential non-standard components). 
Situations 1) and 2) are whitebox approaches;


b) blackbox "called pipeline" usage: the "called pipeline" is executed 
outside the context of the "main pipeline", typically on a remote 
machine. The "called pipeline" document may contain any type of 
non-standard components and non-standard pipeline logic, which only the 
"called pipeline" execution context must be aware of (as opposed to 
classification a)). This feature allows for building web-service-like 
pipelines. Situation 3) is a blackbox approach.



I have no definite answer about the three reuse scenarios, but we should 
definitely, at some point, discuss this issue. Nevertheless, they 
prompt some thoughts on the pipeline syntax:

i) "called pipeline" definition: this issue should be considered for 
situations 1) and 2). Situation 3), as it is defined as blackboxed, 
doesn't raise any issue on "called pipeline" definition. That said, 
"called pipelines" may be defined as:

   *) document fragments (situation 1) - <p:sub-pipeline name="..." />

   **) full documents (situation 2) - <p:sub-pipeline href="..." /> or 
<p:import href="..." />


ii) "called pipeline" visibility: if I have a "called pipeline" CP1 
defined inline on a "main pipeline" MP1, should I be allowed to call CP1 
directly from another "main pipeline" MP2? Or is CP1 visible only within 
MP1, so that MP2 is only allowed to use MP1 as a "called pipeline"? These 
issues occur if both situations 1) and 2) are allowed;


iii) "called pipeline" invocation: I see two options for invocation:

   *) <p:step kind="called-pipeline-name" /> - this option may leave out 
externally defined pipelines, i.e. situations 2) and 3);

   **) <p:step kind="p:pipeline" /> - this option requires a more 
verbose (albeit more generic) approach, as the pipeline name or URI has 
to be specified. This happens if situations 2) and 3) are 
supported. It's worth mentioning that if a URI reference to the "called 
pipeline" is allowed, there may be no need for defining a 
<p:sub-pipeline href="..." /> construct to support situation 2).
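
To make the difference between the two invocation options concrete, here 
is a minimal sketch of how an implementation might decide where a step's 
target comes from. It is only an illustration: the `kind` values come 
from the options above, while the `name` and `href` attributes on the 
step, the `resolve_step` function and the "local"/"external"/"component" 
labels are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple


def resolve_step(attrs: Dict[str, str], local_pipelines: Set[str]) -> Tuple[str, str]:
    """Classify a <p:step> by how its target would be located.

    `attrs` holds the step's attributes. `kind` is taken from the two
    options above; `name` and `href` are assumed attribute names.
    `local_pipelines` is the set of pipeline names defined inline.
    """
    kind = attrs.get("kind", "")
    if kind == "p:pipeline":
        # Second option: generic kind, target given explicitly.
        if "href" in attrs:
            return ("external", attrs["href"])
        if "name" in attrs:
            return ("local", attrs["name"])
        raise ValueError("p:pipeline step needs a name or an href")
    if kind in local_pipelines:
        # First option: the kind is itself the called pipeline's name.
        return ("local", kind)
    # Anything else is treated as an ordinary component.
    return ("component", kind)
```

With the first option only names already known to the "main pipeline" 
can be resolved, which is why it may leave out situations 2) and 3); the 
second option reaches them through the explicit URI.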


iv) recursion: simply no recursion allowed. A compile-time error is 
raised if a pipeline ends up calling itself, directly or through other 
pipelines. This issue arises in situation 2).
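
As a rough sketch of what such a compile-time check could look like, 
assume that after resolving all includes the implementation has a map 
from each pipeline name to the names of the pipelines it calls. That 
map, and the function names below, are hypothetical; the check itself is 
an ordinary depth-first search for a cycle.

```python
from typing import Dict, List, Optional


def find_recursion(calls: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return a call cycle reachable from `start`, or None if there is none.

    `calls` maps each pipeline name to the names of the pipelines it
    calls (a hypothetical structure built once includes are resolved).
    """
    path: List[str] = []  # pipelines on the current call chain
    done = set()          # pipelines already proven cycle-free

    def visit(name: str) -> Optional[List[str]]:
        if name in path:
            # `name` is already on the chain: report the loop itself.
            return path[path.index(name):] + [name]
        if name in done:
            return None
        path.append(name)
        for callee in calls.get(name, []):
            cycle = visit(callee)
            if cycle is not None:
                return cycle
        path.pop()
        done.add(name)
        return None

    return visit(start)


def check_no_recursion(calls: Dict[str, List[str]], start: str) -> None:
    """Raise an error, as a compiler would, if `start` can reach itself."""
    cycle = find_recursion(calls, start)
    if cycle is not None:
        raise ValueError("recursive pipeline call: " + " -> ".join(cycle))
```

For example, if MP1 calls CP1, CP1 calls CP2 and CP2 calls CP1 again, 
`check_no_recursion` started at MP1 would reject the pipeline and name 
the CP1, CP2, CP1 loop in its error message.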


As I said before, I have no concrete answers to these thoughts/issues, 
but I would definitely like to support, at least, 
situations 1) and 2). It may be worth mentioning (albeit not discussing) 
that I intentionally left out a fourth scenario, regarding dynamic 
pipeline creation (I don't want to touch that in the near future).


Cheers,

Rui
Received on Thursday, 20 July 2006 22:27:55 UTC