Re: Split and eval, the case for arbitrary numbers of outputs from Innovimax W3C on 2012-04-30 (public-xml-processing-model-wg@w3.org from April 2012)

From: Innovimax W3C <innovimax+w3c@gmail.com>
Date: Mon, 30 Apr 2012 18:15:18 +0200
To: Norman Walsh <ndw@nwalsh.com>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <CAAK2GfEZbM+adv6mPSidb+hUm6JVJ4HtszgDT=tV-vpiXBNRXA@mail.gmail.com>

We can also add NVDL as a use case for "*" outputs and "*" inputs.

Mohamed

On Thu, Apr 26, 2012 at 3:31 PM, Norman Walsh <ndw@nwalsh.com> wrote:

> Per my action from last week...
>
> Part of my plan for (re)implementing my XProc processor involves performing
> more aggressive graph analysis. This has two benefits: first, I'll be able
> to establish thread boundaries and do multi-threaded processing and second,
> I'll be able to identify (sub)pipelines that can be streamed.
>
> In order to make the graph more amenable to this sort of streaming and
> rewriting, I'm transforming the user's pipeline into something with
> explicit steps for actions like splitting.
>
> Consider this pipeline fragment:
>
>  <p:identity name="root"/>
>
>  <p:identity name="branch1">
>    <p:input port="source">
>      <p:pipe step="root" port="result"/>
>    </p:input>
>  </p:identity>
>
>  <p:identity name="branch2">
>    <p:input port="source">
>      <p:pipe step="root" port="result"/>
>    </p:input>
>  </p:identity>
>
> The two identity steps branch1 and branch2 both read from the same
> "result" port on the "root" step. At an implementation level that requires
> some sort of buffering or copying. I want to make that explicit, so
> I'm introducing an explicit split step:
>
>  <p:identity name="root"/>
>
>  <internal:split name="ID00001"/>
>
>  <p:identity name="branch1">
>    <p:input port="source">
>      <p:pipe step="ID00001" port="result1"/>
>    </p:input>
>  </p:identity>
>
>  <p:identity name="branch2">
>    <p:input port="source">
>      <p:pipe step="ID00001" port="result2"/>
>    </p:input>
>  </p:identity>
>
> So what's the declaration for the internal:split step? It's something
> like this:
>
>  <p:declare-step type="internal:split">
>    <p:input port="source" sequence="true" primary="true"/>
>    <p:output port="result1" sequence="true" primary="false"/>
>    <p:output port="result2" sequence="true" primary="false"/>
>  </p:declare-step>
>
> And I could declare internal:split2, internal:split3, etc. steps. But
> really this is just a magic step with an arbitrary number of output
> ports.
>
> The same problem exists if you want to write an eval step:
>
> <p:declare-step type="cx:eval">
>   <p:input port="pipeline"/>
>   <p:input port="source" sequence="true"/>
>   <p:input port="options"/>
>   <p:output port="result"/>
>   <p:option name="step" cx:type="xsd:QName"/>
>   <p:option name="detailed" cx:type="xsd:boolean"/>
> </p:declare-step>
>
> This is a step that takes *an XML pipeline document* as its input,
> compiles it, and runs it. The problem, of course, is that the number
> of inputs and outputs that this step needs is determined entirely by
> the input pipeline, which isn't known at compile-time and may actually
> be different on every invocation.
>
> I work around this in XML Calabash by encoding the multiple inputs
> and outputs into a single document. That works (sort of) for XProc 1.0
> because the documents all have to be XML. It won't work at all if
> we allow non-XML documents.
>
> (No, a sequence of inputs and outputs isn't sufficient because you
> have to be able to map sequences of inputs and outputs to different
> port names.)
>
>                                        Be seeing you,
>                                          norm
>
> --
> Norman Walsh
> Lead Engineer
> MarkLogic Corporation
> Phone: +1 413 624 6676
> www.marklogic.com
>

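To make the quoted point about internal:split2, internal:split3, etc.
concrete, here is a sketch of what a three-way variant might look like.
It simply repeats the pattern of the internal:split declaration quoted
above with one more output port. The internal:split3 type name follows
Norm's naming; the result3 port name and the namespace URI bound to the
"internal" prefix are placeholders assumed for illustration, not taken
from any implementation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:internal="http://example.com/ns/internal"
                type="internal:split3">
  <!-- Placeholder namespace URI: the thread does not give the real
       binding for the "internal" prefix. -->
  <!-- Same single sequence input as the two-way split. -->
  <p:input port="source" sequence="true" primary="true"/>
  <!-- One non-primary output per reader of the original port. -->
  <p:output port="result1" sequence="true" primary="false"/>
  <p:output port="result2" sequence="true" primary="false"/>
  <!-- Assumed port name, extending the result1/result2 pattern. -->
  <p:output port="result3" sequence="true" primary="false"/>
</p:declare-step>
```

Each additional reader of the same port would need yet another
declaration of this shape, which is what a "*" output would avoid.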


-- 
Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 9 52 475787
Fax : +33 1 4356 1746
http://www.innovimax.fr
RCS Paris 488.018.631
SARL au capital de 10.000 €

Received on Monday, 30 April 2012 16:15:49 UTC