- From: Norman Walsh <ndw@nwalsh.com>
- Date: Thu, 26 Apr 2012 09:31:41 -0400
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <m2ehrar54y.fsf@nwalsh.com>
Per my action from last week...

Part of my plan for (re)implementing my XProc processor involves performing more aggressive graph analysis. This has two benefits: first, I'll be able to establish thread boundaries and do multi-threaded processing; second, I'll be able to identify (sub)pipelines that can be streamed.

In order to make the graph more amenable to this sort of streaming and rewriting, I'm transforming the user's pipeline into something with explicit steps for actions like splitting. Consider this pipeline fragment:

  <p:identity name="root"/>

  <p:identity name="branch1">
    <p:input port="source">
      <p:pipe step="root" port="result"/>
    </p:input>
  </p:identity>

  <p:identity name="branch2">
    <p:input port="source">
      <p:pipe step="root" port="result"/>
    </p:input>
  </p:identity>

The two identity steps branch1 and branch2 both read from the same "result" port on the "root" step. At an implementation level, that requires some sort of buffering or copying. I want to make that explicit, so I'm introducing an explicit split step:

  <p:identity name="root"/>

  <internal:split name="ID00001"/>

  <p:identity name="branch1">
    <p:input port="source">
      <p:pipe step="ID00001" port="result1"/>
    </p:input>
  </p:identity>

  <p:identity name="branch2">
    <p:input port="source">
      <p:pipe step="ID00001" port="result2"/>
    </p:input>
  </p:identity>

So what's the declaration for the internal:split step? It's something like this:

  <p:declare-step type="internal:split">
    <p:input port="source" sequence="true" primary="true"/>
    <p:output port="result1" sequence="true" primary="false"/>
    <p:output port="result2" sequence="true" primary="false"/>
  </p:declare-step>

And I could declare internal:split2, internal:split3, etc. steps, one per number of readers. But really this is just a magic step with an arbitrary number of output ports.
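To make the split rewrite concrete, here is a minimal sketch in Python. It is not the processor's actual code. The Step class, the insert_splits function, and the dictionary-based representation of connections are all hypothetical, and the sketch assumes every connection is already explicit (no implicit primary-port bindings). It only illustrates why the split step needs as many output ports as there are readers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical in-memory model of a step; not an XProc or Calabash API."""
    name: str
    type: str
    # input port name -> (source step name, source port name)
    inputs: dict = field(default_factory=dict)


def insert_splits(steps):
    """Return a new step list in which every output port has at most one reader."""
    # Collect the readers of every (step, port) output.
    readers = defaultdict(list)
    for step in steps:
        for in_port, source in step.inputs.items():
            readers[source].append((step.name, in_port))

    rewired = {}                 # (reader step, reader port) -> (split step, split port)
    splits = defaultdict(list)   # producing step name -> split steps placed after it
    counter = 0
    for source, users in readers.items():
        if len(users) < 2:
            continue             # a single reader needs no buffering or copying
        counter += 1
        split = Step(f"ID{counter:05d}", "internal:split", {"source": source})
        splits[source[0]].append(split)
        # One output port per reader: result1, result2, ...
        for i, user in enumerate(users, start=1):
            rewired[user] = (split.name, f"result{i}")

    result = []
    for step in steps:
        new_inputs = {port: rewired.get((step.name, port), source)
                      for port, source in step.inputs.items()}
        result.append(Step(step.name, step.type, new_inputs))
        result.extend(splits.get(step.name, []))
    return result


# The pipeline fragment from above, with both branches reading root/result.
pipeline = [
    Step("root", "p:identity"),
    Step("branch1", "p:identity", {"source": ("root", "result")}),
    Step("branch2", "p:identity", {"source": ("root", "result")}),
]
rewritten = insert_splits(pipeline)
for step in rewritten:
    print(step.name, step.type, step.inputs)
```

The number of resultN ports is decided by the graph, not by any fixed declaration, which is the point about needing an arbitrary number of output ports.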
The same problem exists if you want to write an eval step:

  <p:declare-step type="cx:eval">
    <p:input port="pipeline"/>
    <p:input port="source" sequence="true"/>
    <p:input port="options"/>
    <p:output port="result"/>
    <p:option name="step" cx:type="xsd:QName"/>
    <p:option name="detailed" cx:type="xsd:boolean"/>
  </p:declare-step>

This is a step that takes *an XML pipeline document* as its input, compiles it, and runs it. The problem, of course, is that the number of inputs and outputs that this step needs is determined entirely by the input pipeline, which isn't known at compile time and may actually be different on every invocation.

I work around this in XML Calabash by encoding the multiple inputs and outputs into a single document. That works (sort of) for XProc 1.0 because the documents all have to be XML. It won't work at all if we allow non-XML documents.

(No, a sequence of inputs and outputs isn't sufficient, because you have to be able to map sequences of inputs and outputs to different port names.)

Be seeing you,
  norm

--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 413 624 6676
www.marklogic.com
Received on Thursday, 26 April 2012 13:32:19 UTC