- From: Nikolay Fiykov <nikolay.fiykov@nsn.com>
- Date: Fri, 26 Oct 2007 17:14:09 +0300
- To: jeni@jenitennison.com
- CC: public-xml-processing-model-comments@w3.org, w3c-xsl-wg@w3.org
Hi Jeni,

> 3. The XProc specification does not make it clear if parallel executions
> are handled. (Currently there is implicit parallelism based on connection
> between steps.) This would be a problem for any task involving multiple
> processing steps on top of streams.
>
> I don't understand this point (probably someone else on the XProc WG will, but I'll ask anyway). Can you (or anyone) expand, perhaps
> with an example?

The XSL WG is currently working on streaming transformations: everything related to large or infinite input XML documents and to memory and time constraints. Several of the use cases we have gathered so far can be addressed by combining pipelining with transformations. Here is one such example.

Given the input "<root> <A/> <B/> <A/> <B/> ... </root>", produce two output documents, one containing only the A elements and the other only the B elements: "<root> <A/> <A/> ... </root>" and "<root> <B/> <B/> ... </root>".

This can be solved easily with two similar stylesheets that select A and B respectively. A pipeline can be used together with XSLT to drive their execution; for example, we can use XProc with a pipeline modeled after "Example 5 A Sample For-Each".

The catch is that the input is so large that it cannot fit into memory. We also have to assume that it is readable only once, i.e. it is a single-pass stream feed. Technically this can be done only if the XProc processor implements XML documents as XML events (not DOM) and both transformations receive the input events simultaneously.

Now, the spec is flexible enough about what an XML document is:

"What flows between steps are exclusively XML documents. The inputs and outputs can be implemented as sequences of characters, sequences of events, object models, or any other representation that the implementation chooses."
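To make the single-pass constraint concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how XProc or any particular processor works: a document is modeled as a read-once stream of (start/end, tag) events produced by the standard library's ElementTree.iterparse, and the Filter class and the run_broadcast/run_sequential drivers are hypothetical stand-ins for the two stylesheets and for two possible execution strategies.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple

Event = Tuple[str, str]  # (kind, tag), e.g. ("start", "A")


def read_once(xml_bytes: bytes) -> Iterator[Event]:
    """Single-pass event source: a generator, so once consumed it cannot be rewound."""
    for kind, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        yield kind, elem.tag
        if kind == "end":
            elem.clear()  # drop the element's content once seen, to limit memory use


class Filter:
    """Hypothetical streaming filter: keeps the root and only its children named `keep`.

    Kept children are written as empty elements, matching the <A/>/<B/> example input.
    """

    def __init__(self, keep: str) -> None:
        self.keep = keep
        self.depth = 0
        self.out: List[str] = []

    def feed(self, event: Event) -> None:
        kind, tag = event
        if kind == "start":
            self.depth += 1
            if self.depth == 1:
                self.out.append(f"<{tag}>")
            elif self.depth == 2 and tag == self.keep:
                self.out.append(f"<{tag}/>")
        else:
            if self.depth == 1:
                self.out.append(f"</{tag}>")
            self.depth -= 1

    def result(self) -> str:
        return "".join(self.out)


def run_broadcast(source: Iterator[Event], filters: List[Filter]) -> None:
    """Strategy 1: distribute each event to every filter as it arrives (one pass)."""
    for event in source:
        for f in filters:
            f.feed(event)


def run_sequential(source: Iterator[Event], filters: List[Filter]) -> None:
    """Strategy 2: run each filter as a separate step over the same read-once stream."""
    for f in filters:
        for event in source:  # already exhausted for every filter after the first
            f.feed(event)


if __name__ == "__main__":
    doc = b"<root><A/><B/><A/><B/></root>"

    fa, fb = Filter("A"), Filter("B")
    run_broadcast(read_once(doc), [fa, fb])
    print(fa.result())  # <root><A/><A/></root>
    print(fb.result())  # <root><B/><B/></root>

    fa2, fb2 = Filter("A"), Filter("B")
    run_sequential(read_once(doc), [fa2, fb2])
    print(fa2.result())        # <root><A/><A/></root>
    print(repr(fb2.result()))  # ''
```

With the broadcast strategy both output documents come out of a single pass over the input; with the sequential strategy the second filter finds the stream already consumed and produces nothing, so the same pipeline yields different results depending only on the execution strategy.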
There is also guidance as to how (essentially linear) execution should happen:

"The result of evaluating a pipeline is the result of evaluating the steps that it contains, in the order determined by the connections between them. A pipeline must behave as if it evaluated each step each time it occurs."

What the spec lacks completely, though, is any statement of how parallel branches are to be handled. By "parallel branches" I mean those defined by connections between ports, not conditional ones. I'd argue that this should not be left entirely to implementors, because depending on the strategy chosen, one and the same pipeline may produce different results. In this case we may receive either both stylesheets' results (if events are distributed to both simultaneously) or only one of them (if the implementation executes each XSLT as a separate step; the second result will be empty because the first step has already consumed the stream).

Further, there are similar questions about how to merge results from parallel executions. We have a few other use cases where stream merging (combining) would be needed, and the spec says nothing about such cases either.

So, did that long explanation help, or is more needed?

Cheers,
Nikolai
Received on Friday, 26 October 2007 14:14:28 UTC