comments on XProc last-call draft from Nikolay Fiykov on 2007-10-16 (public-xml-processing-model-comments@w3.org from October 2007)

From: Nikolay Fiykov <nikolay.fiykov@nsn.com>
Date: Tue, 16 Oct 2007 18:22:20 +0300
To: public-xml-processing-model-comments@w3.org
Message-ID: <4714D72C.9020302@nsn.com>
Hi,

1) Definitions: "2. Pipeline concepts": the definition of subpipeline
comes far too late, after the term has already been used in several
other places. This was rather confusing for a first-time reader of the spec.

2) Editorial: Examples 1 and 4 (and possibly others) contain a step named
like an actual p: element ("pipeline").
The steps' inputs and outputs are all named "source" and "result".
I found this rather confusing, at least until I had read almost the
entire spec. Naming them "main", "xslt-input", etc. would save quite
some confusion.

3) Definitions: The definitions of "containers" and "ancestors" are not
very clear, given that "ancestor" is not defined at all.

4) Typo: "4.1 p:pipeline" --> "... when the it has ..."

5) Document model: in several places (p:for-each, p:viewport, etc.) the
term "document node" is used.
This suggests a DOM Document object, right?
If so, how would one execute pipelines against large documents?
If not, how exactly should the term be understood?

6) Parallel subpipelines: as illustrated by the example for "p:for-each",
I find it rather hard to trace the individual execution branches (linear
executions), especially if I add a few more steps inside.
Although I can use "p:group" or pipeline libraries, I think non-linear
pipelines should be governed by a dedicated construct.
Also, I cannot find anything in the spec about how parallel branches are
executed: sequentially or in parallel.
This is critical for processing (large) streams of data, where many
small steps are involved and the stream can be read only once, not
multiple times.
I have several (very important) use cases where a single input document
has to be processed by parallel pipelines and their results merged back
together. My current idea is to use XProc to govern the overall data
flow and multiple XSLT steps (able to process in streaming mode) to
perform the atomic operations.
All this can be properly examined only if parallelism is explicitly
present in the grammar.
Finally, a dedicated "p:parallel" (or similar) construct would allow a
clearer and narrower interpretation of the spec.
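To make this use case concrete, here is a minimal Python sketch (not XProc) of reading a single input stream exactly once while feeding every element to several independent branches and merging their results at the end. The fan_out() helper and the branch names "count" and "total_qty" are hypothetical illustrations, not anything defined by the draft.

```python
import io
import xml.etree.ElementTree as ET


def fan_out(stream, branches):
    """Read `stream` once and push every completed element to each branch.

    `branches` maps a (hypothetical) branch name to a pair
    (initial accumulator, step function). Each step function receives the
    branch's current accumulator and an element and returns the new
    accumulator. The returned dict is the merged result of all branches.
    """
    state = {name: init for name, (init, _step) in branches.items()}
    # iterparse consumes the input incrementally, so the stream is read
    # only once no matter how many branches there are.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        for name, (_init, step) in branches.items():
            state[name] = step(state[name], elem)
        # Every branch has now seen this element; release its content.
        elem.clear()
    return state


branches = {
    "count": (0, lambda acc, e: acc + 1 if e.tag == "item" else acc),
    "total_qty": (0, lambda acc, e: acc + int(e.get("qty", "0")) if e.tag == "item" else acc),
}
doc = io.BytesIO(b"<orders><item id='a' qty='2'/><item id='b' qty='3'/></orders>")
print(fan_out(doc, branches))  # {'count': 2, 'total_qty': 5}
```

The branches are interleaved in one loop rather than run on separate threads; the point is only that each element is visited once by all branches, which is the kind of execution guarantee the comment asks the spec to state.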

7) "p:try/catch": Is there any particular reason why "p:finally" is not
part of the construct? This is a well-known paradigm, and the missing
"finally" is a bit confusing at first sight.

8) "p:serialize": I'd be happy to also see "exclude-prefixes" (following XSLT).

9) "p:pipe": Aren't there too many "pipe" names around: pipeline,
subpipeline, pipeline libraries, p:pipeline, p:pipeline-library, p:pipe?
For example, "p:connect" or "p:bind" would be much better.

10) "p:document": Nothing is said about document stability. Are
subsequent executions allowed to return different documents, or are
they guaranteed to be the same (as in XSLT)?

11) I'd appreciate it if you published XML Schemas for the results of the
following steps: p:count, p:directory-list, p:http-request, p:store.

12) "p:directory-list": option name="path": "the value of the path
must be an anyURI". Why is it not named "uriPath" then?

13) "p:directory-list": option "filter": I have use cases that require
directory scanning where a single regular expression alone is not enough.
Is there any possibility of adding "includes" and "excludes" (as in Ant),
either alongside the filter or instead of it?
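For illustration, a minimal Python sketch of Ant-style include/exclude filtering over the file names of a single directory listing. The filter_names() function and its parameters are hypothetical, and fnmatch glob patterns stand in for Ant's own pattern syntax, which is richer (for example "**" for recursive matching).

```python
from fnmatch import fnmatchcase


def filter_names(names, includes=None, excludes=None):
    """Keep names that match at least one include and no exclude pattern.

    Hypothetical sketch in the spirit of Ant's includes/excludes; with no
    includes given, every name is a candidate.
    """
    includes = includes or ["*"]
    excludes = excludes or []
    return [
        name for name in names
        if any(fnmatchcase(name, p) for p in includes)
        and not any(fnmatchcase(name, p) for p in excludes)
    ]


names = ["a.xml", "b.xml", "b.bak.xml", "notes.txt"]
print(filter_names(names, includes=["*.xml"], excludes=["*.bak.*"]))
# ['a.xml', 'b.xml']
```

Keeping includes and excludes as separate lists lets each rule stay a simple pattern instead of folding the negation into one expression.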

14) Namespace rename: "2. Each response header ... is translated into
c:header element". Short or long notation?

15) XSLT 2.0: "If a sequence of documents is provided on the source
port ...".
It is not clear to me exactly how sequences are to be handled.

16) XSLT 2.0 and "p:parameter": passing documents is impossible
(only strings can be passed).
There are cases where the result of one transformation is needed in a
second one. I'm not sure we need all the "overhead" of
wrapping/unwrapping or similar to achieve that.

17) Step evaluation: "A pipeline must behave as if it evaluated each
step each time it occurs.":
How can caching of compiled XSLT templates be achieved while remaining
compliant with the spec?
Without template caching there will be a significant performance
penalty when using pipelines.
Document stability plays a role in this subject too.

18) On the ability to process large documents:
- there are multiple places where XPath expressions are expected. These
steps cannot be executed against large documents (without defining a
subset of XPath).
- there are multiple places where "node sets" or wrapping "document
nodes" must be produced. These too cannot be executed against large
documents.
- there are only a few steps (required and optional) that can
(potentially) operate on large documents (and thus perhaps use
SAX-based object models).
There is no explanation of how steps with different underlying models
are to be intermixed.

19) "p:label-element": scheme="count-elements": Why isn't an arbitrary
XPath expression allowed here?
In my case I'd use generate-id() to add missing id attributes.
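A minimal Python sketch of this use case: give every element that lacks an id attribute a generated, document-unique value, roughly what an expression based on generate-id() would provide. The add_missing_ids() helper, the "gen-" prefix, and the counter scheme are illustrative assumptions; generate-id() itself returns implementation-dependent strings.

```python
import xml.etree.ElementTree as ET


def add_missing_ids(root, prefix="gen-"):
    """Set a unique id on every element of `root` that has none (hypothetical helper)."""
    used = {e.get("id") for e in root.iter() if e.get("id") is not None}
    counter = 0
    for elem in root.iter():
        if elem.get("id") is None:
            # Advance until the candidate does not clash with an existing id.
            counter += 1
            while f"{prefix}{counter}" in used:
                counter += 1
            new_id = f"{prefix}{counter}"
            elem.set("id", new_id)
            used.add(new_id)
    return root


root = ET.fromstring("<doc><p id='gen-1'/><p/><p id='intro'/></doc>")
add_missing_ids(root)
print(ET.tostring(root, encoding="unicode"))
```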

BR, Nikolay Fiykov
Received on Tuesday, 16 October 2007 15:31:10 UTC