[closed] Re: comments on XProc last-call draft from Norman Walsh on 2008-01-25 (public-xml-processing-model-comments@w3.org from January 2008)

From: Norman Walsh <ndw@nwalsh.com>
Date: Fri, 25 Jan 2008 10:09:56 -0500
To: public-xml-processing-model-comments@w3.org
CC: nikolay.fiykov@nsn.com
Message-ID: <m24pd2vt4r.fsf@nwalsh.com>

Nikolay,

Thank you for your comments. I believe that many have been addressed.
I'm going to close this thread and open a new one for a few that I
think we might address before we go to another Last Call (8, 12, 19).
If you're unsatisfied with any of the resolutions, please feel free to
raise them again.

/ Nikolay Fiykov <nikolay.fiykov@nsn.com> was heard to say:
| 1) Definitions: "2. Pipeline concepts" : definition of subpipeline is
| way too late, after being referenced in several other places.
| This was a rather troubling experience for first time spec readers.

I think this has been improved.

| 2) Editorial: Example 1, 4 (and possibly other) features step named like
| actual p: tags ("pipeline").
| Steps inputs and outputs are all names "source" and "result".
| I found it rather confusing, at least not until reading almost entire
| spec. Naming them
| like "main", "xslt-input" and etc. would save quite some confusion.

We've tried to use source/result consistently. I added an explanation
of that. Regarding the names of pipelines, I think the recent syntax
changes alleviate some of those concerns.

| 3) Definitions: Definitions of "containers" and "ancestors" is not very
| clear, given the fact that "ancestor" is not defined at all.

Ok. I've tried to improve that.

| 4) Typo: "4.1 p:pipeline" --> "... when the it has ..."

Fixed.

| 5) Document model: in several places (p:for-each,p:viewport ant etc.)
| term "document node" is used.
| This suggests DOM Document object, right?
| If so, what would be the way to execute pipelines against large
| documents? If not, what exactly is to be understood?

I think we've clarified that.

| 6) Parallel subpipelines: As illustrated by the example for "p:for-each".
| I find it rather hard to trace the individual execution branches (linear
| executions), especially if I add few more steps inside.
| Although I can use "p:group" or pipeline libraries, I think non-linear
| pipelines have to be governed by a special construct.
| Also I can not find anything said in the spec about how the parallel
| branches will be executed: linear or in parallel.
| This is critical for processing (large) streams of data, where many
| small steps are involved and the stream cannot
| be read multiple times (but only once).
| I have several use cases (very important) where single input document
| would have to be processed by parallel pipelines and their results
| merged back together. For this, current idea is to use XProc to govern
| the overall data flow and multiple XSLTs steps (able to process in
| streaming mode) to perform the atomic operations.
| All this can be properly examined only if parallelism is explicitly
| present in the grammar.
| Finally, having special "p:parallel" or such construct would allow for a
| more clear and narrow interpretation of the spec.

We've been content so far to leave the question of whether steps are
executed in a parallel or serial fashion as an implementation issue.
Can you provide an example where user-level control over this behavior
is necessary?

| 7) "p:try/catch": Any particular reason why "p:finally" is not part of
| the construct? This is well know paradigm and
| missing "finally" is a bit confusing at first sight.

I can't imagine how a p:finally could be specified. The nature of
XProc doesn't leave any "cleanup" to be performed after an exception.
The processor is responsible for all the cleanup.

| 8) "p:serialize" : I'd happy to see also "exclude-prefixes" (after XSLT).

Can you live without it? :-)

| 9) "p:pipe" : Aren't there too many pipe names mentioned: pipeline,
| subpipeline, pipeline libraries, p:pipeline, p:pipeline-library, p:pipe.
| For example "p:connect" or "p:bind" would do much better.

The situation has been simplified a little bit by renaming
p:pipeline-library to p:library.

| 10) "p:document" : Nothing said about document stability. Are consequent
| executions allowed to return different documents or
| they have to be guaranteed to be the same (like XSLT)?

We don't require stability. Pipelines that need to guarantee stability
can do so with a p:identity step. We're drafting text to make that
clearer now.
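
A conceptual sketch of that idea, in Python rather than XProc: the class
and function names below are hypothetical, and the "resource" merely
stands in for a URI whose content may change between reads. Reading once
and handing the same result to every consumer plays the role that a
single p:identity step would play in a pipeline.

```python
import itertools


class UnstableResource:
    """Hypothetical stand-in for a URI whose content changes on every read."""

    def __init__(self):
        self._versions = itertools.count(1)

    def read(self):
        # Each call returns a different document, as an unstable source might.
        return f"<doc version='{next(self._versions)}'/>"


def run_unpinned(resource):
    """Two consumers each dereference the resource themselves."""
    return resource.read(), resource.read()


def run_pinned(resource):
    """Read once, then give the same document to both consumers."""
    pinned = resource.read()
    return pinned, pinned


first, second = run_unpinned(UnstableResource())
print(first == second)   # the two reads differ

first, second = run_pinned(UnstableResource())
print(first == second)   # both consumers see the same document
```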

| 11) I'd appreciate if you publish XML Schemas for the results of
| following steps: p:count, p:directory-list, p:http-request, p:store

Right. Will do.

| 12) "p:directory-list" : option name="path" : "the value of the path
| must be an anyURI". Why it is not names "uriPath" then?

Maybe that would be a good idea.

| 13) "p:directory-list" : option filter : I have use cases where I'd need
| to do directory scanning where single RegExp alone would not be enough.
| Any possibility to have "includes" and "excludes" (from Ant) added/instead?

Done.

| 14) Namespace rename: "2. Each response header ... is translated into
| c:header element". Short or long notation?

Sorry, I don't understand this comment.

| 15) XSLT 2.0 : "If a sequence of documents is provided on the source
| port ...".
| Not clear to me how sequences are to be handled exactly.

The first document in the sequence is the primary input document and
the whole sequence becomes the XSLT default collection.

| 16) XSLT 2.0 and "p:parameter" : passing of documents is impossible
| (only strings).
| There are cases when result of one transformation is needed in second.
| I'm not sure we need all the "overhead"
| of wrapping/unwrapping or similar to achieve that.

I sympathize, but the WG has revisited this issue several times and
has not been persuaded to allow structured parameters. More's the
pity.

| 17) Steps evaluation: "A pipeline must behave as if it evaluated each
| step each time it occurs." :
| How XSLT templates caching can be achieved and at the same time be
| compliant with the spec?

I don't understand the question.

| And without templates caching there will be significant performance
| penalty when using pipelines.
| Document stability has a role in this subject too.
|
| 18) On ability to process large documents:
| - there are multiple places where XPath expressions are expected. These
| steps cannot be executed against large steps
| (without defining a subset of XPath).

That's not universally true. An implementation can examine the XPath
expressions actually provided and may be able to stream them.

The ability of a processor to handle large documents is viewed as a
quality of implementation issue.
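
As a rough illustration of why some expressions can be streamed, here is
a minimal Python sketch. It is not taken from any XProc implementation;
the function name and the restriction to simple child-only paths such as
/a/b are assumptions made for the example. It matches the path while
parsing incrementally instead of building the whole tree first.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(xml_bytes, path):
    """Yield the text of elements matching a simple absolute path like '/a/b'.

    Only child steps with plain element names are supported (an assumed,
    deliberately tiny subset of XPath).
    """
    steps = path.strip("/").split("/")
    stack = []  # names of the currently open elements, outermost first
    source = io.BytesIO(xml_bytes)
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == steps:
                yield elem.text
            stack.pop()
            # Drop the element's content once it has been handled.
            elem.clear()


doc = b"<a><b>1</b><c><b>x</b></c><b>2</b></a>"
for text in stream_match(doc, "/a/b"):
    print(text)
```

An expression that needs to look backwards or at later siblings would not
fit this pattern, which is why the decision depends on the expressions
actually provided.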

| 19) "p:label-element" : scheme="count-elements" : Why is not valid XPath
| expression allowed here?
| In my case I'd use generate-id() for adding missing id attributes.

Yes, I suppose that's an option.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Everything should be made as simple as
http://nwalsh.com/            | possible, but no simpler.
Received on Friday, 25 January 2008 15:10:19 UTC