Re: select in for-each and iteration-source from Innovimax SARL on 2007-06-06 (public-xml-processing-model-wg@w3.org from June 2007)

From: Innovimax SARL <innovimax@gmail.com>
Date: Wed, 6 Jun 2007 23:44:59 +0200
To: "Norman Walsh" <ndw@nwalsh.com>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <546c6c1c0706061444s86ea42di9bc980ec5a615a7a@mail.gmail.com>
On 6/6/07, Norman Walsh <ndw@nwalsh.com> wrote:
> Thanks for the reminder about this message, Mohamed. It was in my
> queue but somehow I overlooked it.
>
> / Innovimax SARL <innovimax@gmail.com> was heard to say:
> | I assume, for the rest of this email, that select in for-each gives
> | access to all nodes matching the select expression and not only the
> | outermost ones
>
> Yes, that's the semantics of an XPath select. And you get the same
> semantics with select on input:
>
> <?xml version="1.0"?>
> <p:pipeline name="pipeline"
>             xmlns:p="http://www.w3.org/2007/03/xproc"
>             xmlns:px="http://xproc.org/2007/03/xproc/ex">
> <p:output port="result"/>
>
> <p:for-each name="for-each">
>   <p:iteration-source select="//div">
>     <p:inline>
>       <doc>
>         <div>div 1</div>
>         <div>div 2
>           <div>div 2.1</div>
>         </div>
>       </doc>
>     </p:inline>
>     <p:inline>
>       <doc>
>         <div>div 1</div>
>       </doc>
>     </p:inline>
>   </p:iteration-source>
>   <p:output port="result"/>
>
>   <p:wrap-sequence>
>     <p:option name="wrapper" value="document"/>
>   </p:wrap-sequence>
> </p:for-each>
>
> <p:wrap-sequence>
>   <p:option name="wrapper" value="doclist"/>
> </p:wrap-sequence>
>
> </p:pipeline>
>
> returns:
>
> <?xml version="1.0" encoding="UTF-8"?><doclist><document><div>div 1</div>
> </document>
> <document><div>div 2
>           <div>div 2.1</div>
>         </div>
> </document>
> <document><div>div 2.1</div>
> </document>
> <document><div>div 1</div>
> </document>
> </doclist>
>
> | [Note in passing: this means that in a for-each the same node could be
> | provided multiple times]
>
> Yes.
>
> | Let's say we have a sequence in step="generate-sequence" port="result"
> |
> | Here are multiple ways to access the doc elements in this sequence.
> | The sequence covers all the use cases:
> | a. some documents in the sequence do not contain any doc element
> | b. some documents in the sequence contain doc elements that are not
> | themselves descendants of a doc element
> | c. some documents in the sequence contain doc elements that are
> | descendants of a doc element
> |
> | The goal is to apply the same select at different places to see what
> | the differences are. I know it is totally arbitrary, but it gives an
> | opportunity to be sure of what we mean by selecting at those levels
>
> Right.
>
> | Here are the different classes of construct
> |
> | == simple case ==
> |
> | [S1]
> | <p:for-each select="//doc">
> |  <p:iteration-source>
> |    <p:pipe step="generate-sequence" port="result"/>
> |  </p:iteration-source>
> |  subpipeline
> | </p:for-each>
> |
> | it would give the subpipeline all the doc elements spread across
> | the sequence (b and c)
>
> I agree.
>
> | [S2]
> | <p:for-each>
> |  <p:iteration-source select="//doc">
> |    <p:pipe step="generate-sequence" port="result"/>
> |  </p:iteration-source>
> |  subpipeline
> | </p:for-each>
> |
> | it would give the subpipeline all the doc elements that are not c (only b).
> |
> | [Note in passing: I assume that select on
> | p:input/p:iteration-source/etc. throws out empty matches; please
> | confirm!]
>
> No, I don't believe that's the case. If it were, then 'select' wouldn't
> have the semantics of an XSLT 'select' and I'd want to change the
> attribute's name.
>
> We decided to use 'match' on p:viewport where it's really important
> not to process elements more than once. Everywhere else where we've
> used select=, I assume we're using the standard XPath/XSLT select
> semantics.

I agree with your analysis, but I'm sorry: THAT'S NOT WHAT WAS
WRITTEN IN THE SPEC!

[[
The select expression, if specified, applies the specified [XPath 1.0]
select expression to the document(s) that are read. Each node that
matches is wrapped in a document and provided to the input port. After
a node has been matched, its descendants are not considered for
further matching; a node is passed at most once as input. In other
words,

<p:input port="source">
  <p:document href="http://example.org/input.html"/>
</p:input>

provides a single document, but

<p:input port="source" select="//html:div">
  <p:document href="http://example.org/input.html"/>
</p:input>

provides a sequence of zero or more documents, one for each matching
html:div (that is not itself a descendant of an html:div) in
http://example.org/input.html.

A select expression can equally be applied to input read from another
step. This input:

<p:input port="source" select="//html:div">
  <p:pipe step="origin" port="result"/>
</p:input>

provides a sequence of zero or more documents, one for each matching
html:div in the document (or each of the documents) that is read from
the portname port of the step named origin.

In contexts where a binding is required, an empty p:input is bound to
an empty sequence of documents.
]]
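
To make the gap between the two readings concrete, here is a minimal
sketch in Python, with ElementTree standing in for an XProc processor.
The sample is the first inline document from Norman's pipeline above.
The helper names select_all and select_outermost are my own and come
from neither the spec nor any implementation.

```python
import xml.etree.ElementTree as ET

# First inline document from the pipeline quoted earlier in this thread.
DOC = """<doc>
  <div>div 1</div>
  <div>div 2
    <div>div 2.1</div>
  </div>
</doc>"""

def select_all(root, tag):
    """Hypothetical helper: XPath-style //tag, nested matches included."""
    return list(root.iter(tag))

def select_outermost(root, tag):
    """Hypothetical helper: the draft wording, where the descendants of a
    matched node are not considered for further matching."""
    found = []

    def walk(elem):
        if elem.tag == tag:
            found.append(elem)
            return  # do not look inside a node that already matched
        for child in elem:
            walk(child)

    walk(root)
    return found

def label(elem):
    """Leading text of an element, trimmed, to identify it in the output."""
    return (elem.text or "").strip()

root = ET.fromstring(DOC)
all_labels = [label(e) for e in select_all(root, "div")]
outer_labels = [label(e) for e in select_outermost(root, "div")]
print(all_labels)    # ['div 1', 'div 2', 'div 2.1']
print(outer_labels)  # ['div 1', 'div 2']
```

Under the XPath reading, div 2.1 reaches the subpipeline twice (once
inside div 2 and once alone); under the wording quoted above, it is
only ever seen inside div 2.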




>
> | == Nested case ==
> |
> | [N4]
> | <p:for-each name="outer">
> |  <p:iteration-source>
> |    <p:pipe step="generate-sequence" port="result"/>
> |  </p:iteration-source>
> |  <!-- here we can get information and put them in options -->
> |  <p:for-each select="//doc">
> |    <p:iteration-source>
> |      <p:pipe step="outer" port="current"/>
> |    </p:iteration-source>
> |    subpipeline
> |  </p:for-each>
> | </p:for-each>
> |
> | With this one, the sequence is split document by document (whether or
> | not they contain doc) and then S1 is applied,
> | so you get [S1] + [context on each document of the sequence, including a)]
>
> I think that's right.
>
> | [Note in passing: do we have a way to know whether the inner
> | p:for-each matched anything in case a)?]
>
> You could find out by running the results through p:count. (Or running
> the input through it, if you wanted.)
>
> | [N5]
> | <p:for-each name="outer" select="//doc">
> |  <p:iteration-source>
> |    <p:pipe step="generate-sequence" port="result"/>
> |  </p:iteration-source>
> |  <!-- here we can get information and put them in options -->
> |  <p:for-each select="//doc">
> |    <p:iteration-source>
> |      <p:pipe step="outer" port="current"/>
> |    </p:iteration-source>
> |    subpipeline
> |  </p:for-each>
> | </p:for-each>
> |
> | This one starts to be interesting:
> | from the outer loop you get all the doc elements (b and c),
> | and in the inner loop you split again
>
> Right, in the outer loop you do b and c and then in the inner loop you
> do ... b and c again.
>
> | [N6]
> | <p:for-each name="outer">
> |  <p:iteration-source select="//doc">
> |    <p:pipe step="generate-sequence" port="result"/>
> |  </p:iteration-source>
> |  <!-- here we can get information and put them in options -->
> |  <p:for-each select="//doc">
> |    <p:iteration-source>
> |      <p:pipe step="outer" port="current"/>
> |    </p:iteration-source>
> |    subpipeline
> |  </p:for-each>
> | </p:for-each>
>
> I think this is exactly the same as N5 for the reasons I gave above.
>
> | With this one, you get a sequence of doc elements that have no doc
> | ancestor, and you apply a for-each to it,
> | so you get [S1] + [context of doc in the original sequence, with a) removed]
> |
> | [N8]
> | <p:for-each name="outer">
> |  <p:iteration-source>
> |    <p:pipe step="generate-sequence" port="result"/>
> |  </p:iteration-source>
> |  <!-- here we can get information and put them in options -->
> |  <p:for-each>
> |    <p:iteration-source select="//doc">
> |      <p:pipe step="outer" port="current"/>
> |    </p:iteration-source>
> |    subpipeline
> |  </p:for-each>
> | </p:for-each>
> |
> | With this one, the sequence is split document by document (whether or
> | not they contain doc) and then S2 is applied,
> | so you get [S2] + [context on each document of the sequence, including a)]
>
> Here you get b and c for each document, so I think that's just b and c.
>
> | Are those interpretations correct ?
>
> I'm not sure, but some of them are different from mine :-)
>
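
For N5 under the plain XPath reading (the one Norman describes), a
rough Python sketch may help compare notes. It assumes that each
selected node is wrapped in its own document before the inner loop
sees it. The three sample documents and the select helper are made up
for illustration; they only mimic cases a, b and c.

```python
import copy
import xml.etree.ElementTree as ET

# Made-up stand-in for the "generate-sequence" result port.
SEQUENCE = [
    "<root><p>no doc here</p></root>",          # case a
    "<root><doc>b1</doc></root>",               # case b
    "<root><doc>b2<doc>c1</doc></doc></root>",  # cases b and c
]

def select(documents, tag):
    """Hypothetical helper: XPath-style //tag over a sequence, with each
    matching element copied out as a document of its own."""
    out = []
    for root in documents:
        for elem in root.iter(tag):  # iter() also tests the root itself
            out.append(copy.deepcopy(elem))
    return out

def label(elem):
    """Leading text of an element, trimmed, to identify it in the output."""
    return (elem.text or "").strip()

source = [ET.fromstring(s) for s in SEQUENCE]

# Outer for-each with select="//doc": every doc element, b and c.
outer = select(source, "doc")
# Inner for-each with select="//doc", run once per outer iteration.
inner = [select([current], "doc") for current in outer]

outer_labels = [label(d) for d in outer]
inner_labels = [[label(e) for e in group] for group in inner]
print(outer_labels)  # ['b1', 'b2', 'c1']
print(inner_labels)  # [['b1'], ['b2', 'c1'], ['c1']]
```

In this sketch the nested doc (c1) reaches the subpipeline twice, once
through b2 and once on its own, and the document of case a contributes
nothing at all.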

Mohamed

-- 
Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 8 72 475787
Fax : +33 1 4356 1746
http://www.innovimax.fr
RCS Paris 488.018.631
SARL au capital de 10.000 €
Received on Wednesday, 6 June 2007 21:45:04 UTC