RE: p:output and connections from Vasil Rangelov on 2009-11-29 (public-xml-processing-model-comments@w3.org from November 2009)

From: Vasil Rangelov <boen.robot@gmail.com>
Date: Sun, 29 Nov 2009 22:52:19 +0200
To: <public-xml-processing-model-comments@w3.org>
Message-ID: <4b12df3b.100db80a.6423.59b2@mx.google.com>
> The only difference with respect to connections is that in the
> declaration of a step that is not atomic, the author can provide
> bindings for the inputs and outputs.

What is the term for "a step that is not atomic" in this context? It
certainly isn't "a contained step"... XProc doesn't provide a facility for
that. "a pipeline"? This is the thing that creates the most confusion, I
think: "a step that is atomic" and "a step that is not atomic" have certain
rules in common, and also rules that apply to one but not the other. It
isn't clear (in the spec) which applies to which, as very often the spec
seems to use the term "atomic step" to refer to all kinds of atomic steps.

I guess the very definition of an atomic step is misleading:
[Definition: An atomic step is a step that performs a unit of XML
processing, such as XInclude or transformation, and has no internal
subpipeline. ]

By this logic, regardless of whether a step is specification, implementation
or user defined, it's atomic if it doesn't have a subpipeline when it's
called.

Or perhaps it's a difference between "declaration of atomic step" and
"calling of an atomic step"... you may be declaring a pipeline, but you're
calling an atomic step... this difference shouldn't exist. One should be
calling what one declares.

I would like to point out another example where this creates a problem, but
for the most part, there isn't a problem if standard, extension and user
defined atomic steps are treated in the same fashion, and therefore, if a
single term is used to refer to all of them. err:XS0029 is the only example
I have so far for a case where this creates confusion and a potential for
misinterpretation... I mean, I now know what the intention was, but that
doesn't make the formulation OK.

Regards,
Vasil Rangelov

-----Original Message-----
From: public-xml-processing-model-comments-request@w3.org
[mailto:public-xml-processing-model-comments-request@w3.org] On Behalf Of
Norman Walsh
Sent: Monday, November 09, 2009 3:40 PM
To: public-xml-processing-model-comments@w3.org
Subject: Re: p:output and connections

"Vasil Rangelov" <boen.robot@gmail.com> writes:
> Hmmm... so... let me get this straight... when p:declare-step has a 
> sub pipeline, it declares a pipeline, and it is said that a user 
> really "calls a pipeline". When p:declare-step doesn't have a sub 
> pipeline, it declares an "atomic extension step" and the user "calls an
> atomic extension step".

I think it's less complicated than that. In this context, whether something
is an extension step or not is irrelevant, so let's set that to one side for
a moment.

A p:declare-step declares a step. If that declaration includes a body, then
the body is the definition of the step. If the declaration is empty, then
the definition of the step is known to the processor through some other
means.

In either case, the declaration declares a type of step.

If a user places an element whose QName is the same as the name of that
type of step in a subpipeline, then the processing defined by that step is
performed.

So:

  <p:declare-step type="ex:my-first-step">
    <p:input port="source"/>
    <p:output port="result"/>
  </p:declare-step>

declares a step (that happens to be atomic) of the type "ex:my-first-step".

  <p:declare-step type="ex:my-second-step">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:identity/>
  </p:declare-step>

declares a step (that happens not to be atomic) of type
"ex:my-second-step".

I can use these steps in my pipelines without concern for whether they are
atomic or not; their *use* is *always* "atomic" in the sense that it never
contains a body:

  <p:pipeline>
    <ex:my-first-step/>
    <ex:my-second-step/>
  </p:pipeline>

> Rules
> that apply to atomic (extension or built in) steps don't apply to 
> pipelines, and vice-versa...

The only difference with respect to connections is that in the declaration
of a step that is not atomic, the author can provide bindings for the inputs
and outputs.

In the case of inputs, these connections are used if no explicit or implicit
binding is provided. In the case of outputs, these connections define what
the output of the step will be when it's called.

Consider:

  <p:declare-step type="ex:my-third-step">
    <p:input port="source" primary="true">
      <p:inline><doc1/></p:inline>
    </p:input>
    <p:input port="secondary" primary="false">
      <p:inline><doc2/></p:inline>
    </p:input>
    <p:output port="result" primary="true"/>
    <p:output port="result2" primary="false">
      <p:inline><doc3/></p:inline>
    </p:output>
    <p:identity/>
  </p:declare-step>

And here's where it's used:

   ...
   <p:identity/>
   <ex:my-third-step/>

When ex:my-third-step runs, its primary input port "source" will be bound to
the default readable port ("result" from the identity step), so the default
input "<doc1/>" will not be used.

There isn't a binding for the "secondary" input, so it will be bound to
"<doc2/>".

The "result" output has no binding, so it will be bound to the default
readable port of the last step, another identity step in this case.

The "result2" output will be bound to "<doc3/>".
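As a sketch of the other case: a binding given at the point of use replaces
the default from the declaration. The <doc5/> document here is made up
purely for illustration; with it, the "secondary" input would read
"<doc5/>" rather than "<doc2/>":

   ...
   <p:identity/>
   <ex:my-third-step>
     <p:input port="secondary">
       <p:inline><doc5/></p:inline>
     </p:input>
   </ex:my-third-step>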

If you did this:

  <p:declare-step type="ex:my-third-step">
    <p:input port="source" primary="true">
      <p:inline><doc1/></p:inline>
    </p:input>
    <p:input port="secondary" primary="false">
      <p:inline><doc2/></p:inline>
    </p:input>
    <p:output port="result" primary="true">
      <p:inline><doc4/></p:inline>
    </p:output>
    <p:output port="result2" primary="false">
      <p:inline><doc3/></p:inline>
    </p:output>
    <p:identity/>
  </p:declare-step>

You'd get an error because the primary output of the last step is
unconnected. You could do this:

  <p:declare-step type="ex:my-third-step">
    <p:input port="source" primary="true">
      <p:inline><doc1/></p:inline>
    </p:input>
    <p:input port="secondary" primary="false">
      <p:inline><doc2/></p:inline>
    </p:input>
    <p:output port="result" primary="true">
      <p:inline><doc4/></p:inline>
    </p:output>
    <p:output port="result2" primary="false">
      <p:inline><doc3/></p:inline>
    </p:output>
    <p:identity/>
    <p:sink/>
  </p:declare-step>

But that's a pretty pointless pipeline.

If you're declaring a step that is atomic, you can't put any sort of
default bindings in the inputs or outputs because you don't have any
visibility into what the step does.

> but aren't there some rules that apply to both atomic steps and 
> pipelines? If so, is there a common term to refer to them both (I 
> don't remember ever seeing a phrase like "atomic steps or pipelines")? 
> Where do "atomic extension steps" fit into this? Do they get the rules 
> for pipelines or for "atomic steps" (standard library wise)? The spec 
> currently appears to first define them separately (as if they are
> something truly "special"), and then goes on by using only the term
> "atomic steps".

All atomic steps are the same. The only things that are special about atomic
steps in the XProc namespace are that you can't declare any and that, if you
specify version > 1.0, some otherwise static errors are ignored.

In all other respects, they're just ordinary steps.

>> For atomic steps, there is no default, the step produces what it
>> produces. It's a black box.
>
> Right... for atomic steps in the standard library and p:declare-step
> of an (extension) atomic step. And for p:declare-step of a pipeline?
> What happens if there is p:output with no connection in that case? Is
> THAT an error, or do you always get some kind of a default connection
> (like p:empty)?

An unconnected, non-primary output port in the declaration of a non-atomic
step produces an empty sequence. I don't think of that in terms of having a
default connection to p:empty so much as simply not having anything to read
*from*.
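
A minimal sketch of that situation, assuming a hypothetical step type
ex:two-outputs, and assuming the port is declared with sequence="true" so
that zero documents is acceptable on it:

  <p:declare-step type="ex:two-outputs">
    <p:input port="source"/>
    <p:output port="result" primary="true"/>
    <p:output port="extra" primary="false" sequence="true"/>
    <p:identity/>
  </p:declare-step>

Nothing is connected to "extra", so a step that reads it, for example a
p:count, should see no documents at all:

  <ex:two-outputs name="two"/>
  <p:count>
    <p:input port="source">
      <p:pipe step="two" port="extra"/>
    </p:input>
  </p:count>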

Remember, somewhat counter-intuitively, that *inside* the declaration of a
non-atomic step, the connections inside p:outputs are *reading* data, not
writing it. Whatever they *read* gets *written to* the output that's seen by
the pipeline that invokes the declared step.

  <p:pipeline>
    ...
    <p:declare-step type="ex:foo">
      <p:output port="result">
        <p:pipe step="whatever" port="result"/>
      </p:output>
      <ex:something name="whatever"/>
    </p:declare-step>

    <ex:foo/>
    <p:identity/>
  </p:pipeline>

When "ex:foo" is invoked, the processor begins running the declared
subpipeline.

The ex:something step is run. The "result" output of ex:something is *read
by* the connection declared in the ex:foo declaration and *written to* the
"result" output port that's seen by the p:identity step that follows the
call to ex:foo.

Output ports in non-atomic steps have this weird dual role that they read
stuff (usually but not necessarily) from other steps in the subpipeline of
their container and write stuff to the outputs of that container.

>> For compound steps, there is no default, but if you leave them
>> unconnected then they will produce an empty sequence.
>
> If you leave them unconnected (i.e. don't specify anything as a
> connection), isn't what they get called a "default" connection
> (p:empty in this case)? If so, aren't you contradicting yourself?
> Anyway, I see... p:empty it is.

The result is the same as if they were bound to p:empty, so I'm not sure it
much matters how we think about it.

I think in terms of the dual role described above. If there's no connection,
then there's nothing for the output to read from. If there's nothing for it
to read from, then it has nothing to write to the output port.
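
Written out, the two forms that should behave alike look like this (a sketch
only; the port name "extra" is made up for illustration):

  <!-- no connection: nothing for the output to read from -->
  <p:output port="extra" primary="false" sequence="true"/>

  <!-- explicit connection to an empty sequence -->
  <p:output port="extra" primary="false" sequence="true">
    <p:empty/>
  </p:output>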

                                        Be seeing you,
                                          norm

--
Norman Walsh <ndw@nwalsh.com> | So, are you working on finding that bug
http://nwalsh.com/            | now, or are you leaving it until later?
                              | Yes.
Received on Sunday, 29 November 2009 20:53:55 UTC