Re: p:output and connections from Norman Walsh on 2009-11-09 (public-xml-processing-model-comments@w3.org from November 2009)

From: Norman Walsh <ndw@nwalsh.com>
Date: Mon, 09 Nov 2009 08:40:23 -0500
To: public-xml-processing-model-comments@w3.org
Message-ID: <m2d43rerzs.fsf@nwalsh.com>

"Vasil Rangelov" <boen.robot@gmail.com> writes:
> Hmmm... so... let me get this straight... when p:declare-step has a sub
> pipeline, it declares a pipeline, and it is said that a user really "calls a
> pipeline". When p:declare-step doesn't have a sub pipeline, it declares an
> "atomic extension step" and the user "calls an atomic extension step".

I think it's less complicated than that. In this context, whether
something is an extension step or not is irrelevant, so let's set that
to one side for a moment.

A p:declare-step declares a step. If that declaration includes a body,
then the body is the definition of the step. If the declaration is
empty, then the definition of the step is known to the processor
through some other means.

In either case, the declaration declares a type of step.

If a user places an element whose QName is the same as the name
of that type of step in a subpipeline, then the processing defined by
that step is performed.

So:

  <p:declare-step type="ex:my-first-step">
    <p:input port="source"/>
    <p:output port="result"/>
  </p:declare-step>

declares a step (that happens to be atomic) of the type
"ex:my-first-step".

  <p:declare-step type="ex:my-second-step">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:identity/>
  </p:declare-step>

declares a step (that happens not to be atomic) of the type
"ex:my-second-step".

I can use these steps in my pipelines without concern for whether they
are atomic or not; their *use* is *always* "atomic" in the sense that
it never contains a body:

  <p:pipeline>
    <ex:my-first-step/>
    <ex:my-second-step/>
  </p:pipeline>

> Rules
> that apply to atomic (extension or built in) steps don't apply to pipelines,
> and vice-versa...

The only difference with respect to connections is that in the
declaration of a step that is not atomic, the author can provide
bindings for the inputs and outputs.

In the case of inputs, these connections are used if no explicit or
implicit binding is provided. In the case of outputs, these
connections define what the output of the step will be when it's
called.

Consider:

  <p:declare-step type="ex:my-third-step">
    <p:input port="source" primary="true">
      <p:inline><doc1/></p:inline>
    </p:input>
    <p:input port="secondary" primary="false">
      <p:inline><doc2/></p:inline>
    </p:input>
    <p:output port="result" primary="true"/>
    <p:output port="result2" primary="false">
      <p:inline><doc3/></p:inline>
    </p:output>
    <p:identity/>
  </p:declare-step>

And here's where it's used:

   ...
   <p:identity/>
   <ex:my-third-step/>

When ex:my-third-step runs, its primary input port "source" will be bound
to the default readable port ("result" from the identity step), so the
default input "<doc1/>" will not be used.

There isn't a binding for the "secondary" input, so it will be bound to
"<doc2/>".

The "result" output has no binding, so it will be bound to the default
readable port of the last step, another identity step in this case.

The "result2" output will be bound to "<doc3/>".
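
As a sketch of the other case, assuming the invoking pipeline wants to
supply its own document for "secondary" (the <doc5/> element is made up
for illustration), an explicit binding at the point of use would take
precedence over the default in the declaration:

   <p:identity/>
   <ex:my-third-step>
     <!-- explicit binding: the declared default <doc2/> is not used -->
     <p:input port="secondary">
       <p:inline><doc5/></p:inline>
     </p:input>
   </ex:my-third-step>

Here "source" would still be bound to the default readable port, and
"secondary" would read "<doc5/>" instead of "<doc2/>".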

If you did this:

  <p:declare-step type="ex:my-third-step">
    <p:input port="source" primary="true">
      <p:inline><doc1/></p:inline>
    </p:input>
    <p:input port="secondary" primary="false">
      <p:inline><doc2/></p:inline>
    </p:input>
    <p:output port="result" primary="true">
      <p:inline><doc4/></p:inline>
    </p:output>
    <p:output port="result2" primary="false">
      <p:inline><doc3/></p:inline>
    </p:output>
    <p:identity/>
  </p:declare-step>

You'd get an error because the primary output of the last step is
unconnected. You could do this:

  <p:declare-step type="ex:my-third-step">
    <p:input port="source" primary="true">
      <p:inline><doc1/></p:inline>
    </p:input>
    <p:input port="secondary" primary="false">
      <p:inline><doc2/></p:inline>
    </p:input>
    <p:output port="result" primary="true">
      <p:inline><doc4/></p:inline>
    </p:output>
    <p:output port="result2" primary="false">
      <p:inline><doc3/></p:inline>
    </p:output>
    <p:identity/>
    <p:sink/>
  </p:declare-step>

But that's a pretty pointless pipeline.

If you're declaring a step that is atomic, you can't put any sort
of default bindings in the inputs or outputs because you don't have
any visibility into what the step does.

> but aren't there some rules that apply to both atomic
> steps and pipelines? If so, is there a common term to refer to them both (I
> don't remember ever seeing a phrase like "atomic steps or pipelines")? Where
> do "atomic extension steps" fit into this? Do they get the rules for
> pipelines or for "atomic steps" (standard library wise)? The spec currently
> appears to first define them separately (as if they are something truly
> "special"), and then goes on by using only the term "atomic steps".

All atomic steps are the same. The only things that are special about
atomic steps in the XProc namespace are that you can't declare any and
that, if you specify version > 1.0, some otherwise static errors are
ignored.

In all other respects, they're just ordinary steps.

>> For atomic steps, there is no default, the step produces what it produces.
>> It's a black box.
>
> Right... for atomic steps in the standard library and p:declare-step of an
> (extension) atomic step. And for p:declare-step of a pipeline? What happens
> if there is p:output with no connection in that case? Is THAT an error, or
> do you always get some kind of a default connection (like p:empty)?

An unconnected, non-primary output port in the declaration of a
non-atomic step produces an empty sequence. I don't think of that in
terms of having a default connection to p:empty so much as simply not
having anything to read *from*.

Remember, somewhat counter-intuitively, that *inside* the declaration
of a non-atomic step, the connections inside p:outputs are *reading*
data, not writing it. Whatever they *read* gets *written to* the
output that's seen by the pipeline that invokes the declared step.

  <p:pipeline>
    ...
    <p:declare-step type="ex:foo">
      <p:output port="result">
        <p:pipe step="whatever" port="result"/>
      </p:output>
      <ex:something name="whatever"/>
    </p:declare-step>

    <ex:foo/>
    <p:identity/>
  </p:pipeline>

When "ex:foo" is invoked, the processor begins running the declared
subpipeline.

The ex:something step is run. The "result" output of ex:something is
*read by* the connection declared in the ex:foo declaration and
*written to* the "result" output port that's seen by the p:identity
step that follows the call to ex:foo.

Output ports in non-atomic steps have this weird dual role that they
read stuff (usually but not necessarily) from other steps in the
subpipeline of their container and write stuff to the outputs of that
container.

>> For compound steps, there is no default, but if you leave them unconnected
>> then they will produce an empty sequence.
>
> If you leave them unconnected (i.e. don't specify anything as a connection),
> isn't what they get called a "default" connection (p:empty in this case)? If
> so, aren't you contradicting yourself? Anyway, I see... p:empty it is.

The result is the same as if they were bound to p:empty, so I'm not
sure it much matters how we think about it.
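
To make that concrete, here is a sketch in which the step type
"ex:my-fourth-step" and the port name "extra" are hypothetical, made up
for illustration. The "extra" output is left unconnected:

  <p:declare-step type="ex:my-fourth-step">
    <p:input port="source"/>
    <p:output port="result" primary="true"/>
    <!-- no connection: nothing to read from, so nothing is written -->
    <p:output port="extra" primary="false" sequence="true"/>
    <p:identity/>
  </p:declare-step>

A caller should see the same thing as if "extra" had been declared
like this:

    <p:output port="extra" primary="false" sequence="true">
      <p:empty/>
    </p:output>

The sequence="true" attribute is included on the assumption that a port
which doesn't accept a sequence is expected to produce exactly one
document.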

I think in terms of the dual role described above. If there's no
connection, then there's nothing for the output to read from. If
there's nothing for it to read from, then it has nothing to write to
the output port.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | So, are you working on finding that bug
http://nwalsh.com/            | now, or are you leaving it until later?
                              | Yes.
Received on Monday, 9 November 2009 13:41:13 UTC