Re: A proposal to restructure our top-level syntax from Norman Walsh on 2007-12-12 (public-xml-processing-model-wg@w3.org from December 2007)

From: Norman Walsh <ndw@nwalsh.com>
Date: Wed, 12 Dec 2007 10:06:07 -0500
To: public-xml-processing-model-wg@w3.org
Message-ID: <m2r6hs9cm8.fsf@nwalsh.com>
/ ht@inf.ed.ac.uk (Henry S. Thompson) was heard to say:
| nature of p:pipeline.  Bear with me, but that triggered a thought
| which I want to try to work through: we should use p:declare-step to
| declare all steps. 

(Apologies if what follows is a bit rambling...)

I've gone back and forth on this proposal. There's a lot to like from
a software engineering perspective and it does seem to simplify some
parts of the language.

On first reading, I thought it pushed the design center of XProc too
far away from the "easy to use, scripting" space and towards the
"strongly typed, compile and run" space. If XProc becomes (or is
perceived to become) less like a dynamic language and more like a
traditional, "declare everything before you use it" language, we'll
alienate a lot of our audience.

On subsequent readings, it seems like maybe it does less of that than
I first thought. If the average man-on-the-street using XProc almost
exclusively writes one-off pipelines that aren't recursive, these
changes won't really be visible.

Users writing pipeline libraries will have to learn about this
declaration dance, but maybe they're already more sophisticated.

It will allow pipeline authors to modularize their pipelines "inline".

Migrating the inlined pipelines into external libraries will be
straightforward.

But...

| 1) Add optional subpipeline content to p:declare-step -- if it's
|    present, it's the definition:
| 
| <p:declare-step
|   type = QName>
|     (p:input |
|      p:output |
|      p:option)*
|      _subpipeline_
| </p:declare-step>

That's not really a subpipeline is it? It's a p:pipeline, no? If it's
a subpipeline then declare-step has to have a name so that the steps in
the subpipeline can refer to the declared ports. Or...what am I missing?

  <p:declare-step type="px:xslt10">
    <p:input port="source" primary="true"/>
    <p:input port="stylesheet"/>
    <p:output port="result"/>

    <p:xslt version="1.0">
       ... how do I make the bindings here...
    </p:xslt>
  </p:declare-step>

Or does declare-step have a name now so I can point to its inputs?

  <p:declare-step name="myxslt" type="px:xslt10">
    <p:input port="source" primary="true"/>
    <p:input port="stylesheet"/>
    <p:output port="result"/>

    <p:xslt version="1.0">
       <p:input port="source">
         <p:pipe step="myxslt" port="source"/>
       </p:input>
       <p:input port="stylesheet">
         <p:pipe step="myxslt" port="stylesheet"/>
       </p:input>
    </p:xslt>
  </p:declare-step>

Or, if it is a p:pipeline, then is this what I do?

  <p:declare-step type="px:xslt10">
    <p:input port="source" primary="true"/>
    <p:input port="stylesheet"/>
    <p:output port="result"/>

    <p:pipeline name="main">
      <p:xslt version="1.0">
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
        <p:input port="stylesheet">
          <p:pipe step="main" port="stylesheet"/>
        </p:input>
      </p:xslt>
    </p:pipeline>
  </p:declare-step>

Where the internal p:pipeline doesn't have any input or output port
bindings? Or if I want to provide a default input, am I allowed to
put a binding there?

[...]
| (Not sure why p:log is there -- can/should be removed?)

It's there so that you can log the input/output to/from a pipeline.

| 5) Change the definition of p:pipe so that 'step' is optional, and if
|    omitted means the lexically enclosing p:pipeline.

This seems orthogonal. And if we're going to reopen discussion of
making step and/or port optional on p:pipe, I have a different
proposal :-)
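
(An illustrative sketch only of what (5) might look like if applied to
the p:pipeline form shown earlier. The omitted step attribute is the
proposed behaviour, not existing syntax; px:xslt10 is reused from the
example above, and namespace declarations are left out as in the other
fragments.)

  <p:declare-step type="px:xslt10">
    <p:input port="source" primary="true"/>
    <p:input port="stylesheet"/>
    <p:output port="result"/>

    <p:pipeline>
      <p:xslt version="1.0">
        <p:input port="source">
          <!-- no step attribute: under (5) this would mean the
               lexically enclosing p:pipeline -->
          <p:pipe port="source"/>
        </p:input>
        <p:input port="stylesheet">
          <p:pipe port="stylesheet"/>
        </p:input>
      </p:xslt>
    </p:pipeline>
  </p:declare-step>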

| I think this is actually a much cleaner design.  It puts all the load
| of defining typed steps on declare-step.  It will actually make moving
| from user-defined to implementation-defined a very smooth transition
| -- we could even say that the content of a p:declare-step is actually
| a fallback -- implementations can supply builtin-definitions if they
| have them.

Yes, I think that's a useful feature.

| I don't think Norm's other objection to my original 'require I/O decls
| only on libraries' proposal is as strong now -- to publish a pipeline,
| I just rename it to p:declare-step and wrap it in a
| p:pipeline-library, probably adding I/O declarations as well.  No
| internal changes are required.

Can I have a pipeline document that has p:declare-step as its root
element? Can I import that directly?

| This also simplifies things in that p:import now _only_ imports
| libraries.  It will of course still be possible for an implementation
| to 'run' a library, defaulting to the first or only p:declare-step in
| the absence of a type argument.

We rejected a proposal to add an attribute to p:pipeline-library to
indicate the "default" pipeline to run. Did we decide that "the
first" was the default?

| We get the defaulting of p:pipeline I/O back, because every step has
| mandatory I/O declarations, so there's no problem answering the "does
| the last step have a declaration?" question.

Yep.

| One final note -- (1) above arguably oversimplifies, and so makes my
| assertion about how you publish a pipeline false, because as written
| it doesn't allow some things that p:pipeline does in its content.  I
| think on balance I'd prefer to be catholic about that, and so we should
| rather have
|
| <p:declare-step
|   type = QName>
|     (p:input |
|      p:output |
|      p:import |
|      p:declare-step |
|      p:serialization |
|      p:option)*
|      _subpipeline_
| </p:declare-step>

That's not really a subpipeline is it? It's a p:pipeline, no? If it's
a subpipeline then declare-step has to have a name so that the steps in
the subpipeline can refer to the declared ports.

| with the stipulation that a) nested declarations and imports are _not_
| visible higher up; b) serialization is only relevant if the step is
| 'run'.  Which points out another collateral benefit -- p:serialization
| really belongs on p:declare-step, because it's perfectly coherent to
| ask an implementation to 'run' a step from a pipeline library w/o
| knowing or caring whether it's declared built-in or user-defined,
| that is, without or with an explicit subpipeline.

Interesting. This seems to shift the ground a fair bit. Consider:

<p:declare-step type="px:xslt-to-html">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="stylesheet"/>
  <p:input port="parameters" kind="parameter" sequence="true"/>
  <p:output port="result" primary="true"/>
  <p:output port="secondary" sequence="true"/>
  <p:option name="initial-mode"/>
  <p:option name="template-name"/>
  <p:option name="output-base-uri"/>
  <p:option name="version"/>
  <p:serialization method="html" port="result"/>
  <p:xslt>
    ...however we resolved the binding questions above...
  </p:xslt>
</p:declare-step>

Now I can call px:xslt-to-html directly? And if I call it directly then
it uses the serialization options I provided? Can I call p:xslt directly,
without a pipeline wrapper? If not, why not?

| Thanks for listening :-)

Finally, I think the inability to import two simple pipelines (because
of the name clash) is a critical problem. I can think of some ways
around it, such as allowing href on p:declare-step, but...it makes things
a little more complex.
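
(A rough sketch of the href idea, purely hypothetical: the href
attribute on p:declare-step is not part of any draft, and the type
names and file names below are invented for illustration. Each
declaration would give the external pipeline its own type, so the two
no longer clash.)

  <p:pipeline-library>
    <!-- hypothetical: bind a distinct type to each external pipeline -->
    <p:declare-step type="px:first" href="first.xpl"/>
    <p:declare-step type="px:second" href="second.xpl"/>
  </p:pipeline-library>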


                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | Reason's last step is the recognition
http://nwalsh.com/            | that there are an infinite number of
                              | things which are beyond it.-- Pascal
Received on Wednesday, 12 December 2007 15:07:01 UTC