The first five minutes ... a thought experiment (long) from James Fuller on 2014-02-17 (xproc-dev@w3.org from February 2014)

From: James Fuller <jim@webcomposite.com>
Date: Mon, 17 Feb 2014 14:40:48 +0100
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <CAEaz5msG23Fi02mTBzV4g70+=cmZ7DB7tvLyi_Uyok7_kH0c9g@mail.gmail.com>
Hello All,

With the dust settling on XML Prague, I've tried to make a few
observations based on feedback collected over the weekend. For some of
the more involved thoughts, I will send through separate
communications over the coming days/weeks/months.

But I thought I would 'shoot from the hip' on one topic, namely the
crucial first five minutes of usage by someone investigating XProc for
the very first time:

I) People know and love pipelines and have a set of preconceptions 'in
wetware', before they come to XProc, about how pipelines should work.

II) XProc balances off many engineering choices to handle the vagaries
of managing pipelines big and small; it's not trivial dealing with
pipelines that go beyond simple 'piping output from input' between
steps.

Many, many people repeated to me that XProc does poorly in the first
five minutes; in fact, it takes several sessions before basic concepts
crystallize. Many people give up at this stage, but those that make it
through turn into hard core XProc users, as they have run up and over
the learning curve.

This 'unfriendly' first five minutes makes adoption beyond the XML
hard core less likely. That being said, if
we get the 'first five minutes' scenario right, then the broader group
of all those unix pipeline 'lovers' should be able to comprehend
things quickly and they will be happy to learn more if the return is
worth it.

I don't think we need to embark on some kind of wholesale reductionism
of basic XProc primitives, beyond what we have outlined already in the
vnext spec. For example, Romain Deltour's recent email on
rationalizing inputs with options, while perceptive and well reasoned,
is a larger set of changes that we should probably avoid in v2 for
reasons of time/space, and I think we can achieve the same effect with
fewer 'cuts of the scalpel'.

That being said, there are a lot of good ideas from Romain's email
that the WG will no doubt look deeply into (thx Romain for the brain
food!).

As an experiment, let's run through an evolutionary series of XProc
pipelines, loosely based on real world examples from users met over
the XML Prague weekend.

----------------------------------------------------------------------
Single (or Multiple) XSLT transformation pipeline
----------------------------------------------------------------------

Let's say we want to try out a simple XSLT transform in XProc, where I
provide some source, define an XSLT transform, and want to save the
results to disk.

I diligently brush up on all things XProc and fire up oXygenXML (or
download calabash) and come up with the following as my first stab at
a pipeline;

<?xml version="1.0" encoding="UTF-8"?>
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="rce2sp.xsl"/>
    </p:input>
  </p:xslt>

  <p:store href="data.xml"/>

</p:pipeline>

I already had to take on board a few XProcisms like basic principles
of port bindings and how documents flow through pipelines. I am unsure
of how to set data input. I see p:document and learn about p:pipeline
being a bit of syntactic sugar, so I quickly rewrite to

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

    <p:input port="source" sequence="false">
        <p:document href="data.xml"/>
    </p:input>
    <p:output port="result"/>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="rce2sp.xsl"/>
    </p:input>
  </p:xslt>

  <p:store href="data.xml"/>

</p:declare-step>

When I run this script, the XProc processor complains about the XSLT
step needing parameters. So I read up again, ask the interwebs, review
the mailing lists and come up with;

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
    version="1.0">

    <p:input port="source" sequence="false">
        <p:document href="data.xml"/>
    </p:input>

    <p:output port="result"/>

    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

    <p:store href="data.xml"/>

</p:declare-step>

I have no desire to use parameters, so I learn about the trick of
setting them to p:empty, which is strange. I still get an error about
unbound ports, hmmmm .... back to the docs ... read some more, learn
some more ....

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
    version="1.0">

    <p:input port="source" sequence="false">
        <p:document href="data.xml"/>
    </p:input>

    <p:output port="result" sequence="true">
        <p:empty/>
    </p:output>

    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

    <p:store href="output.xml"/>

</p:declare-step>

I run this and have successful output, but at this stage, I don't
understand a number of concepts ... some are anachronistic, like:
what's this about setting sequences on ports, or why do I have to set
something to 'empty' for parameters? But some concepts run counter to
my intuition about pipelines, where I expect some kind of output by
default. By this stage, it's worrying that I have to somehow care about
managing the end result port or be so explicit with my pipeline
definition.
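
To make the 'unbound ports' confusion concrete: in XProc 1.0, p:store
has no primary output port, so when it is the last step nothing is
connected to the pipeline's result port by default. As a rough sketch
only, one alternative to binding result to p:empty is to bind it
explicitly to the XSLT step, so the pipeline both saves the file and
still emits the transformed document. The step name 'transform' is my
own label for illustration; the file names are the ones used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

    <p:input port="source">
        <p:document href="data.xml"/>
    </p:input>

    <!-- bind result explicitly to the transform; p:store, the last step,
         has no primary output that could be bound by default -->
    <p:output port="result">
        <p:pipe step="transform" port="result"/>
    </p:output>

    <!-- 'transform' is a hypothetical step name chosen for this sketch -->
    <p:xslt name="transform">
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

    <!-- reads the transform's result through the default readable port -->
    <p:store href="output.xml"/>

</p:declare-step>
```

Either way, the newbie has to learn about primary ports and explicit
bindings just to save a file, which is rather the point.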

Alternately, someone could have arrived at a different XProc script at
the start, for example;

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
    version="1.0">
    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
    </p:xslt>
</p:pipeline>

or this

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
    version="1.0">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:input port="parameters" kind="parameter"/>
    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
    </p:xslt>
</p:declare-step>

but for these to run without error, one would have to know how to set
command-line switches (or the oXygenXML setup) so that parameters are
set.

The point of going through this evolution of XProc scripts is to
remind us all that for newbies this process of learning typically
results in frustration, because:

I) XProc's basic operation sometimes works differently than my preconceptions

II) I have to learn many concepts before I get something running

III) and/or I have to learn a few things about execution environment
(commandline options, oXygenXML setup)

All of us, being lifelong autodidacts, are not afraid of learning, but
there should be symmetry in the learning process ... all we are trying
to do is run an XSLT transform and save its output.

As it stands with XProc v1, we are asking people to do a lot more than
they would have to today with some other, easier to comprehend
tool/utility.

Stepping back, I think XProc v1 gets the hairy things right (hence the
previous caution of hacking away at it) because the WG worked through
many serious issues with much thoughtful debate underpinning design
decisions.

So, what might be a better first five minute experience for the newbie user?

I) Thought experiment #1

<p:pipeline>
   <p:xslt stylesheet-href="test.xsl"/>
</p:pipeline>

>xproc -p mypipeline.xpl data.xml

* we could consider some kind of alt port mechanism where a p:document
href could be represented by a specially named option (uggg...)
* a shell script, called xproc, where we put the data flowing through
the pipeline 'front and centre'
* default scenario should not require setting something to empty (like params)


II) Thought experiment #2

<p:pipeline>
   <p:xslt stylesheet-href ="test.xsl" result-href="step1out.xml"/>
   <p:xslt stylesheet-href ="test1.xsl"/>
   <p:xslt stylesheet-href ="test2.xsl" result-href ="step2out.xml"/>
   <p:xslt stylesheet-href ="test3.xsl"/>
</p:pipeline>

>xproc -p mypipeline.xpl data.xml data2.xml

* we could do some kind of syntax sugar by allowing p:document href to
be set with an option
* we let data continue flowing through the pipeline as a default posture
(multiple result output bindings) which would lessen confusion caused
by using p:store
* let users easily 'dip' into the data stream and save intermediate
steps to make the process transparent and easy to debug
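
For contrast, here is a rough sketch of what the same four-transform
chain with two saved intermediates takes in XProc 1.0 today. It is
only an approximation of the thought experiment: it handles a single
source document rather than data.xml and data2.xml together, the step
names 'step1' and 'step3' are labels invented for this sketch, and how
source and result get supplied depends on the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

    <!-- source is supplied by the caller; result carries the last transform -->
    <p:input port="source"/>
    <p:output port="result"/>

    <p:xslt name="step1">
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

    <!-- save the intermediate result; p:store has no primary output,
         so the following step has to name its source explicitly -->
    <p:store href="step1out.xml"/>

    <p:xslt>
        <p:input port="source">
            <p:pipe step="step1" port="result"/>
        </p:input>
        <p:input port="stylesheet">
            <p:document href="test1.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

    <!-- source defaults to the previous p:xslt result -->
    <p:xslt name="step3">
        <p:input port="stylesheet">
            <p:document href="test2.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

    <p:store href="step2out.xml"/>

    <p:xslt>
        <p:input port="source">
            <p:pipe step="step3" port="result"/>
        </p:input>
        <p:input port="stylesheet">
            <p:document href="test3.xsl"/>
        </p:input>
        <p:input port="parameters">
            <p:empty/>
        </p:input>
    </p:xslt>

</p:declare-step>
```

Each saved intermediate breaks the default flow and forces a named
step plus an explicit p:pipe, which is the sort of friction the
result-href idea is meant to remove.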

III) Caveats

CAVEAT #1 - I am not strongly advocating specifically doing I) or II),
this is 'shooting from the hip' type thinking and not fully baked.

CAVEAT #2 - The WG is well aware of some of the problems (like
parameters) and some parts of v2 requirements hopefully will address
those shortcomings

CAVEAT #3 - To repeat, I think XProc v1 just needs the 'final mile' to
be carefully constructed and communicated, not wholesale changes.

IV) Summary

I am trying to convey how important it is to cater for the 'first five
minute' scenario. If we get this wrong in v2, then there is no 'first
day', 'first month' or 'first year' scenario.

Any additional examples that illustrate the newbie's plight would be
most useful, as well as any additional comment.

Jim Fuller
Received on Monday, 17 February 2014 13:41:17 UTC