Re: The first five minutes ... a thought experiment (long) from Romain Deltour on 2014-02-18 (xproc-dev@w3.org from February 2014)

From: Romain Deltour <rdeltour@gmail.com>
Date: Tue, 18 Feb 2014 11:35:22 +0100
To: James Fuller <jim@webcomposite.com>, XProc Dev <xproc-dev@w3.org>
Message-Id: <81A346AA-85DF-440A-9684-71F57CE77BCF@gmail.com>
Many thanks Jim for sharing these insightful considerations.

It seems we all agree that one big issue with XProc is its learning curve, which is steep and rather long-tailed. The 5 minutes problem is *one* aspect of it, but as others pointed out, it practically affects far more than 5 minutes of an XProc developer’s life.

I think most of us will also agree that XProc v1 functionally covers 99% of what people need, at least thanks to extension hooks and custom steps.

You said:
> I don't think we need to embark on some kind of wholesale reductionism
> of basic XProc primitives, beyond what we have outlined already in
> vnext spec.

To be honest I was expecting this understandable kind of reaction to my proposal ;) The proposed change is indeed significant (hence the “v4” joke).

IMHO, one of the primary causes of XProc's steep learning curve lies in its idiosyncrasies. No matter how hard we try to simplify the syntax (as in req 2.7) or provide step primitives, as long as an XProc newcomer needs to fully grok concepts like options / input / params, the learning curve will stand. I’m not even talking about “optional option” vs “required option”...
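
To make the three notions concrete, here is a minimal XProc 1.0 sketch of one step receiving a document input, an option and a parameter. It is only an illustration, not something taken from the spec or from the thread: the option names stylesheet-version and mode-label and the parameter name label are made up, and test.xsl is borrowed from Jim's examples.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <!-- Document input port: carries XML documents through the pipeline -->
  <p:input port="source"/>
  <!-- Parameter input port: carries name/value pairs -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <!-- "Required option": the caller must supply a value (hypothetical name) -->
  <p:option name="stylesheet-version" required="true"/>
  <!-- "Optional option": has a default value (hypothetical name) -->
  <p:option name="mode-label" select="'draft'"/>

  <p:xslt>
    <!-- Input: the stylesheet arrives as a document on a port -->
    <p:input port="stylesheet">
      <p:document href="test.xsl"/>
    </p:input>
    <!-- Option: sets the step's own "version" option -->
    <p:with-option name="version" select="$stylesheet-version"/>
    <!-- Parameter: passed on to the stylesheet (hypothetical name) -->
    <p:with-param name="label" select="$mode-label"/>
  </p:xslt>

</p:declare-step>
```

Three different mechanisms, each with its own declaration and binding syntax, for what a newcomer would probably describe as "passing stuff to the step".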

I firmly believe that a simplification of the model is sorely needed **eventually**. I’m not arguing it should be in v2, that’s entirely up to the WG, considering manpower and roadmap.
Personally I’d prefer to get a radical simplification earlier rather than later, but I perfectly see how picking low-hanging usability fruit in v2 is compelling.


> Stepping back, I think XProc v1 gets the hairy things right (hence the
> previous caution of hacking away at it) because the WG worked through
> many serious issues with much thoughtful debate underpinning design
> decisions.

I certainly don’t doubt that. 
However, I would change the first statement to “XProc v1 got the hairy things right **given the agreed-on constrained context**”.

In v2, there are two significant changes: (a) allowing any XDM in options –and ports?– and (b) allowing non-XML documents. I assume these changes would impact how you’d figure out the hairy things nowadays.

Changes like these make the peculiarities of some concepts even more subtle, and so possibly even harder for a newcomer to fully understand.


Romain.


On 17 Feb 2014, at 14:40, James Fuller <jim@webcomposite.com> wrote:

> Hello All,
> 
> With the dust settling on XML Prague, I've tried to make a few
> observations based on feedback collected over the weekend. For some of
> the more involved thoughts, I will send through separate
> communications over the coming days/weeks/months.
> 
> But thought I would 'shoot from the hip' on one topic, i.e. the crucial
> first five minutes of usage by someone investigating XProc for the
> very first time;
> 
> I) People know and love pipelines and have a set of preconceptions 'in
> wetware', before they come to XProc, about how pipelines should work.
> 
> II) XProc balances off many engineering choices to handle the vagaries
> of managing pipelines big and small; it's not trivial dealing with
> pipelines that go beyond simple 'piping output from input' between
> steps.
> 
> Many, many people repeated to me that XProc does poorly in the first
> five minutes, in fact, it takes several sessions before basic concepts
> crystallize. Many people give up at this stage but those that make it
> through, turn into hard core XProc users, as they have run up and over
> the learning curve.
> 
> This 'unfriendly' first five minutes
> makes adoption beyond the XML hard core less likely. That being said, if
> we get the 'first five minutes' scenario right, then the broader group
> of all those unix pipeline 'lovers' should be able to comprehend
> things quickly and they will be happy to learn more if the return is
> worth it.
> 
> I don't think we need to embark on some kind of wholesale reductionism
> of basic XProc primitives, beyond what we have outlined already in
> vnext spec. For example, Romain Deltour's recent email on
> rationalizing inputs with options, while perceptive and well reasoned,
> is a larger set of changes we should probably avoid in v2 for reasons
> of time/space and I think we can achieve the same effect, with less
> 'cuts of the scalpel'.
> 
> That being said, there are a lot of good ideas from Romain's email
> that the WG will no doubt look deeply into (thx Romain for the brain
> food!).
> 
> As an experiment, let's run through an evolutionary series of XProc
> pipelines, loosely based on real-world examples from users met over
> the XML Prague weekend.
> 
> ----------------------------------------------------------------------
> Single (or Multiple) XSLT transformation pipeline
> ----------------------------------------------------------------------
> 
> Let's say we want to try out doing a simple XSLT transform, in XProc,
> where I provide some source and define an XSLT transform, and want to
> save the results to disk.
> 
> I diligently brush up on all things XProc and fire up oXygenXML (or
> download calabash) and come up with the following as my first stab at
> a pipeline;
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
> 
>  <p:xslt>
>    <p:input port="stylesheet">
>      <p:document href="rce2sp.xsl"/>
>    </p:input>
>  </p:xslt>
> 
>  <p:store href="data.xml"/>
> 
> </p:pipeline>
> 
> I already had to take on board a few XProcisms like basic principles
> of port bindings and how documents flow through pipelines. I am unsure
> of how to set data input, I see p:document and learn about p:pipeline
> being a bit of syntactic sugar, so I quickly rewrite to
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
> 
>    <p:input port="source" sequence="false">
>        <p:document href="data.xml"/>
>    </p:input>
>    <p:output port="result"/>
> 
>  <p:xslt>
>    <p:input port="stylesheet">
>      <p:document href="rce2sp.xsl"/>
>    </p:input>
>  </p:xslt>
> 
>  <p:store href="data.xml"/>
> 
> </p:declare-step>
> 
> When I run this script, the XProc processor complains about the XSLT
> step needing parameters. So I read up again, ask the interwebs, review
> the mailing lists and come up with;
> 
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
> xmlns:c="http://www.w3.org/ns/xproc-step"
>    version="1.0">
> 
>    <p:input port="source" sequence="false">
>        <p:document href="data.xml"/>
>    </p:input>
> 
>    <p:output port="result"/>
> 
>    <p:xslt>
>        <p:input port="stylesheet">
>            <p:document href="test.xsl"/>
>        </p:input>
>        <p:input port="parameters">
>            <p:empty/>
>        </p:input>
>    </p:xslt>
> 
>    <p:store href="data.xml"/>
> 
> </p:declare-step>
> 
> I have no desire to use parameters, so I learn about the trick of
> setting them to p:empty, which is strange. I still get an error about
> unbound ports, hmmmm .... back to the docs ... read some more, learn
> some more ....
> 
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
> xmlns:c="http://www.w3.org/ns/xproc-step"
>    version="1.0">
> 
>    <p:input port="source" sequence="false">
>        <p:document href="data.xml"/>
>    </p:input>
> 
>    <p:output port="result" sequence="true">
>        <p:empty/>
>    </p:output>
> 
>    <p:xslt>
>        <p:input port="stylesheet">
>            <p:document href="test.xsl"/>
>        </p:input>
>        <p:input port="parameters">
>            <p:empty/>
>        </p:input>
>    </p:xslt>
> 
>    <p:store href="output.xml"/>
> 
> </p:declare-step>
> 
> I run this and have successful output, but at this stage, I don't
> understand a number of concepts ... some are anachronistic like; what's
> this about setting sequences on ports or why do I have to set
> something to 'empty' for parameters. But some concepts run counter to
> my intuition about pipelines, where I expect some kind of output by
> default. By this stage, it's worrying that I have to somehow care about
> managing the end result port or be so explicit with my pipeline
> definition.
> 
> Alternately, someone could have arrived at a different XProc script at
> the start, for example;
> 
> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
> xmlns:c="http://www.w3.org/ns/xproc-step"
>    version="1.0">
>    <p:xslt>
>        <p:input port="stylesheet">
>            <p:document href="test.xsl"/>
>        </p:input>
>    </p:xslt>
> </p:pipeline>
> 
> or this
> 
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
> xmlns:c="http://www.w3.org/ns/xproc-step"
>    version="1.0">
>    <p:input port="source"/>
>    <p:output port="result"/>
>    <p:input port="parameters" kind="parameter"/>
>    <p:xslt>
>        <p:input port="stylesheet">
>            <p:document href="test.xsl"/>
>        </p:input>
>    </p:xslt>
> </p:declare-step>
> 
> but for these to run without error, one would have to know how to set
> commandline switches (or oXygenXML setup) so that parameters are set,
> to get this running correctly.
> 
> The point of going through this evolution of xproc scripts, is to
> remind us all that for newbies this process of learning typically
> results in frustration, because;
> 
> I) XProc's basic operation sometimes works differently than my preconceptions
> 
> II) I have to learn many concepts before I get something running
> 
> III) and/or I have to learn a few things about execution environment
> (commandline options, oXygenXML setup)
> 
> All of us, being lifelong autodidacts, are not afraid of learning, but
> there should be symmetry in the learning process ... all we are trying
> to do is run an xslt transform and save its output.
> 
> As it stands with XProc v1, we are asking people to do a lot more than what
> they can do today with some other easier to comprehend tool/utility.
> 
> Stepping back, I think XProc v1 gets the hairy things right (hence the
> previous caution of hacking away at it) because the WG worked through
> many serious issues with much thoughtful debate underpinning design
> decisions.
> 
> So, what might be a better first five minute experience for the newbie user ?
> 
> I) Thought experiment #1
> 
> <p:pipeline>
>   <p:xslt stylesheet-href="test.xsl"/>
> </p:pipeline>
> 
>> xproc -p mypipeline.xpl data.xml
> 
> * we could consider some kind of alt port mechanism where a p:document
> href could be represented by a specially named option (uggg...)
> *  a shell script, called xproc, where we put the data flowing through
> the pipeline 'front and centre'
> * default scenario should not require setting something to empty (like params)
> 
> 
> II) Thought experiment #2
> 
> <p:pipeline>
>   <p:xslt stylesheet-href ="test.xsl" result-href="step1out.xml"/>
>   <p:xslt stylesheet-href ="test1.xsl"/>
>   <p:xslt stylesheet-href ="test2.xsl" result-href ="step2out.xml"/>
>   <p:xslt stylesheet-href ="test3.xsl"/>
> </p:pipeline>
> 
>> xproc -p mypipeline.xpl data.xml data2.xml
> 
> * we could do some kind of syntax sugar by allowing p:document href to
> be set with an option
> * we let data continue flowing through the pipeline as a default posture
> (multiple result output bindings) which would lessen confusion caused
> by using p:store
> * let users easily 'dip' into the data stream and save intermediate
> steps to make the process transparent and easy to debug
> 
> III) Caveats
> 
> CAVEAT #1 - I am not strongly advocating specifically doing I) or II),
> this is 'shooting from the hip' type thinking and not fully baked.
> 
> CAVEAT #2 - The WG is well aware of some of the problems (like
> parameters) and some parts of v2 requirements hopefully will address
> those shortcomings
> 
> CAVEAT #3 - To repeat, I think XProc v1 just needs the 'final mile' to
> be carefully constructed and communicated, not wholesale changes.
> 
> IV) Summary
> 
> I am trying to convey how important it is to cater for the 'first five
> minute' scenario. If we get this wrong in v2, then there is no 'first
> day', 'first month' or 'first year' scenario.
> 
> Any additional examples that illustrate the newbie's plight would be
> most useful, as well as any additional comment.
> 
> Jim Fuller
>
Received on Tuesday, 18 February 2014 10:35:59 UTC