Re: The first five minutes ... a thought experiment (long) from Romain Deltour on 2014-02-19 (xproc-dev@w3.org from February 2014)

From: Romain Deltour <rdeltour@gmail.com>
Date: Wed, 19 Feb 2014 10:05:24 +0100
To: "Geert J." <geert.josten@dayon.nl>
Cc: James Fuller <jim@webcomposite.com>, XProc Dev <xproc-dev@w3.org>
Message-Id: <ED1F7BD1-21A4-45B9-9EE7-11F5830FF214@gmail.com>
Shameless plug: if you could set ports from XPath expressions (see the “input == ports” thread), and with some implicit bindings, this would be:

<p:pipeline>
	<p:load source="doc('myinput.xml')"/>
	<p:xslt stylesheet="doc('mytransform.xsl')"/>
	<p:store href="'myoutput.xml'"/>
</p:pipeline>

which is arguably not as minimalist and consistent as Geert’s pseudo code, but close enough :)
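
For comparison, here is a rough sketch of what I believe the same three steps need in XProc 1.0 as it stands. The file names are carried over from the pseudo code above; the explicit stylesheet binding and the p:empty on the parameters port are my reading of the 1.0 step signatures, so treat this as illustrative rather than tested:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <!-- p:load has no input ports; its result becomes the default input of the next step -->
  <p:load href="myinput.xml"/>

  <p:xslt>
    <!-- the stylesheet is a port, not an option, so it needs an explicit binding -->
    <p:input port="stylesheet">
      <p:document href="mytransform.xsl"/>
    </p:input>
    <!-- no parameter port is declared on this pipeline, so bind parameters to nothing -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- p:store reads the XSLT result through the default binding -->
  <p:store href="myoutput.xml"/>

</p:declare-step>
```

Only the p:xslt step grows; p:load and p:store already look like the short form.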

Romain.


On 19 Feb 2014, at 08:49, Geert J. <geert.josten@dayon.nl> wrote:

> A lot has been said, and unfortunately I still need to read up on most of
> it. But just a short reply on the example. My first stab at XProc (if I
> hadn't taken the unorthodox approach that I did, building my ebook proc)
> would have been:
> 
> <p:pipeline>
> 	<p:load href="myinput.xml"/>
> 	<p:xslt href="mytransform.xsl"/>
> 	<p:store href="myoutput.xml"/>
> </p:pipeline>
> 
> Which resembles the Cocoon sitemap approach a lot. And anyone used to Cocoon
> sitemaps knows how easy it is to tie Cocoon pipes to each other, whereas in
> XProc it involves lots of verbose syntax to point to specific step ports
> that need to be in scope as well.
> 
> XProc doesn't do that badly, though. The only thing lacking here is the href
> on p:xslt. If that were present, you could easily chain lots of XSLTs as
> well, by simply repeating the p:xslt:
> 
> <p:pipeline>
> 	<p:load href="myinput.xml"/>
> 	<p:xslt href="mytransform.xsl"/>
> 	<p:xslt href="mytransform2.xsl"/>
> 	<p:xslt href="mytransform3.xsl"/>
> 	<p:store href="myoutput.xml"/>
> </p:pipeline>
> 
> Maybe we should not only focus on the bad parts of XProc, but also on the
> good parts.
> 
> Cheers,
> Geert
> 
>> -----Original message-----
>> From: James Fuller [mailto:jim@webcomposite.com]
>> Sent: Monday 17 February 2014 14:41
>> To: XProc Dev
>> Subject: The first five minutes ... a thought experiment (long)
>> 
>> Hello All,
>> 
>> With the dust settling on XML Prague, I've tried to make a few
>> observations based on feedback collected over the weekend. For some of
>> the more involved thoughts, I will send through separate
>> communications over the coming days/weeks/months.
>> 
>> But I thought I would 'shoot from the hip' on one topic, namely the
>> crucial first five minutes of usage by someone investigating XProc for
>> the very first time.
>> 
>> I) People know and love pipelines and have a set of preconceptions 'in
>> wetware', before they come to XProc, about how pipelines should work.
>> 
>> II) XProc balances many engineering choices to handle the vagaries
>> of managing pipelines big and small; it's not trivial dealing with
>> pipelines that go beyond simple 'piping output to input' between
>> steps.
>> 
>> Many, many people repeated to me that XProc does poorly in the first
>> five minutes; in fact, it takes several sessions before basic concepts
>> crystallize. Many people give up at this stage, but those who make it
>> through turn into hard-core XProc users, as they have run up and over
>> the learning curve.
>> 
>> This 'unfriendly' first five minutes makes adoption beyond the XML
>> hard core less likely. That being said, if we get the 'first five
>> minutes' scenario right, then the broader group of all those unix
>> pipeline 'lovers' should be able to comprehend things quickly, and
>> they will be happy to learn more if the return is worth it.
>> 
>> I don't think we need to embark on some kind of wholesale reductionism
>> of basic XProc primitives, beyond what we have outlined already in
>> vnext spec. For example, Romain Deltour's recent email on
>> rationalizing inputs with options, while perceptive and well reasoned,
>> is a larger set of changes that we should probably avoid in v2 for
>> reasons of time/space, and I think we can achieve the same effect with
>> fewer 'cuts of the scalpel'.
>> 
>> That being said, there are a lot of good ideas from Romain's email
>> that the WG will no doubt look deeply into (thx Romain for the brain
>> food!).
>> 
>> As an experiment, let's run through an evolutionary series of XProc
>> pipelines, loosely based on real-world examples from users met over
>> the XML Prague weekend.
>> 
>> ----------------------------------------------------------------------
>> Single (or Multiple) XSLT transformation pipeline
>> ----------------------------------------------------------------------
>> 
>> Let's say we want to try out doing a simple XSLT transform in XProc,
>> where I provide some source, define an XSLT transform, and want to
>> save the results to disk.
>> 
>> I diligently brush up on all things XProc and fire up oXygenXML (or
>> download calabash) and come up with the following as my first stab at
>> a pipeline;
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>>  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
>> 
>>  <p:xslt>
>>    <p:input port="stylesheet">
>>      <p:document href="rce2sp.xsl"/>
>>    </p:input>
>>  </p:xslt>
>> 
>>  <p:store href="data.xml"/>
>> 
>> </p:pipeline>
>> 
>> I already had to take on board a few XProcisms like basic principles
>> of port bindings and how documents flow through pipelines. I am unsure
>> of how to set data input. I see p:document and learn about p:pipeline
>> being a bit of syntactic sugar, so I quickly rewrite to:
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>>  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
>> 
>>    <p:input port="source" sequence="false">
>>        <p:document href="data.xml"/>
>>    </p:input>
>>    <p:output port="result"/>
>> 
>>  <p:xslt>
>>    <p:input port="stylesheet">
>>      <p:document href="rce2sp.xsl"/>
>>    </p:input>
>>  </p:xslt>
>> 
>>  <p:store href="data.xml"/>
>> 
>> </p:declare-step>
>> 
>> When I run this script, the XProc processor complains about the XSLT
>> step needing parameters. So I read up again, ask the interwebs, review
>> the mailing lists and come up with;
>> 
>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>    version="1.0">
>> 
>>    <p:input port="source" sequence="false">
>>        <p:document href="data.xml"/>
>>    </p:input>
>> 
>>    <p:output port="result"/>
>> 
>>    <p:xslt>
>>        <p:input port="stylesheet">
>>            <p:document href="test.xsl"/>
>>        </p:input>
>>        <p:input port="parameters">
>>            <p:empty/>
>>        </p:input>
>>    </p:xslt>
>> 
>>    <p:store href="data.xml"/>
>> 
>> </p:declare-step>
>> 
>> I have no desire to use parameters, so I learn about the trick of
>> setting them to p:empty, which is strange. I still get an error about
>> unbound ports, hmmmm .... back to the docs ... read some more, learn
>> some more ....
>> 
>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>    version="1.0">
>> 
>>    <p:input port="source" sequence="false">
>>        <p:document href="data.xml"/>
>>    </p:input>
>> 
>>    <p:output port="result" sequence="true">
>>        <p:empty/>
>>    </p:output>
>> 
>>    <p:xslt>
>>        <p:input port="stylesheet">
>>            <p:document href="test.xsl"/>
>>        </p:input>
>>        <p:input port="parameters">
>>            <p:empty/>
>>        </p:input>
>>    </p:xslt>
>> 
>>    <p:store href="output.xml"/>
>> 
>> </p:declare-step>
>> 
>> I run this and have successful output, but at this stage, I don't
>> understand a number of concepts ... some are anachronistic, like: what's
>> this about setting sequences on ports, or why do I have to set
>> something to 'empty' for parameters? But some concepts run counter to
>> my intuition about pipelines, where I expect some kind of output by
>> default. By this stage, it's worrying that I have to somehow care about
>> managing the end result port or be so explicit with my pipeline
>> definition.
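>> 
>> As far as I can tell, the 'unbound ports' error comes from declaring a
>> primary 'result' output while the last step, p:store, has no primary
>> output to feed it. A minimal sketch of an alternative, assuming nothing
>> downstream needs the pipeline's own output, is to drop the p:output
>> declaration instead of binding it to p:empty (file names as above; not
>> checked against a particular processor):
>> 
>> ```xml
>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
>> 
>>    <!-- default source document, used when the caller binds nothing -->
>>    <p:input port="source" sequence="false">
>>        <p:document href="data.xml"/>
>>    </p:input>
>> 
>>    <!-- no p:output here: the pipeline ends in p:store, which writes to disk -->
>> 
>>    <p:xslt>
>>        <p:input port="stylesheet">
>>            <p:document href="test.xsl"/>
>>        </p:input>
>>        <p:input port="parameters">
>>            <p:empty/>
>>        </p:input>
>>    </p:xslt>
>> 
>>    <p:store href="output.xml"/>
>> 
>> </p:declare-step>
>> ```
>> 
>> Either form works around the same rule; the newcomer still has to
>> discover that rule first.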
>> 
>> Alternately, someone could have arrived at a different XProc script at
>> the start, for example;
>> 
>> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>    version="1.0">
>>    <p:xslt>
>>        <p:input port="stylesheet">
>>            <p:document href="test.xsl"/>
>>        </p:input>
>>    </p:xslt>
>> </p:pipeline>
>> 
>> or this
>> 
>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>    version="1.0">
>>    <p:input port="source"/>
>>    <p:output port="result"/>
>>    <p:input port="parameters" kind="parameter"/>
>>    <p:xslt>
>>        <p:input port="stylesheet">
>>            <p:document href="test.xsl"/>
>>        </p:input>
>>    </p:xslt>
>> </p:declare-step>
>> 
>> but for these to run without error, one would have to know how to set
>> commandline switches (or oXygenXML setup) so that parameters are set,
>> to get this running correctly.
>> 
>> The point of going through this evolution of xproc scripts is to
>> remind us all that for newbies this process of learning typically
>> results in frustration, because;
>> 
>> I) XProc basic operation sometimes works differently than my
>> preconceptions
>> 
>> II) I have to learn many concepts before I get something running
>> 
>> III) and/or I have to learn a few things about execution environment
>> (commandline options, oXygenXML setup)
>> 
>> All of us, being lifelong autodidacts, are not afraid of learning, but
>> there should be symmetry in the learning process ... all we are trying
>> to do is run an xslt transform and save its output.
>> 
>> As it stands with XProc v1, we are asking people to do a lot more than
>> what they can do today with some other easier to comprehend tool/utility.
>> 
>> Stepping back, I think XProc v1 gets the hairy things right (hence the
>> previous caution of hacking away at it) because the WG worked through
>> many serious issues with much thoughtful debate underpinning design
>> decisions.
>> 
>> So, what might be a better first five minute experience for the newbie
>> user?
>> 
>> I) Thought experiment #1
>> 
>> <p:pipeline>
>>   <p:xslt stylesheet-href="test.xsl"/>
>> </p:pipeline>
>> 
>>> xproc -p mypipeline.xpl data.xml
>> 
>> * we could consider some kind of alt port mechanism where a p:document
>> href could be represented by a specially named option (uggg...)
>> *  a shell script, called xproc, where we put the data flowing through
>> the pipeline 'front and centre'
>> * default scenario should not require setting something to empty (like
>> params)
>> 
>> 
>> II) Thought experiment #2
>> 
>> <p:pipeline>
>>   <p:xslt stylesheet-href="test.xsl" result-href="step1out.xml"/>
>>   <p:xslt stylesheet-href="test1.xsl"/>
>>   <p:xslt stylesheet-href="test2.xsl" result-href="step2out.xml"/>
>>   <p:xslt stylesheet-href="test3.xsl"/>
>> </p:pipeline>
>> 
>>> xproc -p mypipeline.xpl data.xml data2.xml
>> 
>> * we could do some kind of syntax sugar by allowing p:document href to
>> be set with an option
>> * we let data continue flowing through the pipeline as a default posture
>> (multiple result output bindings) which would lessen confusion caused
>> by using p:store
>> * let users easily 'dip' into the data stream and save intermediate
>> steps to make the process transparent and easy to debug
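>> 
>> For contrast, a sketch of how I understand 'dipping' into the stream
>> has to be written in XProc 1.0: because p:store has no primary output,
>> the step after it needs an explicit p:pipe back to the XSLT step. The
>> step name 'step1' is made up for illustration; file names are taken
>> from the example above, and only the first two transforms are shown.
>> 
>> ```xml
>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
>>   <p:input port="source"/>
>>   <p:output port="result"/>
>> 
>>   <!-- first transform, named so that a later step can point back at it -->
>>   <p:xslt name="step1">
>>     <p:input port="stylesheet"><p:document href="test.xsl"/></p:input>
>>     <p:input port="parameters"><p:empty/></p:input>
>>   </p:xslt>
>> 
>>   <!-- save the intermediate result; this ends the implicit chain -->
>>   <p:store href="step1out.xml"/>
>> 
>>   <!-- second transform: re-attach to step1's result explicitly -->
>>   <p:xslt>
>>     <p:input port="source"><p:pipe step="step1" port="result"/></p:input>
>>     <p:input port="stylesheet"><p:document href="test1.xsl"/></p:input>
>>     <p:input port="parameters"><p:empty/></p:input>
>>   </p:xslt>
>> </p:declare-step>
>> ```
>> 
>> The extra name and p:pipe are the sort of bookkeeping a result-href
>> style shortcut would hide.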
>> 
>> III) Caveats
>> 
>> CAVEAT #1 - I am not strongly advocating specifically doing I) or II),
>> this is 'shooting from the hip' type thinking and not fully baked.
>> 
>> CAVEAT #2 - The WG is well aware of some of the problems (like
>> parameters) and some parts of v2 requirements hopefully will address
>> those shortcomings
>> 
>> CAVEAT #3 - To repeat, I think XProc v1 just needs the 'final mile' to
>> be carefully constructed and communicated, not wholesale changes.
>> 
>> IV) Summary
>> 
>> I am trying to convey how important it is to cater for the 'first five
>> minute' scenario. If we get this wrong in v2, then there is no 'first
>> day', 'first month' or 'first year' scenario.
>> 
>> Any additional examples that illustrate the newbie's plight would be
>> most useful, as well as any additional comment.
>> 
>> Jim Fuller
>
Received on Wednesday, 19 February 2014 09:05:58 UTC