Re: The first five minutes ... a thought experiment (long) from Alex Milowski on 2014-02-20 (xproc-dev@w3.org from February 2014)

From: Alex Milowski <alex@milowski.com>
Date: Thu, 20 Feb 2014 12:07:48 +0000
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <CABp3FNKQ-rVZJ9pK6Fo39228uRE+xECuSqt_iZKdOOKgCBhtHQ@mail.gmail.com>
Using XPath for ports is fraught with issues.  How do you do the
static analysis to figure out what is connected to what?

I could go on about that, and would like to when I have a bit more time.
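
To make the static-analysis concern concrete, here is a minimal Python
sketch. It assumes XProc 1.0 style explicit p:pipe bindings; the sample
pipeline, the step names (main, style, transform) and the helper
static_connections are invented for illustration and are not taken from
any XProc implementation. With explicit bindings, every connection can be
read straight off the pipeline document without running it. If a port
were instead set by an arbitrary XPath expression, the same walk would
find only an opaque string whose target is unknown until evaluation.

```python
import xml.etree.ElementTree as ET

P = "{http://www.w3.org/ns/xproc}"

# Hypothetical sample pipeline with explicit (static) port bindings.
PIPELINE = """
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main" version="1.0">
  <p:input port="source"/>
  <p:output port="result">
    <p:pipe step="transform" port="result"/>
  </p:output>
  <p:load name="style" href="mytransform.xsl"/>
  <p:xslt name="transform">
    <p:input port="source"><p:pipe step="main" port="source"/></p:input>
    <p:input port="stylesheet"><p:pipe step="style" port="result"/></p:input>
    <p:input port="parameters"><p:empty/></p:input>
  </p:xslt>
</p:declare-step>
"""


def static_connections(pipeline_xml):
    """Return (from_step, from_port, to_step, to_port) tuples.

    Only the document structure is inspected; nothing is evaluated.
    """
    root = ET.fromstring(pipeline_xml)
    edges = []
    for step in root.iter():
        for binding in step:
            # Only p:input / p:output children carry port bindings.
            if binding.tag not in (P + "input", P + "output"):
                continue
            for pipe in binding.findall(P + "pipe"):
                edges.append((pipe.get("step"), pipe.get("port"),
                              step.get("name"), binding.get("port")))
    return edges


edges = static_connections(PIPELINE)
for from_step, from_port, to_step, to_port in edges:
    print(f"{from_step}.{from_port} -> {to_step}.{to_port}")
```

The walk above never calls an XPath engine, which is the property that
is lost once a binding can be computed at run time.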

Meanwhile, just solving the syntactic shortcuts means you can just do
something like this:

<p:pipeline>
        <p:xslt stylesheet="mytransform.xsl"/>
</p:pipeline>
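
One way to read such a shortcut is as pure syntactic sugar that expands
into the XProc 1.0 long form before anything else happens. The Python
sketch below illustrates that reading only; the stylesheet attribute is
the hypothetical shortcut from the example above (it is not XProc 1.0
syntax), and expand_stylesheet_shortcut is a made-up helper name.

```python
import xml.etree.ElementTree as ET

XPROC_NS = "http://www.w3.org/ns/xproc"
P = "{%s}" % XPROC_NS
ET.register_namespace("p", XPROC_NS)  # keep the p: prefix when serializing

# The shortcut form; the namespace declaration is added so that it parses.
SHORT_FORM = (
    '<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">'
    '<p:xslt stylesheet="mytransform.xsl"/>'
    '</p:pipeline>'
)


def expand_stylesheet_shortcut(pipeline_xml):
    """Rewrite the hypothetical stylesheet="..." attribute on p:xslt into
    an explicit <p:input port="stylesheet"><p:document href="..."/></p:input>.
    """
    root = ET.fromstring(pipeline_xml)
    # Materialize the list first so the tree is not modified while iterating.
    for xslt in list(root.iter(P + "xslt")):
        href = xslt.attrib.pop("stylesheet", None)
        if href is None:
            continue  # no shortcut used on this step
        port = ET.SubElement(xslt, P + "input", {"port": "stylesheet"})
        ET.SubElement(port, P + "document", {"href": href})
    return ET.tostring(root, encoding="unicode")


print(expand_stylesheet_shortcut(SHORT_FORM))
```

Because the expansion yields an ordinary p:input/p:document binding, the
connections remain visible in the document itself.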

On Wed, Feb 19, 2014 at 9:05 AM, Romain Deltour <rdeltour@gmail.com> wrote:
> Shameless plug: if you could set ports from XPath expressions (see the "input == ports" thread), and with some implicit bindings, this would be:
>
> <p:pipeline>
>         <p:load source="doc('myinput.xml')"/>
>         <p:xslt stylesheet="doc('mytransform.xsl')"/>
>         <p:store href="'myoutput.xml'"/>
> </p:pipeline>
>
> which is arguably not as minimalist and consistent as Geert's pseudo code, but close enough :)
>
> Romain.
>
>
> On 19 Feb 2014, at 08:49, Geert J. <geert.josten@dayon.nl> wrote:
>
>> A lot has been said, and unfortunately I still need to read up on most
>> of it. But just a short reply on the example. My first stab at XProc (if
>> I hadn't taken the unorthodox approach that I did, building my ebook
>> proc) would have been:
>>
>> <p:pipeline>
>>       <p:load href="myinput.xml"/>
>>       <p:xslt href="mytransform.xsl"/>
>>       <p:store href="myoutput.xml"/>
>> </p:pipeline>
>>
>> This resembles the Cocoon sitemap approach a lot. And anyone used to
>> Cocoon sitemaps knows how easy it is to tie Cocoon pipes to each other,
>> whereas in XProc it involves lots of verbose syntax to point to specific
>> step ports that need to be in scope as well.
>>
>> XProc doesn't do that badly, though. The only thing lacking here is the
>> href on p:xslt. If that were present, you could easily chain lots of
>> XSLTs as well, by simply repeating the p:xslt:
>>
>> <p:pipeline>
>>       <p:load href="myinput.xml"/>
>>       <p:xslt href="mytransform.xsl"/>
>>       <p:xslt href="mytransform2.xsl"/>
>>       <p:xslt href="mytransform3.xsl"/>
>>       <p:store href="myoutput.xml"/>
>> </p:pipeline>
>>
>> Maybe we should not only focus on the bad parts of XProc, but also on the
>> good parts.
>>
>> Cheers,
>> Geert
>>
>>> -----Original message-----
>>> From: James Fuller [mailto:jim@webcomposite.com]
>>> Sent: Monday, 17 February 2014 14:41
>>> To: XProc Dev
>>> Subject: The first five minutes ... a thought experiment (long)
>>>
>>> Hello All,
>>>
>>> With the dust settling on XML Prague, I've tried to make a few
>>> observations based on feedback collected over the weekend. For some of
>>> the more involved thoughts, I will send through separate
>>> communications over the coming days/weeks/months.
>>>
>>> But I thought I would 'shoot from the hip' on one topic, i.e. the
>>> crucial first five minutes of usage by someone investigating XProc for
>>> the very first time:
>>>
>>> I) People know and love pipelines and have a set of preconceptions 'in
>>> wetware', before they come to XProc, about how pipelines should work.
>>>
>>> II) XProc balances many engineering choices to handle the vagaries
>>> of managing pipelines big and small; it's not trivial dealing with
>>> pipelines that go beyond simple 'piping output from input' between
>>> steps.
>>>
>>> Many, many people repeated to me that XProc does poorly in the first
>>> five minutes; in fact, it takes several sessions before basic concepts
>>> crystallize. Many people give up at this stage, but those who make it
>>> through turn into hard-core XProc users, as they have run up and over
>>> the learning curve.
>>>
>>> This 'unfriendly' first five minutes makes adoption beyond the XML
>>> hard core less likely. That being said, if we get the 'first five
>>> minutes' scenario right, then the broader group of all those Unix
>>> pipeline 'lovers' should be able to comprehend things quickly, and
>>> they will be happy to learn more if the return is worth it.
>>>
>>> I don't think we need to embark on some kind of wholesale reductionism
>>> of basic XProc primitives, beyond what we have outlined already in
>>> the vnext spec. For example, Romain Deltour's recent email on
>>> rationalizing inputs with options, while perceptive and well reasoned,
>>> is a larger set of changes that we should probably avoid in v2 for
>>> reasons of time/space, and I think we can achieve the same effect with
>>> fewer 'cuts of the scalpel'.
>>>
>>> That being said, there are a lot of good ideas from Romain's email
>>> that the WG will no doubt look deeply into (thx Romain for the brain
>>> food!).
>>>
>>> As an experiment, let's run through an evolutionary series of XProc
>>> pipelines, loosely based on real-world examples from users met over
>>> the XML Prague weekend.
>>>
>>> ----------------------------------------------------------------------
>>> Single (or Multiple) XSLT transformation pipeline
>>> ----------------------------------------------------------------------
>>>
>>> Let's say we want to try out doing a simple XSLT transform in XProc,
>>> where I provide some source, define an XSLT transform, and want to
>>> save the results to disk.
>>>
>>> I diligently brush up on all things XProc and fire up oXygenXML (or
>>> download Calabash) and come up with the following as my first stab at
>>> a pipeline:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>>>  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
>>>
>>>  <p:xslt>
>>>    <p:input port="stylesheet">
>>>      <p:document href="rce2sp.xsl"/>
>>>    </p:input>
>>>  </p:xslt>
>>>
>>>  <p:store href="data.xml"/>
>>>
>>> </p:pipeline>
>>>
>>> I already had to take on board a few XProcisms, like the basic
>>> principles of port bindings and how documents flow through pipelines.
>>> I am unsure of how to set data input; I see p:document and learn about
>>> p:pipeline being a bit of syntactic sugar, so I quickly rewrite to:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>>>  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
>>>
>>>    <p:input port="source" sequence="false">
>>>        <p:document href="data.xml"/>
>>>    </p:input>
>>>    <p:output port="result"/>
>>>
>>>  <p:xslt>
>>>    <p:input port="stylesheet">
>>>      <p:document href="rce2sp.xsl"/>
>>>    </p:input>
>>>  </p:xslt>
>>>
>>>  <p:store href="data.xml"/>
>>>
>>> </p:declare-step>
>>>
>>> When I run this script, the XProc processor complains about the XSLT
>>> step needing parameters. So I read up again, ask the interwebs, review
>>> the mailing lists and come up with:
>>>
>>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>>    version="1.0">
>>>
>>>    <p:input port="source" sequence="false">
>>>        <p:document href="data.xml"/>
>>>    </p:input>
>>>
>>>    <p:output port="result"/>
>>>
>>>    <p:xslt>
>>>        <p:input port="stylesheet">
>>>            <p:document href="test.xsl"/>
>>>        </p:input>
>>>        <p:input port="parameters">
>>>            <p:empty/>
>>>        </p:input>
>>>    </p:xslt>
>>>
>>>    <p:store href="data.xml"/>
>>>
>>> </p:declare-step>
>>>
>>> I have no desire to use parameters, so I learn about the trick of
>>> setting them to p:empty, which is strange. I still get an error about
>>> unbound ports, hmmmm .... back to the docs ... read some more, learn
>>> some more ....
>>>
>>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>>    version="1.0">
>>>
>>>    <p:input port="source" sequence="false">
>>>        <p:document href="data.xml"/>
>>>    </p:input>
>>>
>>>    <p:output port="result" sequence="true">
>>>        <p:empty/>
>>>    </p:output>
>>>
>>>    <p:xslt>
>>>        <p:input port="stylesheet">
>>>            <p:document href="test.xsl"/>
>>>        </p:input>
>>>        <p:input port="parameters">
>>>            <p:empty/>
>>>        </p:input>
>>>    </p:xslt>
>>>
>>>    <p:store href="output.xml"/>
>>>
>>> </p:declare-step>
>>>
>>> I run this and have successful output, but at this stage, I don't
>>> understand a number of concepts. Some are anachronistic, like: what's
>>> this about setting sequences on ports, or why do I have to set
>>> something to 'empty' for parameters? But some concepts run counter to
>>> my intuition about pipelines, where I expect some kind of output by
>>> default. By this stage, it's worrying that I have to somehow care about
>>> managing the end result port or be so explicit with my pipeline
>>> definition.
>>>
>>> Alternatively, someone could have arrived at a different XProc script at
>>> the start, for example:
>>>
>>> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>>    version="1.0">
>>>    <p:xslt>
>>>        <p:input port="stylesheet">
>>>            <p:document href="test.xsl"/>
>>>        </p:input>
>>>    </p:xslt>
>>> </p:pipeline>
>>>
>>> or this
>>>
>>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>>> xmlns:c="http://www.w3.org/ns/xproc-step"
>>>    version="1.0">
>>>    <p:input port="source"/>
>>>    <p:output port="result"/>
>>>    <p:input port="parameters" kind="parameter"/>
>>>    <p:xslt>
>>>        <p:input port="stylesheet">
>>>            <p:document href="test.xsl"/>
>>>        </p:input>
>>>    </p:xslt>
>>> </p:declare-step>
>>>
>>> but for these to run without error, one would have to know how to set
>>> command-line switches (or the oXygenXML setup) so that parameters are
>>> set.
>>>
>>> The point of going through this evolution of XProc scripts is to
>>> remind us all that for newbies this process of learning typically
>>> results in frustration, because:
>>>
>>> I) XProc's basic operation sometimes works differently than my
>>> preconceptions
>>>
>>> II) I have to learn many concepts before I get something running
>>>
>>> III) and/or I have to learn a few things about the execution
>>> environment (command-line options, oXygenXML setup)
>>>
>>> All of us, being lifelong autodidacts, are not afraid of learning, but
>>> there should be symmetry in the learning process ... all we are trying
>>> to do is run an XSLT transform and save its output.
>>>
>>> As it stands with XProc v1, we are asking people to do a lot more than
>>> what they can do today with some other, easier-to-comprehend
>>> tool/utility.
>>>
>>> Stepping back, I think XProc v1 gets the hairy things right (hence the
>>> previous caution of hacking away at it) because the WG worked through
>>> many serious issues with much thoughtful debate underpinning design
>>> decisions.
>>>
>>> So, what might be a better first five minute experience for the newbie
>>> user?
>>>
>>> I) Thought experiment #1
>>>
>>> <p:pipeline>
>>>   <p:xslt stylesheet-href="test.xsl"/>
>>> </p:pipeline>
>>>
>>> > xproc -p mypipeline.xpl data.xml
>>>
>>> * we could consider some kind of alt port mechanism where a p:document
>>> href could be represented by a specially named option (uggg...)
>>> *  a shell script, called xproc, where we put the data flowing through
>>> the pipeline 'front and centre'
>>> * default scenario should not require setting something to empty (like
>>> params)
>>>
>>>
>>> II) Thought experiment #2
>>>
>>> <p:pipeline>
>>>   <p:xslt stylesheet-href ="test.xsl" result-href="step1out.xml"/>
>>>   <p:xslt stylesheet-href ="test1.xsl"/>
>>>   <p:xslt stylesheet-href ="test2.xsl" result-href ="step2out.xml"/>
>>>   <p:xslt stylesheet-href ="test3.xsl"/>
>>> </p:pipeline>
>>>
>>> > xproc -p mypipeline.xpl data.xml data2.xml
>>>
>>> * we could do some kind of syntax sugar by allowing p:document href to
>>> be set with an option
>>> * we let data continue flowing through the pipeline as a default posture
>>> (multiple result output bindings), which would lessen confusion caused
>>> by using p:store
>>> * let users easily 'dip' into the data stream and save intermediate
>>> steps to make the process transparent and easy to debug
>>>
>>> III) Caveats
>>>
>>> CAVEAT #1 - I am not strongly advocating specifically doing I) or II),
>>> this is 'shooting from the hip' type thinking and not fully baked.
>>>
>>> CAVEAT #2 - The WG is well aware of some of the problems (like
>>> parameters) and some parts of v2 requirements hopefully will address
>>> those shortcomings
>>>
>>> CAVEAT #3 - To repeat, I think XProc v1 just needs the 'final mile' to
>>> be carefully constructed and communicated, not wholesale changes.
>>>
>>> IV) Summary
>>>
>>> I am trying to convey how important it is to cater for the 'first five
>>> minute' scenario. If we get this wrong in v2, then there is no 'first
>>> day', 'first month' or 'first year' scenario.
>>>
>>> Any additional examples that illustrate the newbie's plight would be
>>> most useful, as well as any additional comment.
>>>
>>> Jim Fuller
>>
>
>



-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
Received on Thursday, 20 February 2014 12:08:16 UTC