- From: James Fuller <jim@webcomposite.com>
- Date: Mon, 17 Feb 2014 14:40:48 +0100
- To: XProc Dev <xproc-dev@w3.org>
Hello All,

With the dust settling on XML Prague, I've tried to make a few observations based on feedback collected over the weekend. I will send separate emails with some of the more involved thoughts over the coming days/weeks/months, but thought I would 'shoot from the hip' on one topic, i.e. the crucial first five minutes of usage by someone investigating XProc for the very first time:

I) People know and love pipelines and, before they come to XProc, already have a set of preconceptions 'in wetware' about how pipelines should work.
II) XProc balances many engineering choices to handle the vagaries of managing pipelines big and small; it's not trivial to deal with pipelines that go beyond simply 'piping output to input' between steps.

Many, many people told me that XProc does poorly in the first five minutes; in fact, it takes several sessions before the basic concepts crystallize. Many people give up at this stage, but those who make it through turn into hard-core XProc users, having run up and over the learning curve. This 'unfriendly' first five minutes makes adoption beyond the XML hard core less likely. Conversely, if we get the 'first five minutes' scenario right, then the broader group of Unix pipeline 'lovers' should be able to comprehend things quickly, and they will be happy to learn more if the return is worth it.

I don't think we need to embark on some kind of wholesale reductionism of basic XProc primitives beyond what we have already outlined in the vnext spec. For example, Romain Deltour's recent email on rationalizing inputs with options, while perceptive and well reasoned, proposes a larger set of changes than we should probably take on in v2, for reasons of time/space, and I think we can achieve the same effect with fewer 'cuts of the scalpel'. Even so, there are a lot of good ideas in Romain's email that the WG will no doubt look into deeply (thx Romain for the brain food!).
As an experiment, let's run through an evolutionary series of XProc pipelines, loosely based on real-world examples from users I met over the XML Prague weekend.

----------------------------------------------------------------------
Single (or Multiple) XSLT transformation pipeline
----------------------------------------------------------------------

Let's say we want to try out a simple XSLT transform in XProc, where I provide some source, define an XSLT transform, and want to save the results to disk. I diligently brush up on all things XProc, fire up oXygenXML (or download Calabash) and come up with the following as my first stab at a pipeline:

    <?xml version="1.0" encoding="UTF-8"?>
    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="rce2sp.xsl"/>
        </p:input>
      </p:xslt>
      <p:store href="data.xml"/>
    </p:pipeline>

I already had to take on board a few XProcisms, like the basic principles of port bindings and how documents flow through pipelines. I am unsure how to set the data input; I see p:document, and learn that p:pipeline is a bit of syntactic sugar, so I quickly rewrite it to:

    <?xml version="1.0" encoding="UTF-8"?>
    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
      <p:input port="source" sequence="false">
        <p:document href="data.xml"/>
      </p:input>
      <p:output port="result"/>
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="rce2sp.xsl"/>
        </p:input>
      </p:xslt>
      <p:store href="data.xml"/>
    </p:declare-step>

When I run this script, the XProc processor complains about the XSLT step needing parameters.
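For reference, here is a sketch of the syntactic sugar mentioned above, based on our reading of the XProc 1.0 spec: a p:pipeline implicitly declares a primary source input, a primary parameter input named parameters, and a primary result output. Its p:declare-step equivalent looks roughly like the block below (the p:xslt body is reused from the first stab). Rewriting to p:declare-step without that parameters port leaves p:xslt's parameters port with nothing to read from by default, which would account for the complaint above.

```xml
<!-- Approximate p:declare-step expansion of a p:pipeline (XProc 1.0) -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- implicit primary document input -->
  <p:input port="source"/>
  <!-- implicit primary parameter input; p:xslt's parameters port reads from it by default -->
  <p:input port="parameters" kind="parameter"/>
  <!-- implicit primary output, by default bound to the last step's primary output -->
  <p:output port="result"/>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="rce2sp.xsl"/>
    </p:input>
  </p:xslt>
</p:declare-step>
```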
So I read up again, ask the interwebs, review the mailing lists and come up with:

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
      <p:input port="source" sequence="false">
        <p:document href="data.xml"/>
      </p:input>
      <p:output port="result"/>
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="test.xsl"/>
        </p:input>
        <p:input port="parameters">
          <p:empty/>
        </p:input>
      </p:xslt>
      <p:store href="data.xml"/>
    </p:declare-step>

I have no desire to use parameters, so I learn the trick of binding them to p:empty, which is strange. I still get an error about unbound ports, hmmmm ... back to the docs ... read some more, learn some more ...

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
      <p:input port="source" sequence="false">
        <p:document href="data.xml"/>
      </p:input>
      <p:output port="result" sequence="true">
        <p:empty/>
      </p:output>
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="test.xsl"/>
        </p:input>
        <p:input port="parameters">
          <p:empty/>
        </p:input>
      </p:xslt>
      <p:store href="output.xml"/>
    </p:declare-step>

I run this and get successful output, but at this stage I don't understand a number of concepts. Some are idiosyncratic: what's this about setting sequences on ports, and why do I have to set something to 'empty' for parameters? Others run counter to my intuition about pipelines, where I expect some kind of output by default. By this stage, it's worrying that I have to care about managing the end result port, or be so explicit in my pipeline definition.
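The 'unbound ports' error has a specific cause under XProc 1.0: a compound step's primary output with no explicit binding defaults to the primary output of the last step, and p:store's result port is not primary, so there is nothing to bind to. Binding the output to p:empty is one way out. Another, sketched below with the same file names as the final example, is to name the p:xslt step and pipe its result to the pipeline's result port, so the pipeline both stores output.xml and emits the transformed document. The step name transform is our own choice, not something from the examples above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source">
    <p:document href="data.xml"/>
  </p:input>
  <!-- expose the XSLT result instead of binding the output to p:empty -->
  <p:output port="result">
    <p:pipe step="transform" port="result"/>
  </p:output>
  <p:xslt name="transform">
    <p:input port="stylesheet">
      <p:document href="test.xsl"/>
    </p:input>
    <!-- no stylesheet parameters are needed -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
  <!-- p:store reads p:xslt's primary output by default and writes it to disk -->
  <p:store href="output.xml"/>
</p:declare-step>
```

Either way, the newbie still has to understand primary ports and default bindings to get here, which is rather the point.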
Alternatively, someone could have arrived at a different XProc script at the start, for example:

    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="test.xsl"/>
        </p:input>
      </p:xslt>
    </p:pipeline>

or this:

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
      <p:input port="source"/>
      <p:output port="result"/>
      <p:input port="parameters" kind="parameter"/>
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="test.xsl"/>
        </p:input>
      </p:xslt>
    </p:declare-step>

but for these to run without error, one would have to know how to set command-line switches (or the oXygenXML setup) so that parameters are supplied.

The point of going through this evolution of XProc scripts is to remind us all that, for newbies, this learning process typically results in frustration, because:

I) XProc's basic operation sometimes works differently from my preconceptions
II) I have to learn many concepts before I get something running
III) and/or I have to learn a few things about the execution environment (command-line options, oXygenXML setup)

All of us, being lifelong autodidacts, are not afraid of learning, but there should be symmetry in the learning process ... all we are trying to do is run an XSLT transform and save its output. As it stands with XProc v1, we are asking people to do a lot more than they would with some other, easier to comprehend tool/utility.

Stepping back, I think XProc v1 gets the hairy things right (hence my earlier caution about hacking away at it), because the WG worked through many serious issues, with much thoughtful debate underpinning its design decisions.

So, what might be a better first five minute experience for the newbie user?
I) Thought experiment #1

    <p:pipeline>
      <p:xslt stylesheet-href="test.xsl"/>
    </p:pipeline>

    >xproc -p mypipeline.xpl data.xml

* we could consider some kind of alternative port mechanism, where a p:document href could be represented by a specially named option (uggg...)
* a shell script, called xproc, where we put the data flowing through the pipeline 'front and centre'
* the default scenario should not require setting something to empty (like params)

II) Thought experiment #2

    <p:pipeline>
      <p:xslt stylesheet-href="test.xsl" result-href="step1out.xml"/>
      <p:xslt stylesheet-href="test1.xsl"/>
      <p:xslt stylesheet-href="test2.xsl" result-href="step2out.xml"/>
      <p:xslt stylesheet-href="test3.xsl"/>
    </p:pipeline>

    >xproc -p mypipeline.xpl data.xml data2.xml

* we could add some syntactic sugar by allowing a p:document href to be set with an option
* we let data continue flowing through the pipeline as the default posture (multiple result output bindings), which would lessen the confusion caused by using p:store
* let users easily 'dip' into the data stream and save intermediate steps, to make the process transparent and easy to debug

III) Caveats

CAVEAT #1 - I am not strongly advocating doing I) or II) specifically; this is 'shooting from the hip' type thinking and not fully baked.
CAVEAT #2 - The WG is well aware of some of the problems (like parameters), and some parts of the v2 requirements will hopefully address those shortcomings.
CAVEAT #3 - To repeat, I think XProc v1 just needs the 'final mile' to be carefully constructed and communicated, not wholesale changes.

IV) Summary

I am trying to convey how important it is to cater for the 'first five minutes' scenario. If we get this wrong in v2, then there is no 'first day', 'first month' or 'first year' scenario.

Any additional examples that illustrate the newbie's plight would be most useful, as would any additional comments.

Jim Fuller
Received on Monday, 17 February 2014 13:41:17 UTC