Re: The first five minutes ... a thought experiment (long)

(Accidentally sent this reply to James instead of list--sorry James!)

On 2/17/2014 5:40 AM, James Fuller wrote:

> The point of going through this evolution of xproc scripts, is to
> remind us all that for newbies this process of learning typically
> results in frustration, because:
> I) XProc basic operation sometimes works differently than my preconceptions
> II) I have to learn many concepts before I get something running
> III) and/or I have to learn a few things about execution environment
> (commandline options, oXygenXML setup)
> All of us, being lifelong autodidacts, are not afraid of learning, but
> there should be symmetry in the learning process ... all we are trying
> to do is run an xslt transform and save its output.
> As it stands with XProc v1, we are asking people to do a lot more than
> what they can do today with some other easier to comprehend tool/utility.


As someone who just recently started using XProc (and learning XML
Schema 1.1 and XSLT/XPath 2.0 at the same time), I have not had a
particularly hard time figuring it out.


There are some things that I have found bizarre.  In particular, why are
"documents" flowing through pipes rather than just (e.g.) XPath data
values (of which documents are just one form)?  Related to that,
documents and parameters (for e.g. XSLT) are all more or less XPath data
values, and it would be nice if there were a uniform way to pass such
data around.  But that hasn't been difficult to figure out, just
something I noted.


The thing that has frustrated me in particular is the way that 
versioning is handled throughout many of the XML-related standards. 
XProc does some things right: ability to specify required XPath version 
and required XSLT version.  But no version requirement for XML Schema? 
No schema-aware XSLT requirements?  What about branching off of these 
values?  What I ended up doing (since I needed a pipeline that could be 
executed with and without schema support) was create an option for this 
AND pass around Saxon configuration files.  E.g. this is my invocation:

java com.xmlcalabash.drivers.Main --saxon-configuration=saxon-ee.xml pipeline.xpl schema-aware=yes

java com.xmlcalabash.drivers.Main --saxon-configuration=saxon-he.xml pipeline.xpl schema-aware=no

The only thing the Saxon configuration files are doing is specifying
the XML Schema version and turning schema-aware XSLT on and off.
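
For illustration, a Saxon configuration file for the schema-aware case
might look roughly like the sketch below.  This is not the actual
saxon-ee.xml used above (its contents are not shown here); it is a
minimal guess that sets only the two things mentioned, assuming the
Saxon configuration-file namespace and the schemaAware/version
attributes of the xslt and xsd elements.

```xml
<!-- Hypothetical saxon-ee.xml: a sketch, not the file actually used. -->
<configuration xmlns="http://saxon.sf.net/ns/configuration"
               edition="EE">
  <!-- Turn schema-aware XSLT processing on. -->
  <xslt schemaAware="true"/>
  <!-- Select XML Schema 1.1 rather than the default 1.0. -->
  <xsd version="1.1"/>
</configuration>
```

The saxon-he.xml counterpart would presumably declare edition="HE" and
leave out the schema-related settings, since HE has no schema support.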

AFAIK, this stuff is unspecifiable in the pipeline itself (at least 
without extensions).  I would like to have:

java com.xmlcalabash.drivers.Main pipeline.xpl


So far I have found the default piping stuff more bewildering than
helpful, mostly because it hides what is actually happening.  I have
just started using XProc.  The reason for doing so is that I have
processing tasks that aren't just simple linear pipelines.  If I just
wanted to take some XML data through XSLT, I wouldn't bother to set up a
pipeline.  So, at this point, I just explicitly specify all connections
between ports.
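
To make that concrete, here is a minimal XProc 1.0 sketch of a pipeline
in which every connection is written out with p:pipe rather than left to
the defaults, and which branches on a schema-aware option like the one
passed on the command line above.  The step names and the files
schema.xsd, transform-sa.xsl and transform.xsl are hypothetical
placeholders, not taken from the actual pipeline.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result">
    <p:pipe step="branch" port="result"/>
  </p:output>
  <!-- Set from the command line, e.g. schema-aware=yes -->
  <p:option name="schema-aware" select="'no'"/>

  <p:choose name="branch">
    <p:xpath-context>
      <p:pipe step="main" port="source"/>
    </p:xpath-context>
    <p:when test="$schema-aware = 'yes'">
      <p:output port="result">
        <p:pipe step="transform-sa" port="result"/>
      </p:output>
      <!-- Hypothetical schema; validation needs a schema-aware processor. -->
      <p:validate-with-xml-schema name="validate">
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
        <p:input port="schema">
          <p:document href="schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
      <p:xslt name="transform-sa">
        <p:input port="source">
          <p:pipe step="validate" port="result"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="transform-sa.xsl"/>
        </p:input>
        <p:input port="parameters"><p:empty/></p:input>
      </p:xslt>
    </p:when>
    <p:otherwise>
      <p:output port="result">
        <p:pipe step="transform" port="result"/>
      </p:output>
      <p:xslt name="transform">
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="transform.xsl"/>
        </p:input>
        <p:input port="parameters"><p:empty/></p:input>
      </p:xslt>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

It is more verbose than relying on default readable ports, but each
step's inputs can be read off directly.  Note that the option only
selects the branch; the processor still has to be configured for schema
support separately, which is the complaint above.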


I haven't run across it yet, but I am worried about the lack of a
direct way to cache intermediate results.  Viewing a pipeline as a sort
of makefile, running the pipeline is equivalent to a complete rebuild.
For the project that I am using to learn all of this stuff, this doesn't
matter that much.  For the real-world projects that I need something
like this for, I fear it will potentially be a very large problem, and
it may be that I have to have small partial pipelines being invoked via
a makefile.  In that sort of scenario, the potential benefits of
streaming infosets (or whatever they are called) between steps, rather
than serializing and re-parsing, go unrealized.


Overall, my impression has been positive.  I am far from an expert, but 
I personally don't really care about the first five minutes.  I care 
about comprehensiveness and viability in real scenarios, not toys.  I
don't want to bother with a technology that can handle simple toy
examples that may grow into bigger things and then eventually hit a
brick wall because I was a "newbie" and couldn't be bothered to learn.

Paul Mensonides

Received on Tuesday, 18 February 2014 09:16:56 UTC