Re: The first five minutes ... a thought experiment (long) from James Fuller on 2014-02-18 (xproc-dev@w3.org from February 2014)

From: James Fuller <jim@webcomposite.com>
Date: Tue, 18 Feb 2014 12:46:49 +0100
To: Paul Mensonides <pmenso57@comcast.net>
Cc: XProc Dev <xproc-dev@w3.org>
Message-ID: <CAEaz5mt1jtcopyJBuzWx_=9iG3m+XQ_gOt9DeQMr2gmsMp-FsA@mail.gmail.com>
On 17 February 2014 23:21, Paul Mensonides <pmenso57@comcast.net> wrote:
> 2c
>
> As someone that just recently started using XProc (and learning XML Schema
> 1.1 and XSLT/XPath 2.0 at the same time), I have not had a particularly hard
> time figuring it out.

> There are some things that I have found bizarre.  In particular, why are
> "documents" flowing through pipes rather than just (e.g.) XPath data values
> (of which documents are just one form).  Related to that, documents and
> parameters (for e.g. XSLT) are all more or less XPath data values, and it
> would be nice if there was a uniform way that such data could be passed
> around.  But that hasn't been difficult to figure out, just noted.

Documents flowing through a pipeline is a fundamental concept in XProc,
i.e. data flows through pipes whose connections to steps are defined
by bindings. This is a classic dataflow language design, though the
decision in v1 was to allow only XML documents to flow through.
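
As a rough illustration of that dataflow idea (a toy model only, not XProc
syntax and not the API of any XProc processor), the sketch below wires steps
together through explicit bindings. The names Step, run, "upper" and "wrap"
are hypothetical, plain strings stand in for documents, and steps are assumed
to be listed in dependency order.

```python
from typing import Callable, Dict, List, Tuple

# A binding names where an input comes from: (source step, output port).
Binding = Tuple[str, str]


class Step:
    """A toy step: reads named input ports, writes named output ports."""

    def __init__(self, name: str,
                 func: Callable[[Dict[str, str]], Dict[str, str]],
                 inputs: Dict[str, Binding]) -> None:
        self.name = name
        self.func = func        # maps {input port: doc} to {output port: doc}
        self.inputs = inputs    # maps input port to the binding that feeds it


def run(steps: List[Step], pipeline_inputs: Dict[str, str]) -> Dict[Binding, str]:
    # Every produced document is stored under (step name, port name).
    # The pipeline's own inputs are exposed as outputs of a step called "main".
    outputs = {("main", port): doc for port, doc in pipeline_inputs.items()}
    for step in steps:  # assumption: steps are already in dependency order
        args = {port: outputs[binding] for port, binding in step.inputs.items()}
        for port, doc in step.func(args).items():
            outputs[(step.name, port)] = doc
    return outputs


# Two steps, with every connection spelled out explicitly.
steps = [
    Step("upper",
         lambda ports: {"result": ports["source"].upper()},
         {"source": ("main", "source")}),
    Step("wrap",
         lambda ports: {"result": "<doc>" + ports["source"] + "</doc>"},
         {"source": ("upper", "result")}),
]

out = run(steps, {"source": "hello"})
print(out[("wrap", "result")])  # prints <doc>HELLO</doc>
```

Nothing in this model cares what kind of value travels along a binding, which
is the sense in which restricting pipes to XML documents is a design decision
rather than a property of dataflow itself.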

In XProc vnext we are considering allowing item()*, along with non-XML
documents, to flow through pipes, which would address your requirement
(I think).

> The thing that has frustrated me in particular is the way that versioning is
> handled throughout many of the XML-related standards. XProc does some things
> right: ability to specify required XPath version and required XSLT version.
> But no version requirement for XML Schema? No schema-aware XSLT
> requirements?  What about branching off of these values?  What I ended up
> doing (since I needed a pipeline that could be executed with and without
> schema support) was create an option for this AND pass around Saxon
> configuration files.  E.g. this is my invocation:
>
> java com.xmlcalabash.drivers.Main --saxon-configuration=saxon-ee.xml
> pipeline.xpl schema-aware=yes

good point, and yes it's part of a broader set of issues with
versioning ... will make sure we discuss this

> So far I have found the default piping stuff more bewildering than
> helpful--mostly because it hides what is actually happening.  I have just
> started using XProc.  The reason for doing so is because I have processing
> tasks that aren't just simple linear pipelines.  If I just wanted to take
> some XML data through XSLT, I wouldn't bother to set up a pipeline.  So, at
> this point, I just explicitly specify all connections between ports.

yes, explicitly setting up every connection removes doubt ... but having
to do so is counter to usability

we will have a much better defaulting story with optimized syntax
changes that should address issues in this area in vnext.

> I haven't run across it yet, but I am worried about the lack of the ability
> to cache intermediate results in a direct way.  Viewing a pipeline as a sort
> of makefile, running the pipeline is equivalent to a complete rebuild.  For
> the project that I am using to learn all of this stuff, this doesn't matter
> that much.  For the real world projects that I need something like this for,
> I fear it will potentially be a very large problem, and it may be that I
> have to have small partial pipelines being invoked via a makefile.  The
> potential benefit of streaming over serialization and infosets (or whatever
> they are called) versus re-parsing are unrealized in this sort of scenario.

vnext calls for better logging and debugging of pipelines ... any
examples beyond wanting to log the output of a step would be appreciated.
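
To make the makefile analogy in the quoted paragraph concrete, here is a
minimal sketch of timestamp-based caching of one intermediate result. It is
an assumption about how such caching could be layered around a pipeline from
the outside, not a feature of XProc or of any processor. The names is_stale,
run_step and build_upper are hypothetical, and build_upper is only a
stand-in for a real transformation.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, List


def is_stale(target: Path, sources: Iterable[Path]) -> bool:
    """Make-style check: rebuild if the target is missing or any source is newer."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime_ns
    return any(src.stat().st_mtime_ns > target_mtime for src in sources)


def run_step(target: Path, sources: Iterable[Path],
             build: Callable[[List[Path], Path], None]) -> bool:
    """Run build only when needed; return True if it actually ran."""
    source_list = list(sources)
    if not is_stale(target, source_list):
        return False  # cached intermediate result is still valid
    build(source_list, target)
    return True


def build_upper(sources: List[Path], target: Path) -> None:
    # Stand-in for an expensive step such as a transformation or validation.
    target.write_text("".join(src.read_text() for src in sources).upper())
```

Calling run_step twice in a row with unchanged sources runs the build once
and skips it the second time. The cost is that every intermediate result has
to be serialized to disk and re-parsed by the next step, which is exactly the
trade-off against in-memory streaming that the quoted paragraph points out.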

> Overall, my impression has been positive.  I am far from an expert, but I
> personally don't really care about the first five minutes.  I care about
> comprehensiveness and viability in real scenarios, not toys.  I don't want
> to bother with a technology that can handle simple toy examples which maybe
> grow into bigger things that might eventually hit a brick wall because I was
> a "newbie" and couldn't be bothered to learn.

we also know that the first five minutes is slightly artificial in
terms of the entire lifecycle of any software usage ... but in XProc's
case there are several things that could be optimized, as I mentioned
in other responses. I will be talking about 1d, 1m and 1 year
scenarios as well in the future.

thx for the comments, keep em coming, Jim Fuller
Received on Tuesday, 18 February 2014 11:47:17 UTC