Re: The first five minutes ... a thought experiment (long) from James Fuller on 2014-02-19 (xproc-dev@w3.org from February 2014)

From: James Fuller <jim@webcomposite.com>
Date: Wed, 19 Feb 2014 19:19:15 +0100
To: "pmenso57 ." <pmenso57@comcast.net>
Cc: XProc Dev <xproc-dev@w3.org>
Message-ID: <CAEaz5mtKp6=8_pPF5NUZHgxK+vaUVxnM_Ve99x5JzMRZ4URxkg@mail.gmail.com>
in my last email, where I say

'behave like a p:sink, with no output flowing out of p:store, the next'

I meant

'behave like a p:sink, with no primary output flowing out of p:store, the next'

thx to Olivier Jeulin for the correction
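
For illustration, a minimal sketch of the situation from the quoted thread, not the original pipeline. The step names "preprocess" and "transform" and the stylesheet data.xsl come from the thread. The use of p:xslt for "preprocess" and the file names preprocess.xsl and preprocessed.xml are assumptions made up for this example. Because p:store has no primary output, the default readable port is undefined for the step that follows it, so "transform" has to bind its "source" input explicitly:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
    <p:input port="source"/>
    <p:input port="parameters" kind="parameter"/>
    <p:output port="result"/>

    <!-- Assumption: "preprocess" is shown as an XSLT step; preprocess.xsl is hypothetical. -->
    <p:xslt name="preprocess" version="2.0">
        <p:input port="stylesheet">
            <p:document href="preprocess.xsl"/>
        </p:input>
    </p:xslt>

    <!-- "source" is bound implicitly to the primary output of "preprocess".
         p:store has no primary output (its "result" port is non-primary),
         so nothing flows implicitly to the next step.
         preprocessed.xml is a hypothetical file name. -->
    <p:store href="preprocessed.xml"/>

    <!-- No default readable port exists here, so "source" must be bound explicitly.
         The "parameters" input still binds implicitly to the pipeline's
         primary parameter input. -->
    <p:xslt name="transform" version="2.0">
        <p:input port="source">
            <p:pipe step="preprocess" port="result"/>
        </p:input>
        <p:input port="stylesheet">
            <p:document href="data.xsl"/>
        </p:input>
    </p:xslt>
</p:declare-step>
```

Removing the explicit p:pipe on "transform" should reproduce the failure described in the thread.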

J

On 19 February 2014 10:42, James Fuller <jim@webcomposite.com> wrote:
> On 19 February 2014 03:48,  <pmenso57@comcast.net> wrote:
>> ----- Original Message -----
>>> From: "James Fuller" <jim@webcomposite.com>
>>> To: "Paul Mensonides" <pmenso57@comcast.net>
>> The p:store step implicitly pipes the "preprocess" step's "result" output to its own "source" input.
>>
>> The "transform" step (apparently) implicitly pipes the pipeline's implicit "parameters" input to its own "parameters" input.  However, it does _not_ implicitly pipe the last "result" output (i.e. from the "preprocess" step) to its own "source" input.  I.e. if I comment out the explicit pipe...
>>
>>     <p:xslt name="transform" version="2.0">
>>         <!--<p:input port="source">
>>             <p:pipe step="preprocess" port="result"/>
>>         </p:input>-->
>>         <p:input port="stylesheet">
>>             <p:document href="data.xsl"/>
>>         </p:input>
>>     </p:xslt>
>>
>> ...the pipeline fails.  What it looks like to me is that the "parameters" input of the pipeline can be implicitly piped more than once, but not normal inputs.  So I put in the explicit pipe while thinking to myself: this type of branching happens so often that I might as well just specify the bindings myself because apparently primary input ports do not automatically map to the first available primary output port, and, even if they did, that would be brittle.  E.g. the above is basically:
>>
>
> The problem is that p:store has no result output port, which makes it
> behave like a p:sink, with no output flowing out of p:store, the next
> step needs to explicitly define bindings. This can be useful in some
> classes of pipelines, but is a classic gotcha for simple usage.
>
> I think we need to revisit the idea of steps with no default output
> ports; it results in unnecessarily convoluted pipelines for common scenarios.
>
>>
>> One of the things that I need something like XProc for is to generate a bunch of documentation.  The HTML output ends up being roughly 1000 separate HTML files.  The source data starts with an XML file that is a manifest referencing other source XML files.  One of the required steps is to go through all of these files and generate a lookup table for cross-referencing.  That process may also generate a bootstrapped XML Schema which is imported into the various schemas for the documents themselves.  After that initial step, documents are validated and transformed in various ways (using the lookup table, etc.).  My current setup of this--which is from the XSLT 1.0 days, with several Bash scripts (including some being generated by XSLT)--takes about 5-10 minutes to regenerate the documentation (depending on computer of course) and it would be worse if it was also generating LaTeX or (unfamiliar to me) XSL-FO.  So, I would like to be able to avoid rebuilding those things which are not affected by the changes to the input.  Normally, I'd just immediately go to a makefile, but that just uses timestamps.  The potential benefit with using XML is that I should be able to determine whether particular changes *inside* a document actually affect other documents and thus cause them to be recreated.  I.e. with make, the lookup table is created from all of the files, so if any file is changed, everything has to be rebuilt because everything depends on that lookup table.
>>
>> IOW, a pipeline is essentially a build process.  In particular when it comes to XML data of various kinds, there are several potential benefits over a generic build tool like make.  One is streaming rather than serializing.  This is particularly true if data flowing between steps does not need to be complete documents (i.e. among other things, that provides the possibility for the pipeline author to "help" the system recognize streamability of various subprocesses).  The other is potentially about having finer granularity in determining whether a change in a dependency actually affects subsequent downstream steps and divert around them or skip them if not.  Technically you _can_ do this with a makefile, it just isn't very natural (and it isn't very natural with current XProc either).
>>
>
> thank you for the reminder ... this is back on our radar ... eg.
> change/timestamp based processing
>
> Jim Fuller
Received on Wednesday, 19 February 2014 18:19:43 UTC