manifest-based processing

A common idiom in XProc is to define a manifest of the
documents/assets to work on and have that manifest flow through the
pipeline, rather than the data documents themselves.

Typically it's a collection of URIs, each requiring a different
processing pipeline depending on its content type / data type, with
the results then aggregated into some final result structure.
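To make this concrete (the element names and URIs below are
hypothetical, just for illustration), such a manifest might look like:

```xml
<manifest>
  <!-- each entry names an asset and the content type that
       determines which subpipeline should process it -->
  <entry uri="chapter1.xhtml" content-type="application/xhtml+xml"/>
  <entry uri="cover.jpg"      content-type="image/jpeg"/>
  <entry uri="styles.css"     content-type="text/css"/>
</manifest>
```

The manifest itself is the primary document flowing through the
pipeline; the assets it points to are loaded as secondary inputs.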

This approach sometimes leads to convoluted 'procedural' pipelines ...
which are less reusable and harder to comprehend.

Even with non-XML data flowing through (as proposed for v2), for
example a zip file (EPUB), we have the same class of problem: the
zip manifest is our routing table, determining the processing of
secondary data assets.

I would like to dig deeper into how we might be able to make life
easier with these kinds of pipelines.

Imagine passing a sequence of URIs to a pipeline as primary input;
the pipeline's main responsibility is to deal with the end result of
processing (serialisation, etc.), while each individual content type
is processed by a separate pipeline.

I can imagine a lot of ways of building this kind of thing with XProc
v1 (and have), but I am wondering what we could enhance/add in vnext
to simplify things and make them easier to (re)use. The problems I see are:

* how do we map a step/pipeline to a content type?
* what is the default posture: mutation in place vs a copy of the data?
* dependencies: some URIs need to be processed before others
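In XProc v1 the first of these can be hand-rolled with p:for-each and
p:choose, which is exactly the kind of boilerplate vnext might absorb.
A minimal sketch, assuming the hypothetical manifest format above (the
content-type test and subpipeline are illustrative, not prescriptive):

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- source carries the manifest; result collects processed assets -->
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <p:for-each>
    <p:iteration-source select="/manifest/entry"/>
    <!-- dispatch on content type; the current entry is the
         default XPath context inside p:choose -->
    <p:choose>
      <p:when test="/entry/@content-type = 'application/xhtml+xml'">
        <p:load>
          <p:with-option name="href" select="/entry/@uri"/>
        </p:load>
        <!-- an XHTML-specific subpipeline would be invoked here -->
      </p:when>
      <p:otherwise>
        <!-- pass the entry through untouched -->
        <p:identity/>
      </p:otherwise>
    </p:choose>
  </p:for-each>
</p:declare-step>
```

Every new content type means another p:when branch, which is part of
why these pipelines end up procedural; a declarative content-type-to-step
mapping could collapse the whole p:choose away. Note the sketch also
sidesteps the dependency problem: p:for-each processes entries in
document order, with no way to say "this URI before that one".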

There are other issues that need thinking through, but I thought I
would 'toss this over the wall' to solicit opinions.

Jim Fuller

Received on Wednesday, 19 February 2014 09:26:12 UTC