- From: Jostein Austvik Jacobsen <josteinaj@gmail.com>
- Date: Wed, 19 Feb 2014 12:35:58 +0100
- To: James Fuller <jim@webcomposite.com>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <CAOCxfQedc9TWps7CMHkB+DM_sD_4D1BRiwzDWiz6O8_8pNspWw@mail.gmail.com>
So a pattern we're using is to provide the manifest (or "fileset") on the primary input/output port, and "in-memory" documents on a secondary input/output port. (Sometimes I also want to generate a report, in which case another secondary input/output port is used.) So most steps in our library are implemented with a signature similar to this:

    <p:declare-step name="main" ...>
      <p:input port="fileset.in" primary="true"/>
      <p:input port="in-memory.in" sequence="true"/>
      <p:output port="fileset.out" primary="true"/>
      <p:output port="in-memory.out" sequence="true"/>
    </p:declare-step>

This makes it relatively easy to connect multiple steps, although only one of the ports can have a default connection (the fileset in this case), and the rest have to be connected explicitly:

    <p:declare-step ...>
      <p:documentation>Convert from HTML to EPUB and validate input/output.</p:documentation>
      <p:option name="input-html-href" required="true"/>
      <p:option name="output-epub-href" required="true"/>
      <px:html-load name="html-load">
        <p:with-option name="href" select="$input-html-href"/>
      </px:html-load>
      <px:html-validate name="html-validate">
        <p:input port="in-memory.in">
          <p:pipe port="in-memory.out" step="html-load"/>
        </p:input>
      </px:html-validate>
      <px:html-to-epub name="html-to-epub">
        <p:input port="in-memory.in">
          <p:pipe port="in-memory.out" step="html-validate"/>
        </p:input>
      </px:html-to-epub>
      <px:epub-validate name="epub-validate">
        <p:input port="in-memory.in">
          <p:pipe port="in-memory.out" step="html-to-epub"/>
        </p:input>
      </px:epub-validate>
      <px:epub-store name="epub-store">
        <p:input port="in-memory.in">
          <p:pipe port="in-memory.out" step="epub-validate"/>
        </p:input>
        <p:with-option name="href" select="$output-epub-href"/>
      </px:epub-store>
    </p:declare-step>

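For readers who have not seen such a manifest, here is a minimal sketch of what a fileset document might look like. The `d:` namespace, the element names, and the `media-type` attribute follow DAISY Pipeline conventions and are assumptions here; this message does not specify the format, and the file names are hypothetical.

```xml
<!-- Hypothetical fileset manifest: one entry per file in the publication. -->
<!-- Assumption: DAISY Pipeline style d:fileset / d:file vocabulary. -->
<d:fileset xmlns:d="http://www.daisy.org/ns/pipeline/data"
           xml:base="file:/tmp/book/">
  <!-- href values are resolved against xml:base -->
  <d:file href="content.xhtml" media-type="application/xhtml+xml"/>
  <d:file href="style.css" media-type="text/css"/>
  <d:file href="cover.png" media-type="image/png"/>
</d:fileset>
```

Under this reading, the manifest travels on the primary port, and only the entries that a step has actually loaded or modified need to appear as documents on the in-memory port.
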
It would be useful if the "kind" attribute were more flexible. I think this has been suggested before (by Romain?). If custom kinds were allowed, then multiple ports could be primary:

    <p:declare-step name="main" ...>
      <!-- "primary" attributes added for verbosity; ports would be primary by default since they are the only ones of their kind -->
      <p:input port="fileset.in" primary="true"/>
      <p:input port="in-memory.in" sequence="true" primary="true" kind="in-memory"/>
      <p:output port="fileset.out" primary="true"/>
      <p:output port="in-memory.out" sequence="true" primary="true" kind="in-memory"/>
    </p:declare-step>

This would greatly reduce the size of the pipeline:

    <p:declare-step ...>
      <p:documentation>Convert from HTML to EPUB and validate input/output.</p:documentation>
      <p:option name="input-html-href" required="true"/>
      <p:option name="output-epub-href" required="true"/>
      <px:html-load name="html-load">
        <p:with-option name="href" select="$input-html-href"/>
      </px:html-load>
      <px:html-validate name="html-validate"/>
      <px:html-to-epub name="html-to-epub"/>
      <px:epub-validate name="epub-validate"/>
      <px:epub-store name="epub-store">
        <p:with-option name="href" select="$output-epub-href"/>
      </px:epub-store>
    </p:declare-step>

Jostein

On 19 February 2014 10:25, James Fuller <jim@webcomposite.com> wrote:
> A common idiom used in XProc is to define a manifest of
> documents/assets to work on and have that flow through the pipeline vs
> data documents flowing through.
>
> Typically, it's a collection of URIs that each require a pipeline of
> processing for each different content type / data type which then gets
> aggregated up into some final result structure.
>
> This approach sometimes leads to convoluted 'procedural' pipelines ...
> which are less reusable and harder to comprehend.
>
> Even with non-XML data flowing through (as proposed for v2), for
> example a zip file (EPUB), we have the same class of problem where the
> zip manifest is our routing table determining processing of secondary
> data assets.
>
> I would like to dig deeper into how we might be able to make life
> easier with these kinds of pipelines.
>
> Imagine passing a sequence of URIs to a pipeline as primary input; the
> pipeline's main responsibility is to deal with the end result of
> processing (serialisation, etc) where each individual content type is
> processed by a separate pipeline.
>
> I can imagine a lot of ways of building this kind of thing with XProc
> v1 (and have) but wondering what could we enhance/add to vnext to
> simplify, making things easier to (re)use? The problems I see are:
>
> * how to deal with mapping a step/pipeline to a content type?
> * default posture - mutation in place vs copy of data?
> * dependencies - some URIs need to be processed before others
>
> there are other issues that need thinking through but thought I would
> 'toss over the wall' to solicit opinion.
>
> Jim Fuller
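
As a rough illustration of the first question above (mapping a step to a content type), one XProc 1.0 approach is to iterate over the manifest entries and branch on media type. This is only a sketch under assumptions: the `d:fileset`/`d:file` vocabulary with `href` and `media-type` attributes is assumed rather than taken from this thread, and the two branches stand in for real per-type pipelines.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:d="http://www.daisy.org/ns/pipeline/data"
                version="1.0" name="route-by-type">
  <!-- Assumption: the manifest is a d:fileset with d:file children. -->
  <p:input port="source" primary="true"/>
  <p:output port="result" sequence="true"/>

  <!-- Each d:file entry becomes the current document of one iteration. -->
  <p:for-each>
    <p:iteration-source select="/d:fileset/d:file"/>
    <p:choose>
      <!-- XHTML entries: load the referenced document for further work. -->
      <p:when test="/d:file/@media-type = 'application/xhtml+xml'">
        <p:load>
          <p:with-option name="href"
              select="p:resolve-uri(/d:file/@href, p:base-uri(/d:file))"/>
        </p:load>
      </p:when>
      <!-- Everything else: pass the manifest entry through untouched. -->
      <p:otherwise>
        <p:identity/>
      </p:otherwise>
    </p:choose>
  </p:for-each>
</p:declare-step>
```

The routing table here is hard-coded in the `p:choose`, which is the kind of procedural wiring the message describes. The sketch does not address the ordering of dependent URIs.
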
Received on Wednesday, 19 February 2014 11:36:48 UTC