- From: Conal Tuohy <conal.tuohy@gmail.com>
- Date: Wed, 25 Nov 2020 15:48:59 +1000
- To: Frank Steimke <fsteimke.hb@gmail.com>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <CAErBQuQsMr5piqOxp9BLsiwcjDB4+HfXfAPRsKXBzCWTm5Tixw@mail.gmail.com>
Frank,

I suggest you look at the "viewport" compound step in XProc:
https://www.w3.org/TR/xproc/#p.viewport

This step is often used to modify parts of a larger document, or set of documents. It identifies the portions of a document which match a pattern, extracts those sub-trees, and applies the sub-pipeline to each of them in turn. The result of the sub-pipeline then replaces the original sub-tree.

In your sub-pipeline, you would be dealing with a single imagedata element; you would use the metadata extractor step to generate the metadata document, and then copy the width and depth metadata from the c:metadata document to attributes of the DocBook imagedata element.

There are a few ways to do a "merge" like that; my preference is usually to aggregate the two documents being merged using the wrap-sequence step, and then use an XSLT to generate the new imagedata document.

NB the output of the "viewport" step's sub-pipeline should be an <imagedata> element (with dimension attributes), which will replace the original <imagedata> element.

I've quickly written a rough draft of how this would work; I hope it's helpful.

Conal

<p:load href="my-document.xml"/>

<p:viewport name="images-without-dimension" match="imagedata[not(@width)]">
  <!-- apply sub-pipeline to sub-trees which are imagedata elements without dimension metadata -->
  <cx:metadata-extractor name="image-metadata">
    <p:with-option name="href" select="/imagedata/@fileref"/>
  </cx:metadata-extractor>

  <!-- now merge the metadata into the subtree; personally I would wrap the metadata and the subtree, and use an XSLT -->
  <p:wrap-sequence wrapper="imagedata-and-metadata">
    <p:input port="source">
      <p:pipe step="images-without-dimension" port="current"/><!-- the currently matching sub-tree -->
      <p:pipe step="image-metadata" port="result"/>
    </p:input>
  </p:wrap-sequence>
  <!-- produces e.g.
    <imagedata-and-metadata>
      <imagedata fileref="blah.jpg"/>
      <c:metadata>
        <c:tag name="Image Width">1000 pixels</c:tag>
        ...
      </c:metadata>
    </imagedata-and-metadata>
  -->

  <p:xslt name="imagedata-and-metadata-to-enhanced-imagedata">
    <p:input port="stylesheet">
      <p:document href="merge-metadata.xsl"/>
    </p:input>
    <p:input port="parameters"><p:empty/></p:input>
  </p:xslt>
  <!-- produces e.g. <imagedata fileref="blah.jpg" width="1000 pixels"/> -->
</p:viewport>

<p:store href="output/my-document.xml"/>

On Wed, 25 Nov 2020 at 14:32, Frank Steimke <fsteimke.hb@gmail.com> wrote:

> Dear XProc Dev,
>
> I am new to XProc. I need advice if and how the following problem could be
> solved with XProc. Here especially with XProc 1 and the Calabash 1.2
> engine, since this is shipped with Oxygen.
>
> Problem: I have DocBook Documents with images referenced from a URI as
> value of an imagedata/@fileref attribute. The imagedata Element may contain
> optional attributes for image Dimensions. For further processing, I need to
> know the Dimensions. So the task is: find imagedata elements without @width
> and @depth attributes and add these attributes. I wrote a simple XSLT
> script for this, which makes use of an extension function in Java to get
> the image intrinsic Dimension. All this is part of a larger project which
> converts from DocBook to ODF.
>
> Now I'd like to port my Project from XSLT-Only to XProc 1 with Calabash
> 1.2. For some reasons which are specific to Oxygen I have Problems using my
> Java extension function. I wonder if there is a solution which uses only
> XProc and Calabash built-in Extensions. I found the cx:metadata-extractor
> extension (see
> https://xmlcalabash.com/docs/reference/cx-metadata-extractor.html) with
> this signature:
>
> <p:declare-step type="cx:metadata-extractor"
>     xmlns:cx="http://xmlcalabash.com/ns/extensions">
>   <p:output port="result"/>
>   <p:option name="href"/>
> </p:declare-step>
>
> I understand that this step expects a URI and will produce a <c:metadata
> .../> XML Document.
>
> So my Question to this List is, if and how the following would be possible:
>
> 1) XSLT Transformation Step which reads my DocBook Document, identifies
> imagedata Elements without Dimensions and writes out the @fileref
> attribute. The Result of this Step is a Sequence of URIs.
>
> 2) For each of these URIs call the cx:metadata-extractor Extension. The
> Result of this Step is a Sequence of XML Documents, one for every URI, each
> with <cx:metadata /> as root.
>
> 3) Wrap these Documents into a single one. The Result of this Step is one
> XML Document with some <cx:metadata /> Elements.
>
> 4) XSLT Transformation Step which merges this new Document with the
> DocBook Document. For every imagedata Element, get the Dimension from the
> Metadata Document and add the @width and @depth Attributes. The Result of
> this Step is a DocBook XML Document, where every imagedata Element has the
> necessary Dimension Attributes.
>
> Since I have experience with XSLT, Steps 1 and 4 should be no problem. But
> how could I manage Steps 2 and 3?

-- 
Conal Tuohy
http://conaltuohy.com/
@conal_tuohy
+61-466-324297
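
The merge-metadata.xsl stylesheet loaded by the p:xslt step is not included in the draft pipeline. A minimal sketch of what it might look like follows, under several assumptions that are not confirmed by the message: the imagedata elements are in no namespace (as the viewport's match pattern implies), the c: prefix is bound to the standard XProc step namespace, the extractor reports the height in a tag named "Image Height" (by analogy with the "Image Width" tag shown in the example comment), and values have the form "1000 pixels". Rewriting "1000 pixels" as "1000px" is a choice made here, not something the draft specifies.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    exclude-result-prefixes="c">

  <!-- Input is the wrapper built by p:wrap-sequence; emit only the
       imagedata element so that it replaces the original in the viewport. -->
  <xsl:template match="/imagedata-and-metadata">
    <xsl:apply-templates select="imagedata"/>
  </xsl:template>

  <xsl:template match="imagedata">
    <!-- Assumed tag names: "Image Width" appears in the example comment,
         "Image Height" is a guess by analogy. -->
    <xsl:variable name="width"
        select="../c:metadata/c:tag[@name = 'Image Width'][1]"/>
    <xsl:variable name="height"
        select="../c:metadata/c:tag[@name = 'Image Height'][1]"/>
    <xsl:copy>
      <!-- Keep every attribute the element already has. -->
      <xsl:copy-of select="@*"/>
      <!-- Add a dimension only if it is missing and metadata was found. -->
      <xsl:if test="not(@width) and $width">
        <xsl:attribute name="width">
          <xsl:call-template name="to-px">
            <xsl:with-param name="value" select="$width"/>
          </xsl:call-template>
        </xsl:attribute>
      </xsl:if>
      <xsl:if test="not(@depth) and $height">
        <xsl:attribute name="depth">
          <xsl:call-template name="to-px">
            <xsl:with-param name="value" select="$height"/>
          </xsl:call-template>
        </xsl:attribute>
      </xsl:if>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: turns "1000 pixels" into "1000px" by keeping
       the text before the first space and appending the unit. -->
  <xsl:template name="to-px">
    <xsl:param name="value"/>
    <xsl:value-of select="concat(
        substring-before(concat(normalize-space($value), ' '), ' '),
        'px')"/>
  </xsl:template>

</xsl:stylesheet>
```

If the extractor finds no dimension tags for an image, this sketch copies the imagedata element through unchanged, so the viewport leaves that part of the document as it was.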
Received on Wednesday, 25 November 2020 05:49:24 UTC