- From: Frank Steimke <fsteimke.hb@gmail.com>
- Date: Wed, 25 Nov 2020 08:17:15 +0100
- To: Conal Tuohy <conal.tuohy@gmail.com>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <57aa8ff8-fea0-a4fa-782a-cab0673cbc5f@gmail.com>
Thank you very much, Conal. That was exactly what I had hoped for.

I wanted to try it immediately, but now I have a problem with the
configuration of metadata-extractor (NoClassDefFound). Once this is
solved, I will report the solution here on this list.

Thanks again,
Frank

On 25.11.2020 at 06:48, Conal Tuohy wrote:
> Frank, I suggest you look at the "viewport" compound step in XProc:
> https://www.w3.org/TR/xproc/#p.viewport
>
> This step is often used to apply a pipeline repeatedly to modify parts
> of a larger document, or set of documents. It identifies portions of a
> document which match a pattern, extracts those sub-trees, and applies
> the sub-pipeline to them. The result of the pipeline then replaces the
> original sub-tree.
>
> In your sub-pipeline, you would be dealing with a single imagedata
> element; you would use the metadata extractor step to generate the
> metadata document, and then copy the width and depth metadata from the
> c:metadata document to attributes of the DocBook imagedata element.
> There are a few ways to do a "merge" like that; my preference is
> usually to aggregate the two documents being merged using the
> wrap-sequence step, and then use an XSLT to generate the new imagedata
> document.
>
> NB the output of the "viewport" step's sub-pipeline should be an
> <imagedata> element (with dimension attributes), which will replace
> the original <imagedata> element.
>
> I've quickly written a rough draft of how this would work; I hope it's
> helpful.
>
> Conal
>
> <p:load href="my-document.xml"/>
> <p:viewport name="images-without-dimension"
>             match="imagedata[not(@width)]">
>   <!-- apply sub-pipeline to sub-trees which are imagedata elements
>        without dimension metadata -->
>   <cx:metadata-extractor name="image-metadata">
>     <p:with-option name="href" select="/imagedata/@fileref"/>
>   </cx:metadata-extractor>
>   <!-- now merge the metadata into the subtree; personally I would
>        wrap the metadata and the subtree, and use an XSLT -->
>   <p:wrap-sequence wrapper="imagedata-and-metadata">
>     <p:input port="source">
>       <!-- the currently matching sub-tree -->
>       <p:pipe step="images-without-dimension" port="current"/>
>       <p:pipe step="image-metadata" port="result"/>
>     </p:input>
>   </p:wrap-sequence>
>   <!-- produces e.g.
>     <imagedata-and-metadata>
>       <imagedata fileref="blah.jpg"/>
>       <c:metadata>
>         <c:tag name="Image Width">1000 pixels</c:tag>
>         ...
>       </c:metadata>
>     </imagedata-and-metadata>
>   -->
>   <p:xslt name="imagedata-and-metadata-to-enhanced-imagedata">
>     <p:input port="stylesheet">
>       <p:document href="merge-metadata.xsl"/>
>     </p:input>
>     <p:input port="parameters"><p:empty/></p:input>
>   </p:xslt>
>   <!-- produces e.g.
>     <imagedata fileref="blah.jpg" width="1000 pixels"/>
>   -->
> </p:viewport>
> <p:store href="output/my-document.xml"/>
>
> On Wed, 25 Nov 2020 at 14:32, Frank Steimke <fsteimke.hb@gmail.com>
> wrote:
>
>> Dear XProc Dev,
>>
>> I am new to XProc. I need advice on whether and how the following
>> problem could be solved with XProc, specifically with XProc 1 and
>> the Calabash 1.2 engine, since this is shipped with Oxygen.
>>
>> Problem: I have DocBook documents with images referenced by a URI
>> given as the value of an imagedata/@fileref attribute. The imagedata
>> element may contain optional attributes for the image dimensions.
>> For further processing, I need to know the dimensions. So the task
>> is: find imagedata elements without @width and @depth attributes and
>> add these attributes.
>> I wrote a simple XSLT script for this, which makes use of an
>> extension function in Java to get the image's intrinsic dimensions.
>> All this is part of a larger project which converts from DocBook to
>> ODF.
>>
>> Now I'd like to port my project from XSLT-only to XProc 1 with
>> Calabash 1.2. For some reasons which are specific to Oxygen, I have
>> problems using my Java extension function. I wonder if there is a
>> solution which uses only XProc and Calabash built-in extensions. I
>> found the cx:metadata-extractor extension (see
>> https://xmlcalabash.com/docs/reference/cx-metadata-extractor.html)
>> with this signature:
>>
>> <p:declare-step type="cx:metadata-extractor"
>>                 xmlns:cx="http://xmlcalabash.com/ns/extensions">
>>   <p:output port="result"/>
>>   <p:option name="href"/>
>> </p:declare-step>
>>
>> I understand that this step expects a URI and will produce a
>> <c:metadata .../> XML document.
>>
>> So my question to this list is whether and how the following would
>> be possible:
>>
>> 1) XSLT transformation step which reads my DocBook document,
>> identifies imagedata elements without dimensions and writes out the
>> @fileref attribute. The result of this step is a sequence of URIs.
>>
>> 2) For each of these URIs, call the cx:metadata-extractor extension.
>> The result of this step is a sequence of XML documents, one for
>> every URI, each with <c:metadata/> as root.
>>
>> 3) Wrap these documents into a single one. The result of this step
>> is one XML document with some <c:metadata/> elements.
>>
>> 4) XSLT transformation step which merges this new document with the
>> DocBook document. For every imagedata element, get the dimensions
>> from the metadata document and add the @width and @depth attributes.
>> The result of this step is a DocBook XML document in which every
>> imagedata element has the necessary dimension attributes.
>>
>> Since I have experience with XSLT, steps 1 and 4 should be no
>> problem. But how could I manage steps 2 and 3?
>
> --
> Conal Tuohy
> http://conaltuohy.com/
> @conal_tuohy
> +61-466-324297
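
The pipeline draft refers to a stylesheet named merge-metadata.xsl but does not show it. What follows is only a rough sketch of what such a stylesheet might look like, not the author's actual file. It rests on several assumptions that are not confirmed in the thread: that the c prefix is bound to the XProc step namespace, that the extractor reports tags named "Image Width" and "Image Height" with values like "1000 pixels", that the DocBook document is not in a namespace (as the match pattern in the draft suggests), and that rewriting "pixels" to "px" is the desired unit format. It uses XSLT 2.0, since Calabash runs on Saxon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical merge-metadata.xsl: a sketch under the assumptions
     stated above, not the stylesheet used in the thread. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    exclude-result-prefixes="c">

  <!-- The wrapper itself is dropped; only the enhanced imagedata
       element is returned, so it can replace the original sub-tree. -->
  <xsl:template match="/imagedata-and-metadata">
    <xsl:apply-templates select="imagedata"/>
  </xsl:template>

  <xsl:template match="imagedata">
    <!-- Assumed tag names; check them against real extractor output. -->
    <xsl:variable name="width"
        select="../c:metadata/c:tag[@name = 'Image Width'][1]"/>
    <xsl:variable name="height"
        select="../c:metadata/c:tag[@name = 'Image Height'][1]"/>
    <xsl:copy>
      <!-- Keep every attribute that is already present. -->
      <xsl:copy-of select="@*"/>
      <!-- Add @width only if it is missing and metadata was found. -->
      <xsl:if test="not(@width) and $width">
        <xsl:attribute name="width"
            select="replace($width, '\s*pixels$', 'px')"/>
      </xsl:if>
      <!-- DocBook uses @depth for the image height. -->
      <xsl:if test="not(@depth) and $height">
        <xsl:attribute name="depth"
            select="replace($height, '\s*pixels$', 'px')"/>
      </xsl:if>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If the extractor reports the dimensions under different tag names for a given image format, only the two variable definitions would need to change. Attributes that already exist on the imagedata element are never overwritten.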
Received on Wednesday, 25 November 2020 07:17:31 UTC