- From: Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
- Date: Wed, 25 Nov 2020 08:32:44 +0100
- To: xproc-dev@w3.org
The jar for the cx:metadata-extractor step needs to be downloaded
separately and put on the classpath:
https://github.com/ndw/xmlcalabash1-metadata-extractor

On 25.11.2020 08:17, Frank Steimke wrote:
> Thank you very much, Conal. That was exactly what I hoped for.
>
> I wanted to try it immediately, but now I have a problem with the
> configuration of metadata-extractor (NoClassDefFound). When this is
> solved, I will report the solution here on this list.
>
> Thanks again,
> Frank
>
> On 25.11.2020 at 06:48, Conal Tuohy wrote:
>> Frank, I suggest you look at the "viewport" compound step in XProc:
>> https://www.w3.org/TR/xproc/#p.viewport
>>
>> This step is often used to apply a pipeline repeatedly to modify parts
>> of a larger document, or set of documents. The viewport step is used
>> to execute a pipeline repeatedly; it identifies portions of a document
>> which match a pattern, extracts those sub-trees, and applies the
>> sub-pipeline to them. The result of the pipeline then replaces the
>> original sub-tree.
>>
>> In your sub-pipeline, you would be dealing with a single imagedata
>> element; you would use the metadata extractor step to generate the
>> metadata document, and then copy the width and depth metadata from the
>> c:metadata document to attributes of the docbook imagedata element.
>> There are a few ways to do a "merge" like that; my preference is
>> usually to aggregate the two documents being merged using the
>> wrap-sequence step, and then use an XSLT to generate the new imagedata
>> document.
>>
>> NB the output of the "viewport" step's sub-pipeline should be an
>> <imagedata> element (with dimension attributes), which will replace
>> the original <imagedata> element.
>>
>> I've quickly written a rough draft of how this would work; I hope it's
>> helpful.
>>
>> Conal
>>
>> <p:load href="my-document.xml"/>
>> <p:viewport name="images-without-dimension"
>>             match="imagedata[not(@width)]">
>>   <!-- apply sub-pipeline to sub-trees which are imagedata elements
>>        without dimension metadata -->
>>   <cx:metadata-extractor name="image-metadata">
>>     <p:with-option name="href" select="/imagedata/@fileref"/>
>>   </cx:metadata-extractor>
>>   <!-- now merge the metadata into the subtree; personally I would
>>        wrap the metadata and the subtree, and use an XSLT -->
>>   <p:wrap-sequence wrapper="imagedata-and-metadata">
>>     <p:input port="source">
>>       <!-- the currently matching sub-tree -->
>>       <p:pipe step="images-without-dimension" port="current"/>
>>       <p:pipe step="image-metadata" port="result"/>
>>     </p:input>
>>   </p:wrap-sequence>
>>   <!-- produces e.g.
>>     <imagedata-and-metadata>
>>       <imagedata fileref="blah.jpg"/>
>>       <c:metadata>
>>         <c:tag name="Image Width">1000 pixels</c:tag>
>>         ...
>>       </c:metadata>
>>     </imagedata-and-metadata>
>>   -->
>>   <p:xslt name="imagedata-and-metadata-to-enhanced-imagedata">
>>     <p:input port="stylesheet">
>>       <p:document href="merge-metadata.xsl"/>
>>     </p:input>
>>     <p:input port="parameters"><p:empty/></p:input>
>>   </p:xslt>
>>   <!-- produces e.g.
>>     <imagedata fileref="blah.jpg" width="1000 pixels"/>
>>   -->
>> </p:viewport>
>> <p:store href="output/my-document.xml"/>
>>
>> On Wed, 25 Nov 2020 at 14:32, Frank Steimke <fsteimke.hb@gmail.com>
>> wrote:
>>
>>     Dear XProc Dev,
>>
>>     I am new to XProc. I need advice on whether and how the following
>>     problem could be solved with XProc, specifically with XProc 1 and
>>     the Calabash 1.2 engine, since that is what ships with Oxygen.
>>
>>     Problem: I have DocBook documents with images referenced by a URI
>>     given as the value of an imagedata/@fileref attribute. The
>>     imagedata element may carry optional attributes for the image
>>     dimensions. For further processing, I need to know the dimensions.
>>     So the task is: find imagedata elements without @width and @depth
>>     attributes and add these attributes. I wrote a simple XSLT script
>>     for this, which makes use of an extension function in Java to get
>>     the intrinsic dimensions of the image. All this is part of a
>>     larger project which converts from DocBook to ODF.
>>
>>     Now I'd like to port my project from XSLT-only to XProc 1 with
>>     Calabash 1.2. For reasons which are specific to Oxygen, I have
>>     problems using my Java extension function. I wonder if there is a
>>     solution which uses only XProc and Calabash built-in extensions. I
>>     found the cx:metadata-extractor extension (see
>>     https://xmlcalabash.com/docs/reference/cx-metadata-extractor.html)
>>     with this signature:
>>
>>     <p:declare-step type="cx:metadata-extractor"
>>                     xmlns:cx="http://xmlcalabash.com/ns/extensions">
>>       <p:output port="result"/>
>>       <p:option name="href"/>
>>     </p:declare-step>
>>
>>     I understand that this step expects a URI and will produce a
>>     <c:metadata .../> XML document.
>>
>>     So my question to this list is whether and how the following
>>     would be possible:
>>
>>     1) XSLT transformation step which reads my DocBook document,
>>     identifies imagedata elements without dimensions and writes out
>>     the @fileref attribute. The result of this step is a sequence of
>>     URIs.
>>
>>     2) For each of these URIs, call the cx:metadata-extractor
>>     extension. The result of this step is a sequence of XML documents,
>>     one for every URI, each with <c:metadata/> as root.
>>
>>     3) Wrap these documents into a single one. The result of this
>>     step is one XML document with some <c:metadata/> elements.
>>
>>     4) XSLT transformation step which merges this new document with
>>     the DocBook document.
>>     For every imagedata element, get the dimensions from the metadata
>>     document and add the @width and @depth attributes. The result of
>>     this step is a DocBook XML document where every imagedata element
>>     has the necessary dimension attributes.
>>
>>     Since I have experience with XSLT, steps 1 and 4 should be no
>>     problem. But how could I manage steps 2 and 3?
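
The merge-metadata.xsl stylesheet referenced in Conal's draft is not included in the thread. What follows is only a minimal sketch of what it might look like, not his actual stylesheet. It assumes that the imagedata element is in no namespace (as in the draft's match pattern; DocBook 5 documents would need the DocBook namespace in the match patterns), that the extractor reports the height in a c:tag named "Image Height" alongside the "Image Width" tag shown in the draft, and that values look like "1000 pixels". Writing the result as "1000px" is likewise an illustrative choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    exclude-result-prefixes="c">

  <!-- Drop the wrapper: only the enhanced imagedata element is returned,
       so that p:viewport can put it in place of the original element. -->
  <xsl:template match="/imagedata-and-metadata">
    <xsl:apply-templates select="imagedata"/>
  </xsl:template>

  <xsl:template match="imagedata">
    <!-- Assumption: tag names "Image Width" / "Image Height" and values
         of the form "1000 pixels"; keep only the number before the space.
         If a tag is missing, the variable is the empty string. -->
    <xsl:variable name="width"
        select="substring-before(../c:metadata/c:tag[@name = 'Image Width'][1], ' ')"/>
    <xsl:variable name="height"
        select="substring-before(../c:metadata/c:tag[@name = 'Image Height'][1], ' ')"/>

    <xsl:copy>
      <!-- Keep all existing attributes, e.g. fileref -->
      <xsl:copy-of select="@*"/>
      <!-- Add a dimension only if it was extracted and is not already set -->
      <xsl:if test="$width != '' and not(@width)">
        <xsl:attribute name="width" select="concat($width, 'px')"/>
      </xsl:if>
      <xsl:if test="$height != '' and not(@depth)">
        <xsl:attribute name="depth" select="concat($height, 'px')"/>
      </xsl:if>
      <!-- Keep any child content of the original element -->
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the wrapper document shown in the draft's comment, this would yield a single imagedata element carrying the original attributes plus width (and depth, where a height tag is present). The actual tag names depend on the image format, so they should be checked against the c:metadata output for the images in question.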
Received on Wednesday, 25 November 2020 07:33:00 UTC