Re: Get and merge image metadata

Thank you very much, Conal. That was exactly what I had hoped for.

I wanted to try it immediately, but I now have a problem with the 
configuration of metadata-extractor (NoClassDefFound). Once this is 
solved, I will report the solution here on this list.

Thanks again,
Frank

On 25.11.2020 at 06:48, Conal Tuohy wrote:
> Frank, I suggest you look at the "viewport" compound step in XProc: 
> https://www.w3.org/TR/xproc/#p.viewport
>
> The viewport step is often used to modify parts of a larger document, 
> or set of documents. It identifies the portions of a document which 
> match a pattern, extracts those sub-trees, and applies its 
> sub-pipeline to each of them in turn. The result of the sub-pipeline 
> then replaces the original sub-tree.
>
> In your sub-pipeline, you would be dealing with a single imagedata 
> element; you would use the metadata extractor step to generate the 
> metadata document, and then copy the width and depth metadata from the 
> c:metadata document to attributes of the docbook imagedata element. 
> There are a few ways to do a "merge" like that; my preference is 
> usually to aggregate the two documents being merged using the 
> wrap-sequence step, and then use an XSLT to generate the new imagedata 
> document.
>
> NB the output of the "viewport" step's sub-pipeline should be an 
> <imagedata> element (with dimension attributes), which will replace 
> the original <imagedata> element.
>
> I've quickly written a rough draft of how this would work; I hope it's 
> helpful.
>
> Conal
>
>
>
> <p:load href="my-document.xml"/>
> <p:viewport name="images-without-dimension" 
>             match="imagedata[not(@width)]">
>    <!-- apply sub-pipeline to sub-trees which are imagedata elements 
>         without dimension metadata -->
>    <cx:metadata-extractor name="image-metadata">
>       <p:with-option name="href" select="/imagedata/@fileref"/>
>    </cx:metadata-extractor>
>    <!-- now merge the metadata into the subtree; personally I would 
>         wrap the metadata and the subtree, and use an XSLT -->
>    <p:wrap-sequence wrapper="imagedata-and-metadata">
>       <p:input port="source">
>          <!-- the currently matching sub-tree -->
>          <p:pipe step="images-without-dimension" port="current"/>
>          <p:pipe step="image-metadata" port="result"/>
>       </p:input>
>    </p:wrap-sequence>
>    <!-- produces e.g.
>    <imagedata-and-metadata>
>       <imagedata fileref="blah.jpg"/>
>       <c:metadata>
>          <c:tag name="Image Width">1000 pixels</c:tag>
>          ...
>       </c:metadata>
>    </imagedata-and-metadata>
>    -->
>    <p:xslt name="imagedata-and-metadata-to-enhanced-imagedata">
>       <p:input port="stylesheet">
>          <p:document href="merge-metadata.xsl"/>
>       </p:input>
>       <p:input port="parameters"><p:empty/></p:input>
>    </p:xslt>
>    <!-- produces e.g.
>       <imagedata fileref="blah.jpg" width="1000 pixels"/>
>    -->
> </p:viewport>
> <p:store href="output/my-document.xml"/>
>
> On Wed, 25 Nov 2020 at 14:32, Frank Steimke <fsteimke.hb@gmail.com> 
> wrote:
>
>     Dear XProc Dev,
>
>     I am new to XProc. I need advice on whether and how the following
>     problem could be solved with XProc, specifically with XProc 1 and
>     the Calabash 1.2 engine, since this is what ships with Oxygen.
>
>     Problem: I have DocBook documents with images referenced by a
>     URI given as the value of an imagedata/@fileref attribute. The
>     imagedata element may carry optional attributes for the image
>     dimensions. For further processing, I need to know the dimensions.
>     So the task is: find imagedata elements without @width and @depth
>     attributes and add these attributes. I wrote a simple XSLT script
>     for this, which uses an extension function in Java to get the
>     intrinsic dimensions of the image. All this is part of a larger
>     project which converts from DocBook to ODF.
>
>     Now I'd like to port my project from XSLT-only to XProc 1 with
>     Calabash 1.2. For reasons which are specific to Oxygen, I have
>     problems using my Java extension function. I wonder if there is a
>     solution which uses only XProc and Calabash built-in extensions. I
>     found the cx:metadata-extractor extension (see
>     https://xmlcalabash.com/docs/reference/cx-metadata-extractor.html)
>     with this signature:
>
>     <p:declare-step type="cx:metadata-extractor"
>                     xmlns:cx="http://xmlcalabash.com/ns/extensions">
>        <p:output port="result"/>
>        <p:option name="href"/>
>     </p:declare-step>
>
>     I understand that this step expects a URI and will produce a
>     <c:metadata .../> XML document.
>
>     So my question to this list is whether and how the following would
>     be possible:
>
>     1) An XSLT transformation step which reads my DocBook document,
>     identifies imagedata elements without dimensions and writes out
>     the @fileref attribute. The result of this step is a sequence of
>     URIs.
>
>     2) For each of these URIs, call the cx:metadata-extractor
>     extension. The result of this step is a sequence of XML documents,
>     one for every URI, each with <c:metadata /> as root.
>
>     3) Wrap these documents into a single one. The result of this
>     step is one XML document with several <c:metadata /> elements.
>
>     4) An XSLT transformation step which merges this new document with
>     the DocBook document. For every imagedata element, get the
>     dimensions from the metadata document and add the @width and
>     @depth attributes. The result of this step is a DocBook XML
>     document in which every imagedata element has the necessary
>     dimension attributes.
>
>     Since I have experience with XSLT, steps 1 and 4 should be no
>     problem. But how could I manage steps 2 and 3?
>
>
>
> -- 
> Conal Tuohy
> http://conaltuohy.com/
> @conal_tuohy
> +61-466-324297
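
The merge-metadata.xsl stylesheet named in the draft pipeline above is 
not shown in this thread. The following is only a minimal sketch of 
what it might look like, under several assumptions that should be 
checked against real output: that imagedata is in no namespace (as in 
the viewport's match pattern), that the c: prefix is bound to 
http://www.w3.org/ns/xproc-step, and that the extractor reports the 
height in a tag named "Image Height" (only "Image Width" appears in 
the draft; the actual tag names may differ by image format).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    exclude-result-prefixes="c">

  <!-- Input: the wrapper built by p:wrap-sequence, holding one
       imagedata element and one c:metadata element. -->
  <xsl:template match="/imagedata-and-metadata">
    <!-- Assumed tag names; verify them against the extractor's output. -->
    <xsl:variable name="width"
        select="c:metadata/c:tag[@name = 'Image Width'][1]"/>
    <xsl:variable name="height"
        select="c:metadata/c:tag[@name = 'Image Height'][1]"/>

    <!-- Output: a single imagedata element, which the viewport puts
         back in place of the original one. -->
    <xsl:for-each select="imagedata">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <!-- Values are copied verbatim (e.g. "1000 pixels"); convert
             them here if a different unit syntax is needed. -->
        <xsl:if test="$width">
          <xsl:attribute name="width" select="string($width)"/>
        </xsl:if>
        <!-- Keep an existing @depth rather than overwriting it. -->
        <xsl:if test="$height and not(@depth)">
          <xsl:attribute name="depth" select="string($height)"/>
        </xsl:if>
        <xsl:copy-of select="node()"/>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If the extractor returns no matching tag for an image, this sketch 
leaves the imagedata element unchanged apart from being copied, so the 
surrounding document is never damaged by missing metadata.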

Received on Wednesday, 25 November 2020 07:17:31 UTC