Re: Get an merge image metadata from Imsieke, Gerrit, le-tex on 2020-11-25 (xproc-dev@w3.org from November 2020)

From: Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de>
Date: Wed, 25 Nov 2020 08:32:44 +0100
To: xproc-dev@w3.org
Message-ID: <c27e9cc7-f25c-9fec-1667-1b8acdd13b80@le-tex.de>
The jar for the metadata-extractor step needs to be downloaded separately and put on the classpath: 
https://github.com/ndw/xmlcalabash1-metadata-extractor

On 25.11.2020 08:17, Frank Steimke wrote:
> Thank you very much, Conal. That was exactly what I hoped for.
> 
> I wanted to try it immediately, but now I have a problem with the 
> configuration of metadata-extractor (NoClassDefFound). When this is 
> solved, I will report the solution here on this list.
> 
> Thanks again,
> Frank
> 
> Am 25.11.2020 um 06:48 schrieb Conal Tuohy:
>> Frank, I suggest you look at the "viewport" compound step in XProc: 
>> https://www.w3.org/TR/xproc/#p.viewport 
>>
>> This step applies a sub-pipeline repeatedly to parts of a larger 
>> document: it identifies the portions of the document that match a 
>> pattern, extracts those sub-trees, and applies the sub-pipeline to 
>> each of them. The result of the sub-pipeline then replaces the 
>> original sub-tree.
>>
>> In your sub-pipeline, you would be dealing with a single imagedata 
>> element; you would use the metadata extractor step to generate the 
>> metadata document, and then copy the width and depth metadata from the 
>> c:metadata document to attributes of the docbook imagedata element. 
>> There are a few ways to do a "merge" like that; my preference is 
>> usually to aggregate the two documents being merged using the 
>> wrap-sequence step, and then use an XSLT to generate the new imagedata 
>> document.
>>
>> NB the output of the "viewport" step's sub-pipeline should be an 
>> <imagedata> element (with dimension attributes), which will replace 
>> the original <imagedata> element.
>>
>> I've quickly written a rough draft of how this would work; I hope it's 
>> helpful.
>>
>> Conal
>>
>>
>>
>> <p:load href="my-document.xml"/>
>> <p:viewport name="images-without-dimension" 
>> match="imagedata[not(@width)]"><!-- apply sub-pipeline to sub-trees 
>> which are imagedata elements without dimension metadata -->
>>    <cx:metadata-extractor name="image-metadata">
>>       <p:with-option name="href" select="/imagedata/@fileref"/>
>>    </cx:metadata-extractor>
>>    <!-- now merge the metadata into the subtree; personally I would 
>> wrap the metadata and the subtree, and use an XSLT -->
>>    <p:wrap-sequence wrapper="imagedata-and-metadata">
>>       <p:input port="source">
>>          <p:pipe step="images-without-dimension" port="current"/><!-- 
>> the currently matching sub-tree -->
>>          <p:pipe step="image-metadata" port="result"/>
>>       </p:input>
>>    </p:wrap-sequence>
>>    <!-- produces e.g.
>>    <imagedata-and-metadata>
>>       <imagedata fileref="blah.jpg"/>
>>       <c:metadata>
>>          <c:tag name="Image Width">1000 pixels</c:tag>
>>          ...
>>       </c:metadata>
>>    </imagedata-and-metadata>
>>    -->
>>    <p:xslt name="imagedata-and-metadata-to-enhanced-imagedata">
>>       <p:input port="stylesheet">
>>          <p:document href="merge-metadata.xsl"/>
>>       </p:input>
>>       <p:input port="parameters"><p:empty/></p:input>
>>    </p:xslt>
>>    <!-- produces e.g.
>>       <imagedata fileref="blah.jpg" width="1000 pixels"/>
>>    -->
>> </p:viewport>
>> <p:store href="output/my-document.xml"/>
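>>
>> The merge-metadata.xsl stylesheet referenced by the p:xslt step could 
>> look roughly like the following. This is a minimal sketch, not a tested 
>> implementation: it assumes XSLT 2.0 (Saxon, as bundled with Calabash), 
>> that the height is reported in a c:tag named "Image Height" (only 
>> "Image Width" appears in the example above), and that values of the 
>> form "1000 pixels" should be written as "1000px".
>>
>> ```xml
>> <xsl:stylesheet version="2.0"
>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>     xmlns:c="http://www.w3.org/ns/xproc-step"
>>     exclude-result-prefixes="c">
>>
>>   <!-- Replace the wrapper element by the enriched imagedata element. -->
>>   <xsl:template match="/imagedata-and-metadata">
>>     <xsl:apply-templates select="*:imagedata"/>
>>   </xsl:template>
>>
>>   <!-- *:imagedata matches with or without the DocBook 5 namespace. -->
>>   <xsl:template match="*:imagedata">
>>     <xsl:variable name="tags" select="../c:metadata/c:tag"/>
>>     <!-- Assumption: the tag names are "Image Width" and "Image Height". -->
>>     <xsl:variable name="width" select="($tags[@name = 'Image Width'])[1]"/>
>>     <xsl:variable name="height" select="($tags[@name = 'Image Height'])[1]"/>
>>     <xsl:copy>
>>       <xsl:copy-of select="@*"/>
>>       <!-- Add a dimension only if it was extracted and is not yet present. -->
>>       <xsl:if test="$width and not(@width)">
>>         <xsl:attribute name="width"
>>             select="replace($width, '\s*pixels\s*$', 'px')"/>
>>       </xsl:if>
>>       <xsl:if test="$height and not(@depth)">
>>         <xsl:attribute name="depth"
>>             select="replace($height, '\s*pixels\s*$', 'px')"/>
>>       </xsl:if>
>>       <xsl:copy-of select="node()"/>
>>     </xsl:copy>
>>   </xsl:template>
>>
>> </xsl:stylesheet>
>> ```
>>
>> With this sketch, a missing tag simply leaves the attribute out, and 
>> an attribute that is already present is kept unchanged.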
>>
>> On Wed, 25 Nov 2020 at 14:32, Frank Steimke <fsteimke.hb@gmail.com> 
>> wrote:
>>
>>     Dear XProc Dev,
>>
>>     I am new to XProc. I need advice on whether and how the following
>>     problem could be solved with XProc, specifically with XProc 1 and
>>     the Calabash 1.2 engine, since this is shipped with Oxygen.
>>
>>     Problem: I have DocBook documents with images referenced by a
>>     URI in an imagedata/@fileref attribute. The imagedata element
>>     may carry optional attributes for the image dimensions. For
>>     further processing, I need to know the dimensions. So the task is:
>>     find imagedata elements without @width and @depth attributes and
>>     add these attributes. I wrote a simple XSLT script for this, which
>>     makes use of an extension function in Java to get the image's
>>     intrinsic dimensions. All this is part of a larger project which
>>     converts from DocBook to ODF.
>>
>>     Now I'd like to port my project from XSLT-only to XProc 1 with
>>     Calabash 1.2. For reasons specific to Oxygen, I have problems
>>     using my Java extension function. I wonder if there is a
>>     solution which uses only XProc and Calabash built-in extensions. I
>>     found the cx:metadata-extractor extension (see
>>     https://xmlcalabash.com/docs/reference/cx-metadata-extractor.html)
>>     with this signature:
>>
>>     <p:declare-step type="cx:metadata-extractor"
>>         xmlns:cx="http://xmlcalabash.com/ns/extensions">
>>       <p:output port="result"/>
>>       <p:option name="href"/>
>>     </p:declare-step>
>>
>>     I understand that this step expects a URI and will produce a
>>     <c:metadata .../> XML document.
>>
>>     So my question to this list is whether and how the following
>>     would be possible:
>>
>>     1) An XSLT transformation step which reads my DocBook document,
>>     identifies imagedata elements without dimensions and writes out
>>     the @fileref attribute. The result of this step is a sequence of
>>     URIs.
>>
>>     2) For each of these URIs, call the cx:metadata-extractor
>>     extension. The result of this step is a sequence of XML documents,
>>     one for every URI, each with <c:metadata /> as root.
>>
>>     3) Wrap these documents into a single one. The result of this
>>     step is one XML document with several <c:metadata /> elements.
>>
>>     4) An XSLT transformation step which merges this new document with
>>     the DocBook document. For every imagedata element, get the
>>     dimensions from the metadata document and add the @width and
>>     @depth attributes. The result of this step is a DocBook XML
>>     document in which every imagedata element has the necessary
>>     dimension attributes.
>>
>>     Since I have experience with XSLT, steps 1 and 4 should be no
>>     problem. But how could I manage steps 2 and 3?
>>
Received on Wednesday, 25 November 2020 07:33:00 UTC