Re: Get an merge image metadata from Conal Tuohy on 2020-11-25 (xproc-dev@w3.org from November 2020)

From: Conal Tuohy <conal.tuohy@gmail.com>
Date: Wed, 25 Nov 2020 15:48:59 +1000
To: Frank Steimke <fsteimke.hb@gmail.com>
Cc: XProc Dev <xproc-dev@w3.org>
Message-ID: <CAErBQuQsMr5piqOxp9BLsiwcjDB4+HfXfAPRsKXBzCWTm5Tixw@mail.gmail.com>
Frank, I suggest you look at the "viewport" compound step in XProc:
https://www.w3.org/TR/xproc/#p.viewport

This step is often used to modify parts of a larger document. It identifies
the portions of a document which match a pattern, extracts each of those
sub-trees, and applies the sub-pipeline to it. The result of the
sub-pipeline then replaces the original sub-tree.

In your sub-pipeline, you would be dealing with a single imagedata element;
you would use the cx:metadata-extractor step to generate the metadata
document, and then copy the width and depth metadata from the c:metadata
document to attributes of the DocBook imagedata element. There are a few
ways to do a "merge" like that; my preference is usually to aggregate the
two documents being merged using the wrap-sequence step, and then use an
XSLT to generate the new imagedata document.

NB the output of the "viewport" step's sub-pipeline should be an
<imagedata> element (with dimension attributes), which will replace the
original <imagedata> element.

I've quickly written a rough draft of how this would work; I hope it's
helpful.

Conal



<p:declare-step version="1.0"
      xmlns:p="http://www.w3.org/ns/xproc"
      xmlns:cx="http://xmlcalabash.com/ns/extensions">
   <!-- the Calabash extension library declares cx:metadata-extractor -->
   <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

   <p:load href="my-document.xml"/>
   <!-- apply the sub-pipeline to each sub-tree which is an imagedata
        element without dimension metadata -->
   <p:viewport name="images-without-dimension"
         match="imagedata[not(@width)]">
      <cx:metadata-extractor name="image-metadata">
         <p:with-option name="href" select="/imagedata/@fileref"/>
      </cx:metadata-extractor>
      <!-- now merge the metadata into the subtree; personally I would
           wrap the metadata and the subtree, and use an XSLT -->
      <p:wrap-sequence wrapper="imagedata-and-metadata">
         <p:input port="source">
            <!-- the currently matching sub-tree -->
            <p:pipe step="images-without-dimension" port="current"/>
            <p:pipe step="image-metadata" port="result"/>
         </p:input>
      </p:wrap-sequence>
      <!-- produces e.g.
      <imagedata-and-metadata>
         <imagedata fileref="blah.jpg"/>
         <c:metadata>
            <c:tag name="Image Width">1000 pixels</c:tag>
            ...
         </c:metadata>
      </imagedata-and-metadata>
      -->
      <p:xslt name="imagedata-and-metadata-to-enhanced-imagedata">
         <p:input port="stylesheet">
            <p:document href="merge-metadata.xsl"/>
         </p:input>
         <p:input port="parameters"><p:empty/></p:input>
      </p:xslt>
      <!-- produces e.g.
         <imagedata fileref="blah.jpg" width="1000 pixels"/>
      -->
   </p:viewport>
   <p:store href="output/my-document.xml"/>
</p:declare-step>
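
The pipeline refers to merge-metadata.xsl without showing it, so here is a
minimal sketch of what that stylesheet might look like. It is only an
illustration under several assumptions: the imagedata element is in no
namespace (DocBook 4 style, as in the match pattern above); the height is
reported in a c:tag named "Image Height", by analogy with the "Image Width"
tag shown in the pipeline comment; and values of the form "1000 pixels" are
rewritten as "1000px", which goes a step further than the raw value shown
in that comment. Check the actual output of cx:metadata-extractor for your
images before relying on any of these names.

```xml
<xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:c="http://www.w3.org/ns/xproc-step"
      exclude-result-prefixes="c">

   <!-- The input is the wrapper element built by p:wrap-sequence;
        only the imagedata child is processed, so the output document
        has imagedata as its root element. -->
   <xsl:template match="/imagedata-and-metadata">
      <xsl:apply-templates select="imagedata"/>
   </xsl:template>

   <xsl:template match="imagedata">
      <!-- Several metadata directories may report the same tag name,
           so take the first match only.
           ASSUMPTION: the tag name 'Image Height' is a guess. -->
      <xsl:variable name="width"
            select="(../c:metadata/c:tag[@name = 'Image Width'])[1]"/>
      <xsl:variable name="height"
            select="(../c:metadata/c:tag[@name = 'Image Height'])[1]"/>
      <xsl:copy>
         <!-- keep fileref and any other existing attributes -->
         <xsl:copy-of select="@*"/>
         <!-- add a dimension only if it is missing and the metadata
              value has the expected "N pixels" form -->
         <xsl:if test="not(@width) and contains($width, ' pixels')">
            <xsl:attribute name="width">
               <xsl:value-of
                     select="concat(substring-before($width, ' pixels'), 'px')"/>
            </xsl:attribute>
         </xsl:if>
         <xsl:if test="not(@depth) and contains($height, ' pixels')">
            <xsl:attribute name="depth">
               <xsl:value-of
                     select="concat(substring-before($height, ' pixels'), 'px')"/>
            </xsl:attribute>
         </xsl:if>
         <!-- keep any child content unchanged -->
         <xsl:copy-of select="node()"/>
      </xsl:copy>
   </xsl:template>

</xsl:stylesheet>
```

If the metadata lacks a usable width or height, the imagedata element is
copied through unchanged, so the viewport still receives a valid
replacement for the original sub-tree.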

On Wed, 25 Nov 2020 at 14:32, Frank Steimke <fsteimke.hb@gmail.com> wrote:

> Dear XProc Dev,
>
> i am new to XProc. I need advice if and how the following problem could be
> solved with XProc. Here especially with XProc 1 and the Calabash 1.2
> engine, since this is shipped with Oxygen.
>
> Problem: I have DocBook Documents with images referenced from a URI as
> value of an imagedata/@fileref attribute. The imagedata Element may contain
> optional attributes for image Dimensions. For further processing, i need to
> know the Dimensions. So the task is: find imagedata elements without @width
> and @depth attributes and add these attributes. I wrote a simple XSLT
> script for this, which makes use of an extension function in Java to get
> the image intrinsic Dimension. All this is part of a larger project which
> converts from DocBook to ODF.
>
> Now i'd like to port my Project from XSLT-Only to XProc1 with Calabash
> 1.2. For some reasons which are specific to Oxygen i have Problems using my
> Java extension function. I wonder if there is a solution which uses only
> XProc and Calabash built-in Extensions. I found the cx:metadata-extractor
> extension (see
> https://xmlcalabash.com/docs/reference/cx-metadata-extractor.html) with
> this signature:
>
> <p:declare-step type="cx:metadata-extractor"
>      xmlns:cx="http://xmlcalabash.com/ns/extensions">
>      <p:output port="result"/>
>      <p:option name="href"/>
> </p:declare-step>
>
> I understand that this step expects an URI and will produce an <c:metadata
> .../> XML Document.
>
> So my Question to this List is, if and how the following would be possible:
>
> 1) XSLT Transformation Step which reads my DocBook Document, identifies
> imagedata Elements without Dimensions and writes out the @fileref
> attribute. The Result of this Step is a Sequence of URIs.
>
> 2) For each of these URIs call the cx:metadata-extractor Extension. The
> Result of this Step is a Sequence of XML Documents, one for every URI, each
> with <c:metadata /> as root.
>
> 3) Wrap these Documents into a single one. The Result of this Step is one
> XML Document with some <c:metadata /> Elements.
>
> 4) XSLT Transformation Step which merges this new Document with the
> DocBook Document. For every imagedata Element, get the Dimension from the
> Metadata Document and add the @width and @depth Attributes. The Result of
> this Step is a DocBook XML Document, where every imagedata Element has the
> necessary Dimension Attributes.
>
> Since i have experience with XSLT, Steps 1 and 4 should be no problem. But
> how could i manage Steps 2 and 3?
>
>
>

-- 
Conal Tuohy
http://conaltuohy.com/
@conal_tuohy
+61-466-324297
Received on Wednesday, 25 November 2020 05:49:24 UTC