result documents in an XSLT step? from David Birnbaum on 2020-10-31 (xproc-dev@w3.org from October 2020)

From: David Birnbaum <djbpitt@gmail.com>
Date: Sat, 31 Oct 2020 15:02:14 -0400
To: xproc-dev@w3.org
Message-ID: <CAP4v81pu80hGvrX=i7pDQKsUxd2+ycv-9xa7LuVJUOPxrtegPg@mail.gmail.com>
Dear xproc-dev,

I would be grateful for advice about how best to manage a pipeline that
requires me to generate and then continue to process multiple output
documents from a single input. The input contains 110k <item> elements that
are distinguished by a @paradigm attribute on the <item> element; there are
about 150 different @paradigm values in the input. I would like to group
the <item> elements by their @paradigm values, process each group, and
write the outputs for each group separately to disk. I would also like to
run another transformation over those outputs and write the results of that
transformation to disk. I have poked at the following approaches
and run into trouble with both of them, probably because (or, at least,
partially because) I am not (yet, I hope!) very adept at XProc:

1. Within the XProc pipeline, I run an XSLT step that uses <xsl:for-each-group>
and <xsl:result-document> to create separate output for each group, with
constructed @href values for the outputs. This errors out with:

<c:errors xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:error code="err:XC0121" name="generate" type="p:xslt"
           href="file:///Users/djb/repos/cz/pos/verb/verb.xpl" line="64" column="27"
           xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:err="http://www.w3.org/ns/xproc-error">
    <message>URI '/Users/djb/repos/cz/output/verb-1a.xml' of secondary result is not valid or not absolute.</message>
  </c:error>
</c:errors>

I had first tried a relative path for the @href on the
<xsl:result-document>. I took the error message to mean that no base URI
was available within the pipeline, so I specified an absolute path
instead, but, as seen above, that raises the same error. I did declare a
secondary port in the pipeline with:

<p:output port="secondary" sequence="true"/>

but that seems to have no effect on the outcome (perhaps I specified it in
the wrong place?). I think I should be able to write multiple result
documents, and that I have misunderstood something about how to set that
up. For what it's worth, I also think I may need a <p:store> step to save
the multiple result documents, and although I've used <p:store>
successfully with single outputs, I don't know how to set it up to
save a whole set of result documents. But if I've understood the error correctly,
I'm stalled on the XSLT step, and need to get past that first.

2. As an alternative to <xsl:for-each-group> inside the XSLT stylesheet, I
considered doing the grouping in XProc, but I don't see anything within
XProc comparable to <xsl:for-each-group>. If I am reading the description
correctly, a <p:for-each> step might let me loop over <item> elements, but
it does not appear to have the ability to form the <item> elements into
groups according to shared @paradigm values and loop over those groups. I
could run an XSLT pre-processing step to do the grouping, all within a single
document, creating an intermediate hierarchical level (called, say,
<group>) and then use <p:for-each> to loop over those, but that extra step
feels to me like a hack, that is, as if there should be a more direct way
to do what I need. Should I ignore that feeling?
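The grouping described in approach 1 can be sketched in XSLT 3.0 roughly as follows. This is a minimal sketch, not the stylesheet from the pipeline in question: the output/ directory and the verb- file-name prefix are hypothetical, and it assumes the input document has a file: base URI. The design choice worth noting is that each @href is resolved against base-uri(/), so it carries a URI scheme; a bare path such as /Users/... has no scheme and is therefore not an absolute URI in the RFC 3986 sense.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <xsl:template match="/">
    <!-- Hypothetical output directory, resolved against the input's own
         base URI so that every href starts with a scheme (file:...) -->
    <xsl:variable name="out-dir" select="resolve-uri('output/', base-uri(/))"/>

    <!-- One secondary result document per distinct @paradigm value -->
    <xsl:for-each-group select="//item" group-by="@paradigm">
      <xsl:result-document
          href="{resolve-uri('verb-' || encode-for-uri(current-grouping-key()) || '.xml', $out-dir)}">
        <group paradigm="{current-grouping-key()}">
          <xsl:sequence select="current-group()"/>
        </group>
      </xsl:result-document>
    </xsl:for-each-group>

    <!-- Small principal result, so the primary output is not empty -->
    <summary groups="{count(distinct-values(//item/@paradigm))}"/>
  </xsl:template>

</xsl:stylesheet>
```

encode-for-uri() is there only in case some @paradigm values contain characters that are not allowed in a URI; for a value such as 1a it changes nothing.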

Assuming I can get the individual result documents written to disk, I think
I can do the subsequent transformation with a <p:for-each> step.
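The follow-on steps (storing each secondary result, then transforming and storing it again) might look like the XProc 3.0 pipeline below. It is a sketch under stated assumptions: the splitting stylesheet is hypothetically named split-by-paradigm.xsl and writes one <xsl:result-document> per @paradigm group, the second stylesheet is hypothetically named second-pass.xsl, and the -2.xml suffix is an arbitrary choice.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <!-- One c:result per file written by the second pass -->
  <p:output port="result" sequence="true"/>

  <!-- Split the input; xsl:result-document outputs appear on the
       "secondary" port of this step, not on its primary "result" port -->
  <p:xslt name="generate">
    <p:with-input port="stylesheet" href="split-by-paradigm.xsl"/>
  </p:xslt>

  <p:for-each>
    <!-- Iterate over the secondary results of the XSLT step -->
    <p:with-input pipe="secondary@generate"/>
    <p:output port="result" sequence="true" pipe="result-uri@store-second"/>

    <!-- The absolute URI assigned by xsl:result-document -->
    <p:variable name="href" select="base-uri(/)"/>

    <!-- Write the group document to that URI -->
    <p:store href="{$href}"/>

    <!-- Second transformation over the same group document -->
    <p:xslt>
      <p:with-input port="stylesheet" href="second-pass.xsl"/>
    </p:xslt>

    <!-- Store it next to the first file, e.g. verb-1a.xml becomes verb-1a-2.xml -->
    <p:store name="store-second" href="{replace($href, '\.xml$', '-2.xml')}"/>
  </p:for-each>
</p:declare-step>
```

The p:variable captures the URI before the second p:xslt runs, since that step's output is not guaranteed to keep the same base URI. If the split documents are also wanted as pipeline output, a declaration such as <p:output port="secondary" sequence="true" pipe="secondary@generate"/> on the p:declare-step would connect them explicitly. For approach 2, a variant could loop over <group> wrappers with <p:with-input select="/*/group"/>, though each group would then probably need an href computed from its @paradigm value rather than taken from its base URI.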

I am using MorganaXProc-IIIse 0.9.4.2-beta and Saxon EE 10.0, and running
from the command line under macOS 10.15.7. Thanks in advance for any
pointers in The Right Direction.

Best,

David
djbpitt@gmail.com
Received on Saturday, 31 October 2020 19:02:39 UTC