- From: David Birnbaum <djbpitt@gmail.com>
- Date: Sat, 31 Oct 2020 15:02:14 -0400
- To: xproc-dev@w3.org
- Message-ID: <CAP4v81pu80hGvrX=i7pDQKsUxd2+ycv-9xa7LuVJUOPxrtegPg@mail.gmail.com>
Dear xproc-dev,

I would be grateful for advice about how best to manage a pipeline that requires me to generate and then continue to process multiple output documents from a single input. The input contains 110k <item> elements that are distinguished by a @paradigm attribute on the <item> element; there are about 150 different @paradigm values in the input. I would like to group the <item> elements by their @paradigm values, process each group, and write the outputs for each group separately to disk. I would also like to run another transformation over those outputs and write the results of that transformation to disk, as well.

I have poked at the following approaches and run into trouble with both of them, probably because (or, at least, partially because) I am not (yet, I hope!) very adept at XProc:

1. Within the XProc, I run an XSLT step that uses <xsl:for-each-group> and <xsl:result-document> to create separate output for each group, with constructed output @href values. This errors out with:

<c:errors xmlns:c="http://www.w3.org/ns/xproc-step"><c:error code="err:XC0121" name="generate" type="p:xslt" href="file:///Users/djb/repos/cz/pos/verb/verb.xpl" line="64" column="27" xmlns:p="http://www.w3.org/ns/xproc" xmlns:err="http://www.w3.org/ns/xproc-error"><message>URI '/Users/djb/repos/cz/output/verb-1a.xml' of secondary result is not valid or not absolute.</message></c:error></c:errors>

I had first tried a relative path for the @href on the <xsl:result-document>, and I thought the error message meant that there was no base URI within the pipeline, so I specified an absolute path instead, but, as seen above, that raises the same error. I did specify a secondary port in the XProc with:

<p:output port="secondary" sequence="true"/>

but that seems to have no effect on the outcome (perhaps I specified it in the wrong place?). I think I should be able to write multiple result documents, and that I have misunderstood something about how to set that up.
For what it's worth, I also think I may need a <p:store> step to save the multiple result documents, and although I've used <p:store> successfully with single outputs, I don't know what it should look like to save a set of result documents. But if I've understood the error correctly, I'm stalled on the XSLT step, and need to get past that first.

2. As an alternative to <xsl:for-each-group> inside the XSLT stylesheet, I considered doing the grouping in XProc, but I don't see anything within XProc comparable to <xsl:for-each-group>. If I am reading the description correctly, a <p:for-each> step might let me loop over <item> elements, but it does not appear to have the ability to form the <item> elements into groups according to shared @paradigm values and loop over those groups. I could run an XSLT pre-processing step to do the grouping, all within our document, creating an intermediate hierarchical level (called, say, <group>) and then use <p:for-each> to loop over those, but that extra step feels to me like a hack, that is, as if there should be a more direct way to do what I need. Should I ignore that feeling?

Assuming I can get the individual result documents written to disk, I think I can do the subsequent transformation with a <p:for-each> step.

I am using MorganaXProc-IIIse 0.9.4.2-beta and Saxon EE 10.0, and running from the command line under MacOS 10.15.7.

Thanks in advance for any pointers in The Right Direction.

Best,

David
djbpitt@gmail.com
Received on Saturday, 31 October 2020 19:02:39 UTC