- From: Geert Bormans <geert@gbormans.telenet.be>
- Date: Sat, 31 Oct 2020 20:20:22 +0100 (CET)
- To: XProc Dev <xproc-dev@w3.org>
- Cc: David Birnbaum <djbpitt@gmail.com>
- Message-ID: <1754271945.15526494.1604172022280.JavaMail.zimbra@gbormans.telenet.be>
Hi David,

Have you considered doing a...

p:for-each on the distinct values of the item/@paradigm in the source XML
have a p:xslt inside the p:for-each that takes the paradigm as a filter parameter (so don't group but filter)
and p:store the result inside the for-each

Best regards,

Geert Bormans

----- On 31 Oct 2020 at 20:02, David Birnbaum <djbpitt@gmail.com> wrote:

Dear xproc-dev,

I would be grateful for advice about how best to manage a pipeline that requires me to generate and then continue to process multiple output documents from a single input.

The input contains 110k <item> elements that are distinguished by a @paradigm attribute on the <item> element; there are about 150 different @paradigm values in the input. I would like to group the <item> elements by their @paradigm values, process each group, and write the outputs for each group separately to disk. I would also like to run another transformation over those outputs and write the results of that transformation to disk, as well.

I have poked at the following approaches and run into trouble with both of them, probably because (or, at least, partially because) I am not (yet, I hope!) very adept at XProc:

1. Within the XProc, I run an XSLT step that uses <xsl:for-each-group> and <xsl:result-document> to create separate output for each group, with constructed output @href values.
This errors out with:

<c:errors xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:error code="err:XC0121" name="generate" type="p:xslt" href="file:///Users/djb/repos/cz/pos/verb/verb.xpl" line="64" column="27" xmlns:p="http://www.w3.org/ns/xproc" xmlns:err="http://www.w3.org/ns/xproc-error">
    <message>URI '/Users/djb/repos/cz/output/verb-1a.xml' of secondary result is not valid or not absolute.</message>
  </c:error>
</c:errors>

I had first tried a relative path for the @href on the <xsl:result-document>, and I thought the error message meant that there was no base URI within the pipeline, so I specified an absolute path instead, but, as seen above, that raises the same error. I did specify a secondary port in the XProc with:

<p:output port="secondary" sequence="true"/>

but that seems to have no effect on the outcome (perhaps I specified it in the wrong place?). I think I should be able to write multiple result documents, and that I have misunderstood something about how to set that up.

For what it's worth, I also think I may need a <p:store> step to save the multiple result documents, and although I've used <p:store> successfully with single outputs, I don't know what it should look like to save a set of result documents. But if I've understood the error correctly, I'm stalled on the XSLT step, and need to get past that first.

2. As an alternative to <xsl:for-each-group> inside the XSLT stylesheet, I considered doing the grouping in XProc, but I don't see anything within XProc comparable to <xsl:for-each-group>. If I am reading the description correctly, a <p:for-each> step might let me loop over <item> elements, but it does not appear to have the ability to form the <item> elements into groups according to shared @paradigm values and loop over those groups.
I could run an XSLT pre-processing step to do the grouping, all within our document, creating an intermediate hierarchical level (called, say, <group>) and then use <p:for-each> to loop over those, but that extra step feels to me like a hack, that is, as if there should be a more direct way to do what I need. Should I ignore that feeling?

Assuming I can get the individual result documents written to disk, I think I can do the subsequent transformation with a <p:for-each> step.

I am using MorganaXProc-IIIse 0.9.4.2-beta and Saxon EE 10.0, and running from the command line under MacOS 10.15.7.

Thanks in advance for any pointers in The Right Direction.

Best,

David
djbpitt@gmail.com
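
For illustration, the filter-instead-of-group suggestion at the top of this message (a p:for-each over the distinct @paradigm values, with a p:xslt that filters on one value and a p:store inside the loop) might be sketched roughly as below. This is only a sketch under assumptions that are not in the thread: XProc 3.0 syntax, <item> elements that are direct children of the root element, a stylesheet parameter named "paradigm", and a hypothetical output location of output/{paradigm}.xml resolved relative to the pipeline file.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="main">

  <!-- The single input document containing all <item> elements -->
  <p:input port="source" primary="true"/>
  <!-- The stored documents are also passed through, one per paradigm -->
  <p:output port="result" sequence="true"/>

  <p:for-each>
    <!-- One iteration per distinct @paradigm value; distinct-values()
         atomizes the attributes, so each iteration sees a single string -->
    <p:with-input select="distinct-values(//item/@paradigm)"/>
    <p:variable name="paradigm" select="string(.)"/>

    <!-- Filter (not group): rerun the same stylesheet over the full
         source, keeping only the items for the current paradigm -->
    <p:xslt>
      <p:with-input port="source" pipe="source@main"/>
      <p:with-input port="stylesheet">
        <p:inline>
          <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                          xmlns:xs="http://www.w3.org/2001/XMLSchema"
                          version="3.0">
            <!-- Hypothetical parameter name, supplied by the pipeline -->
            <xsl:param name="paradigm" as="xs:string" required="yes"/>
            <!-- Assumes <item> elements are children of the root -->
            <xsl:template match="/*">
              <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:copy-of select="item[@paradigm = $paradigm]"/>
              </xsl:copy>
            </xsl:template>
          </xsl:stylesheet>
        </p:inline>
      </p:with-input>
      <p:with-option name="parameters" select="map { 'paradigm': $paradigm }"/>
    </p:xslt>

    <!-- Hypothetical output location; href is an attribute value template -->
    <p:store href="output/{$paradigm}.xml"/>
  </p:for-each>

</p:declare-step>
```

With about 150 paradigm values this makes about 150 passes over the 110k items, which trades some run time for not needing xsl:result-document or the secondary port at all. The further transformation David mentions could be added as another p:xslt and p:store after the first p:store inside the same loop.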
Received on Saturday, 31 October 2020 19:20:38 UTC