Re: result documents in an XSLT step?

Hi David, 

Have you considered doing the following? 
- a p:for-each over the distinct values of item/@paradigm in the source XML 
- a p:xslt inside the p:for-each that takes the paradigm as a filter parameter (so filter rather than group) 
- a p:store for the result, also inside the p:for-each 
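
For illustration only, the filtering stylesheet could look roughly like the sketch below. The parameter name "paradigm" and the <group> wrapper element are assumptions made for this example, not taken from the actual pipeline, and the XProc wiring that passes the parameter in is not shown.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical parameter: the single @paradigm value to keep,
       supplied by the calling p:xslt step on each iteration -->
  <xsl:param name="paradigm" as="xs:string" required="yes"/>

  <xsl:template match="/">
    <!-- Hypothetical wrapper element for one paradigm's items -->
    <group paradigm="{$paradigm}">
      <!-- Filter instead of group: copy only the matching items -->
      <xsl:copy-of select="//item[@paradigm = $paradigm]"/>
    </group>
  </xsl:template>

</xsl:stylesheet>
```

Each run then produces a single primary result, so no xsl:result-document is needed. The trade-off is that the source is scanned once per paradigm value.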

Best regards, 

Geert Bormans 

----- On 31 Oct 2020 at 20:02, David Birnbaum <djbpitt@gmail.com> wrote: 

Dear xproc-dev, 
I would be grateful for advice about how best to manage a pipeline that requires me to generate and then continue to process multiple output documents from a single input. The input contains 110k <item> elements that are distinguished by a @paradigm attribute on the <item> element; there are about 150 different @paradigm values in the input. I would like to group the <item> elements by their @paradigm values, process each group, and write the outputs for each group separately to disk. I would also like to run another transformation over those outputs and write the results of that transformation to disk, as well. I have poked at the following approaches and run into trouble with both of them, probably because (or, at least, partially because) I am not (yet, I hope!) very adept at XProc: 

1. Within the XProc, I run an XSLT step that uses <xsl:for-each-group> and <xsl:result-document> to create separate output for each group, with constructed output @href values. This errors out with: 

<c:errors xmlns:c="http://www.w3.org/ns/xproc-step">
  <c:error code="err:XC0121" name="generate" type="p:xslt"
           href="file:///Users/djb/repos/cz/pos/verb/verb.xpl" line="64" column="27"
           xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:err="http://www.w3.org/ns/xproc-error">
    <message>URI '/Users/djb/repos/cz/output/verb-1a.xml' of secondary result is not valid or not absolute.</message>
  </c:error>
</c:errors> 

I had first tried a relative path for the @href on the <xsl:result-document>, and I thought the error message meant that there was no base URI within the pipeline, so I specified an absolute path instead, but, as seen above, that raises the same error. I did specify a secondary port in the XProc with: 

<p:output port="secondary" sequence="true"/> 

but that seems to have no effect on the outcome (perhaps I specified it in the wrong place?). I think I should be able to write multiple result documents, and that I have misunderstood something about how to set that up. For what it's worth, I also think I may need a <p:store> step to save the multiple result documents, and although I've used <p:store> successfully with single outputs, I don't know what it should look like to save a set of result documents. But if I've understood the error correctly, I'm stalled on the XSLT step, and need to get past that first. 

2. As an alternative to <xsl:for-each-group> inside the XSLT stylesheet, I considered doing the grouping in XProc, but I don't see anything within XProc comparable to <xsl:for-each-group>. If I am reading the description correctly, a <p:for-each> step might let me loop over <item> elements, but it does not appear to have the ability to form the <item> elements into groups according to shared @paradigm values and loop over those groups. I could run an XSLT pre-processing step to do the grouping, all within our document, creating an intermediate hierarchical level (called, say, <group>) and then use <p:for-each> to loop over those, but that extra step feels to me like a hack, that is, as if there should be a more direct way to do what I need. Should I ignore that feeling? 

Assuming I can get the individual result documents written to disk, I think I can do the subsequent transformation with a <p:for-each> step. 

I am using MorganaXProc-IIIse 0.9.4.2-beta and Saxon EE 10.0, and running from the command line under MacOS 10.15.7. Thanks in advance for any pointers in The Right Direction. 

Best, 

David 
djbpitt@gmail.com 

Received on Saturday, 31 October 2020 19:20:38 UTC