- From: David Birnbaum <djbpitt@gmail.com>
- Date: Sun, 1 Nov 2020 10:13:35 -0500
- To: Geert Bormans <geert@gbormans.telenet.be>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <CAP4v81oS8dMeejwjrcHKvGOyty_vrpnWvyByLdDc2U1=MoALFw@mail.gmail.com>
Dear Geert (cc xproc-dev),

I fear that I may be expecting XProc to behave like XSLT in situations where that's a faulty assumption, but when I try the following, it raises an error (details below). I have revised the input XML so that the distinct "paradigm" values are now child <paradigm> elements of <item> elements (in my earlier posting they were attributes), and I have upgraded Morgana to 0.9.4.8 and Saxon EE to 10.1 (which is the most recent version currently supported by Morgana).

My attempt at <p:for-each> is:

    <p:for-each name="loop">
        <p:with-input select="distinct-values(//paradigm)">
            <p:pipe step="normalize" port="result"/>
        </p:with-input>
        <p:variable name="current-paradigm" as="xs:string" select="."/>
        <p:filter name="filtering" select="descendant::item[paradigm eq $current-paradigm]">
            <p:with-input port="source">
                <p:pipe step="normalize" port="result"/>
            </p:with-input>
        </p:filter>
        <p:xslt name="generate">
            <p:with-input port="stylesheet" href="verb-generate.xsl"/>
        </p:xslt>
    </p:for-each>

This raises an error on the filter line: "$current-paradigm is not declared or not visible in this context" (XD0016).

I had thought that this would work because I expected that the for-each step would operate on each distinct value of <paradigm>, in turn; that the variable $current-paradigm would be set to that value on each pass through the loop; that the filter step would then have access to the variable value; and that the XSLT step would then operate on the result of the filter step. (At the moment I am not trying to store the result; I expected to see it on stdout.) My assumption about how to filter inside a for-each step is obviously wrong, but I don't understand why. The "normalize" step is an XSLT step that outputs the XML to be filtered and transformed.
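One possible explanation, offered as an assumption rather than a confirmed diagnosis: in XProc 3.0 the select option of <p:filter> appears to be a string that the step itself evaluates as XPath, so pipeline variables such as $current-paradigm would not be in scope inside that expression. Because option attributes are attribute value templates, the value could instead be spliced into the expression string with curly braces. The sketch below is self-contained for illustration only: the inline <items> sample, its <lemma> children, and the step name "main" are invented and stand in for the output of the "normalize" step.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.0">

    <!-- Hypothetical sample input standing in for the "normalize" output -->
    <p:input port="source">
        <p:inline>
            <items>
                <item><paradigm>1a</paradigm><lemma>first</lemma></item>
                <item><paradigm>2b</paradigm><lemma>second</lemma></item>
                <item><paradigm>1a</paradigm><lemma>third</lemma></item>
            </items>
        </p:inline>
    </p:input>
    <p:output port="result" sequence="true"/>

    <p:for-each name="loop">
        <!-- One iteration per distinct paradigm value -->
        <p:with-input select="distinct-values(//paradigm)" pipe="source@main"/>
        <p:variable name="current-paradigm" as="xs:string" select="."/>

        <!-- The braces expand the variable before the step sees the
             expression, so the step evaluates a plain string literal,
             for example: descendant::item[paradigm eq '1a'] -->
        <p:filter name="filtering"
                  select="descendant::item[paradigm eq '{$current-paradigm}']">
            <p:with-input port="source" pipe="source@main"/>
        </p:filter>
    </p:for-each>
</p:declare-step>
```

If this reading is right, the approach depends on the values being safe to quote: a paradigm value containing an apostrophe would break the string literal and would need escaping first.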
As an ancillary issue, when I change the filter to hard-code a specific paradigm, I get a Java heap error, and cranking the Java memory up to 16G (the machine has 32G) with -Xmx16G on the "java" line in Morgana.sh doesn't help; the same error is raised. The XProc modification is:

    <p:filter name="filtering" select="descendant::item[paradigm eq '1a']">

and the error is:

    [09:58:03.375] Generating verb forms
    Exception in Fiber "fiber-10000016" java.lang.OutOfMemoryError: Java heap space
        at net.sf.saxon.tree.tiny.TinyTree.<init>(TinyTree.java:193)
        at net.sf.saxon.tree.tiny.TinyBuilder.open(TinyBuilder.java:124)
        at net.sf.saxon.event.SequenceWriter.createTree(SequenceWriter.java:103)
        at net.sf.saxon.event.SequenceWriter.startDocument(SequenceWriter.java:55)
        at net.sf.saxon.event.ProxyReceiver.startDocument(ProxyReceiver.java:106)
        at com.xml_project.morganaxproc3.saxon10connector.Saxon10Core.treeWalk(Saxon10Core.java:297)
        at com.xml_project.morganaxproc3.saxon10connector.Saxon10Core.convertToSaxon(Saxon10Core.java:282)
        at com.xml_project.morganaxproc3.saxon10connector.Saxon10Core.convertToSaxon(Saxon10Core.java:199)
        at com.xml_project.morganaxproc3.saxon10connector.Saxon10Stylesheet.applyTemplates(Saxon10Stylesheet.java:213)
        at com.xml_project.morganaxproc3.steplibraries.standardsteps.XSLTStep$1.run(Unknown Source)
        at com.xml_project.morganaxproc3.steplibraries.AtomicXProcStepImplementation.perform(Unknown Source)
        at com.xml_project.mopl.steps.MoPLLibraryStep.run(Unknown Source)
        at com.xml_project.mopl.runtime.LibraryStepActor.startIt(Unknown Source)
        at com.xml_project.mopl.runtime.BufferingActor.checkRun(Unknown Source)
        at com.xml_project.mopl.runtime.BufferingActor.doRun(Unknown Source)
        at com.xml_project.mopl.runtime.BufferingActor.doRun(Unknown Source)
        at co.paralleluniverse.actors.Actor.run0(Actor.java:710)
        at co.paralleluniverse.actors.ActorRunner.run(ActorRunner.java:51)
        at co.paralleluniverse.fibers.Fiber.run(Fiber.java:1097)

Saxon, using the default memory (that
is, without any -Xmx option), completes the transformation without error. The XSLT is non-streaming, and I'd like to keep it that way, if possible, so that the pipeline will also run under Saxon HE.

Best,

David

On Sat, Oct 31, 2020 at 4:30 PM David Birnbaum <djbpitt@gmail.com> wrote:

> Dear Geert (cc xproc-dev),
>
> Thank you for this suggestion! Gerrit's advice about URI expectations and
> storing result documents resolves the issues I reported, but it also feels
> more direct to do the grouping (even if indirectly, by way of filtering)
> inside XProc, since that puts the XSLT in charge only of transformation,
> and lets XProc oversee the file management details. I will try your
> filtering suggestion, as well, and report the results, probably tomorrow.
>
> Best,
>
> David
>
> On Sat, Oct 31, 2020 at 3:20 PM Geert Bormans <geert@gbormans.telenet.be>
> wrote:
>
>> Hi David,
>>
>> Have you considered doing a...
>> p:for-each on the distinct values of the item/@paradigm in the source XML
>> have a p:xslt inside the p:for-each that takes the paradigm as a filter
>> parameter (so don't group but filter)
>> and p:store the result inside the for-each
>>
>> Best regards,
>>
>> Geert Bormans
>>
>> ----- On 31 Oct 2020 at 20:02, David Birnbaum <djbpitt@gmail.com> wrote:
>>
>> Dear xproc-dev,
>> I would be grateful for advice about how best to manage a pipeline that
>> requires me to generate and then continue to process multiple output
>> documents from a single input. The input contains 110k <item> elements that
>> are distinguished by a @paradigm attribute on the <item> element; there are
>> about 150 different @paradigm values in the input. I would like to group
>> the <item> elements by their @paradigm values, process each group, and
>> write the outputs for each group separately to disk. I would also like to
>> run another transformation over those outputs and write the results of that
>> transformation to disk, as well.
>> I have poked at the following approaches
>> and run into trouble with both of them, probably because (or, at least,
>> partially because) I am not (yet, I hope!) very adept at XProc:
>>
>> 1. Within the XProc, I run an XSLT step that uses <xsl:for-each-group>
>> and <xsl:result-document> to create separate output for each group, with
>> constructed output @href values. This errors out with:
>>
>> <c:errors xmlns:c="http://www.w3.org/ns/xproc-step"><c:error
>> code="err:XC0121" name="generate" type="p:xslt"
>> href="file:///Users/djb/repos/cz/pos/verb/verb.xpl" line="64" column="27"
>> xmlns:p="http://www.w3.org/ns/xproc"
>> xmlns:err="http://www.w3.org/ns/xproc-error"><message>URI
>> '/Users/djb/repos/cz/output/verb-1a.xml' of secondary result is not valid
>> or not absolute.</message></c:error></c:errors>
>>
>> I had first tried a relative path for the @href on the
>> <xsl:result-document>, and I thought the error message meant that there was
>> no base URI within the pipeline, so I specified an absolute path
>> instead, but, as seen above, that raises the same error. I did specify a
>> secondary port in the XProc with:
>>
>> <p:output port="secondary" sequence="true"/>
>>
>> but that seems to have no effect on the outcome (perhaps I specified it
>> in the wrong place?). I think I should be able to write multiple result
>> documents, and that I have misunderstood something about how to set that
>> up. For what it's worth, I also think I may need a <p:store> step to save
>> the multiple result documents, and although I've used <p:store>
>> successfully with single outputs, I don't know what it should look like to
>> save a set of result documents. But if I've understood the error correctly,
>> I'm stalled on the XSLT step, and need to get past that first.
>>
>> 2. As an alternative to <xsl:for-each-group> inside the XSLT stylesheet,
>> I considered doing the grouping in XProc, but I don't see anything within
>> XProc comparable to <xsl:for-each-group>.
>> If I am reading the description
>> correctly, a <p:for-each> step might let me loop over <item> elements, but
>> it does not appear to have the ability to form the <item> elements into
>> groups according to shared @paradigm values and loop over those groups. I
>> could run an XSLT pre-processing step to do the grouping, all within our
>> document, creating an intermediate hierarchical level (called, say,
>> <group>) and then use <p:for-each> to loop over those, but that extra step
>> feels to me like a hack, that is, as if there should be a more direct way
>> to do what I need. Should I ignore that feeling?
>>
>> Assuming I can get the individual result documents written to disk, I
>> think I can do the subsequent transformation with a <p:for-each> step.
>>
>> I am using MorganaXProc-IIIse 0.9.4.2-beta and Saxon EE 10.0, and running
>> from the command line under MacOS 10.15.7. Thanks in advance for any
>> pointers in The Right Direction.
>>
>> Best,
>>
>> David
>> djbpitt@gmail.com
Received on Sunday, 1 November 2020 15:14:01 UTC