- From: David Birnbaum <djbpitt@gmail.com>
- Date: Fri, 6 Nov 2020 10:55:09 -0500
- To: XProc Dev <xproc-dev@w3.org>
- Message-ID: <CAP4v81opYzs3P+C74Q7B_+uAN85wMWP4DZdGvTw7arrt7FGVYQ@mail.gmail.com>
Dear xproc-dev,

The generous suggestions and explanations provided on this list recently helped me get past a sticking point in making the value of a variable available within a step, and I write now in the hope that my current sticking point will prove similarly yielding.

Here is the part that works: I am running a for-each step over a sequence of distinct values and using those values for grouping purposes:

    <p:for-each name="loop">
      <p:with-input select="distinct-values(//paradigm)">
        <p:pipe step="normalize" port="result"/>
      </p:with-input>
      <p:variable name="current-paradigm" select="string(.)"/>
      <p:variable name="output-filename"
        select="concat('../../output/verb-', $current-paradigm ! translate(., '/', '|'), '.xml')"/>
      <p:identity message="Processing {$current-paradigm} with output filename {$output-filename}"/>
      <p:wrap-sequence wrapper="words">
        <p:with-input port="source"
          select="descendant::item[paradigm = $current-paradigm]"
          pipe="result@normalize"/>
      </p:wrap-sequence>
      <p:xslt name="generate">
        <p:with-input port="stylesheet" href="verb-generate.xsl"/>
      </p:xslt>
      <p:store href="{encode-for-uri($output-filename)}"/>
      <p:sink/>
    </p:for-each>

The "normalize" step produces a single XML output on its result port, which serves as the input into the for-each. Within the for-each, I loop over the distinct values of all of the <paradigm> elements in the input and set the variable $current-paradigm to be equal to each of them in turn, so that I can construct an output filename for each distinct paradigm value, group the <item> elements in the same source (result@normalize) according to their <paradigm> descendants, run them through an XSLT step, and save the results to disk, with one output file per paradigm. Some of the paradigm identifiers include slashes, which are illegal in filenames, so I replace them with pipe characters. I don't care about the other awkward characters (spaces, asterisks, apostrophes, etc.); I can work around those with encode-for-uri().

So far, so good. The XSLT step that is applied within the for-each step uses imports to process each paradigm type separately; the main verb-generate.xsl file imports all files in a "modules" subdirectory. I am authoring those imports one by one, and during that development phase I want to process, within the XProc for-each step, only the paradigms for which I have already authored an import. For example, there are a few hundred distinct paradigm values, and in the "modules" subdirectory I create import files like "verb-a1.xsl", "verb-1a%7%.xsl", etc. I would like to access these filenames within the XProc and use them to constrain the XSLT step to operate only for paradigms for which I have already created import modules, without applying the XSLT step to other <item> elements associated with paradigms for which I have not yet authored an import, and without creating output for them.

I thought (because it works in XSLT) that the way to do this would be to create a variable equal to a sequence of strings that would be computed at run time from the filenames in the "modules" subdirectory (the contents of the "modules" subdirectory do not change during execution). I could then use that variable to tell the XProc for-each step to look at all distinct paradigm values and perform the XSLT step only for those for which an import file exists in the "modules" subdirectory (that is, the directory under the one where the XProc file lives). I have this working in XSLT as:

    <xsl:variable name="module-names" as="xs:string+"
      select="collection('modules') ! base-uri() ! tokenize(., '/')[last()]
              ! substring-after(., 'verb-') ! substring-before(., '.xsl')"/>
    <!-- stuff -->
    <xsl:template match="item[type/@partOfSpeech eq 'V'][encode-for-uri(paradigm ! translate(., '/', '|')) = $module-names]">
      <!-- stuff -->
    </xsl:template>

I am stuck in (at least) two places when I attempt to move this logic into XProc.

The first sticking point is that the following does not find the files within the "modules" subdirectory; it returns an empty <filenames/> element:

    <p:xquery name="madule-filenames">
      <p:with-input port="query">
        <p:inline content-type="application/xml">
          <filenames>{{collection("modules")}}</filenames>
        </p:inline>
      </p:with-input>
    </p:xquery>

Eventually I would use the same function mapping/chaining as in the XSLT version, above, to get the names of the files in the "modules" subdirectory and perform string surgery to whittle them down to the middle part, which will match a <paradigm> value in the input. But since I can't find the files themselves, I can't get to the point of extracting the paradigm identifiers.

The second sticking point is that once I am able to get the filenames, I don't know how to put those into a variable (or refer to them directly) so that I can wrap part of the for-each step in an if step, or otherwise apply a filter, so that it processes and creates output only for paradigm values for which there are corresponding XSLT import files in the "modules" subdirectory. I understand (or ... er ... think I understand), from my earlier thread, that I would have to be attentive to where the variable is interpreted in XProc, a complication that does not apply when I do this processing within XSLT. But I would need to create the variable and make it available (or make its value available directly, without using a variable as an intermediary), and I don't see a way to do that in an XProc context.

Thank you in advance for any guidance or suggestions. If the approach I am considering, and that I have been able to implement in XSLT, is simply wrong-headed in an XProc context, "do it this way instead of the way you're trying to do it" would also be a welcome answer.

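To make the two sticking points concrete, here is a minimal, untested sketch of one shape the XProc side might take. It is not a known-good answer. It assumes the processor implements p:directory-list from the XProc 3.0 file steps, that a relative path is resolved against the pipeline's own location, and that the names reported in c:file/@name are in the same form that encode-for-uri() produces for the paradigm values (if they are reported unencoded, the encode-for-uri() call in the filter would have to go). The step name "list-modules", the pipeline name "main", and the use of the pipeline's source port as a stand-in for result@normalize are illustrative only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                name="main" version="3.0">

  <!-- Stand-in for the document that "normalize" produces in the real pipeline -->
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <!-- List the "modules" subdirectory; the result is a c:directory element
       with one c:file child per file (assumption: the file steps are available) -->
  <p:directory-list name="list-modules" path="modules"/>

  <!-- Reduce each verb-*.xsl filename to its middle part and hold the
       resulting strings in a variable declared outside the loop -->
  <p:variable name="module-names" as="xs:string*"
    select="/c:directory/c:file/@name[starts-with(., 'verb-') and ends-with(., '.xsl')]
            ! substring-after(., 'verb-') ! substring-before(., '.xsl')"
    pipe="result@list-modules"/>

  <!-- Iterate only over paradigm values that have a matching module.
       Assumption: @name matches the encode-for-uri() form of the value. -->
  <p:for-each name="loop">
    <p:with-input pipe="source@main"
      select="distinct-values(//paradigm)[encode-for-uri(translate(., '/', '|')) = $module-names]"/>
    <p:variable name="current-paradigm" select="string(.)"/>
    <!-- The wrap-sequence, xslt, and store steps would go here -->
    <p:identity message="Would process {$current-paradigm}"/>
  </p:for-each>

</p:declare-step>
```

The idea is to filter the sequence before the loop rather than wrap the loop body in an if step, so that paradigms without a module never enter the for-each and no output is created for them.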
Sincerely, David djbpitt@gmail.com
Received on Friday, 6 November 2020 15:55:33 UTC