Creating and using variables ... again ...

Dear xproc-dev,

The generous suggestions and explanations provided on this list recently
helped me get past a sticking point in making the value of a variable
available within a step, and I write now in the hope that my current
sticking point will prove similarly yielding.

Here is the part that works: I am running a for-each step over a sequence
of distinct values and using those values for grouping purposes:

  <p:for-each name="loop">
    <p:with-input select="distinct-values(//paradigm)">
      <p:pipe step="normalize" port="result"/>
    </p:with-input>
    <p:variable name="current-paradigm" select="string(.)"/>
    <p:variable name="output-filename"
      select="concat('../../output/verb-',
                     $current-paradigm ! translate(., '/', '|'), '.xml')"/>
    <p:identity
      message="Processing {$current-paradigm} with output filename {$output-filename}"/>
    <p:wrap-sequence wrapper="words">
      <p:with-input port="source"
        select="descendant::item[paradigm = $current-paradigm]"
        pipe="result@normalize"/>
    </p:wrap-sequence>
    <p:xslt name="generate">
      <p:with-input port="stylesheet" href="verb-generate.xsl"/>
    </p:xslt>
    <p:store href="{encode-for-uri($output-filename)}"/>
    <p:sink/>
  </p:for-each>

The "normalize" step produces a single XML output on its result port, which
serves as the input into the for-each. Within the for-each, I loop over the
distinct values of all of the <paradigm> elements in the input and set the
variable $current-paradigm to be equal to each of them in turn, so that I
can construct an output filename for each distinct paradigm value, group
the <item> elements in the same source (result@normalize) according to
their <paradigm> descendants, run them through an XSLT step, and save the
results to disk, with one output file per paradigm. Some of the paradigm
identifiers include slashes, which are illegal in filenames, so I replace
them with pipe characters. I don't care about the other awkward characters
(spaces, asterisks, apostrophes, etc.); I can work around those with
encode-for-uri(). So far, so good.

The XSLT step that is applied within the for-each step uses imports to
process each paradigm type separately; the main verb-generate.xsl file
imports all files in a "modules" subdirectory. I am authoring those imports
one by one, and during that development phase I want to process, within the
XProc for-each step, only the paradigms for which I have already authored
an import. For example, there are a few hundred distinct paradigm values,
and in the "modules" subdirectory I create import files like "verb-a1.xsl",
"verb-1a%7%.xsl", etc. I would like to access these filenames within the
XProc and use them to constrain the XSLT step to operate only for paradigms
for which I have already created import modules, without applying the XSLT
step to <item> elements associated with other paradigms (that is, paradigms
for which I have not yet authored an import), and without creating output
for them. I thought (because it works in XSLT) that the way to do this
would be to create a variable equal to a sequence of strings that would be
computed at run time from the filenames in the "modules" subdirectory (the
contents of the "modules" subdirectory do not change during execution). I
could then use that variable to tell the XProc for-each step to look at all
distinct paradigm values and perform the XSLT step only for those for which
an import file exists in the "modules" subdirectory (that is, the directory
under the one where the XProc file lives).

I have this working in XSLT as:

  <xsl:variable name="module-names" as="xs:string+"
    select="collection('modules') ! base-uri() ! tokenize(., '/')[last()]
            ! substring-after(., 'verb-') ! substring-before(., '.xsl')"/>
  <!-- stuff -->
  <xsl:template
    match="item[type/@partOfSpeech eq 'V']
               [encode-for-uri(paradigm ! translate(., '/', '|')) = $module-names]">
    <!-- stuff -->
  </xsl:template>


I am stuck in (at least) two places when I attempt to move this logic into
XProc:

The first sticking point is that the following does not find the files
within the "modules" subdirectory; it returns an empty <filenames/> element:

  <p:xquery name="module-filenames">
    <p:with-input port="query">
      <p:inline content-type="application/xml">
        <filenames>{{collection("modules")}}</filenames>
      </p:inline>
    </p:with-input>
  </p:xquery>


Eventually I would use the same function mapping/chaining as in the XSLT
version, above, to get the names of the files in the "modules" subdirectory
and perform string surgery to whittle them down to the middle part, which
will match a <paradigm> value in the input. But since I can't find the
files themselves, I can't get to the point of extracting the paradigm
identifiers.
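
To make the intended string surgery concrete, here is a minimal standalone
XSLT 3.0 sketch of that chain applied to a single URI. The URI is a made-up
stand-in for one item of collection('modules') ! base-uri(), not a path from
my project, and the stylesheet is meant to be run from its initial template
(for example, with Saxon's -it option). It only illustrates the string
handling; it does not address the problem of finding the files from XProc:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Hypothetical URI, standing in for the base URI of one module file -->
  <xsl:variable name="sample-uri" as="xs:string"
      select="'file:/hypothetical/modules/verb-a1.xsl'"/>

  <xsl:template name="xsl:initial-template">
    <!-- Keep the last path segment, then strip 'verb-' and '.xsl' -->
    <xsl:value-of
        select="$sample-uri ! tokenize(., '/')[last()]
                ! substring-after(., 'verb-') ! substring-before(., '.xsl')"/>
    <!-- Output: a1 -->
  </xsl:template>
</xsl:stylesheet>
```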

The second sticking point is that once I am able to get the filenames, I
don't know how to put those into a variable (or refer to them directly) so
that I can wrap the part of the for-each step in an if step or otherwise
apply a filter that would cause it to process and create output only for
paradigm values for which there are corresponding XSLT import files in the
"modules" subdirectory. I understand (or ... er ... think I understand),
from my earlier thread, that I would have to be attentive to where the
variable is interpreted in XProc, a complication that does not apply when I
do this processing within XSLT. But I would need to create the variable and
make it available (or make its value available directly, without using a
variable as an intermediary), and I don't see a way to do that in an XProc
context.
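
For clarity about the filter I am hoping to express, here is a minimal
standalone XSLT 3.0 sketch of the comparison itself. Both sequences are
invented sample values (in the real pipeline the first would come from the
filenames and the second from distinct-values(//paradigm)), and this is XSLT
rather than XProc, so it shows only what I want to compute, not how to make
the value available to the for-each step:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- Hypothetical module names, as they would look after the string surgery -->
  <xsl:variable name="module-names" as="xs:string+"
      select="('a1', '1a%7C7')"/>
  <!-- Hypothetical paradigm values, as they would appear in the input -->
  <xsl:variable name="paradigms" as="xs:string+"
      select="('a1', '1a/7', 'b2')"/>

  <xsl:template name="xsl:initial-template">
    <!-- Keep only paradigms whose encoded form matches some module name -->
    <xsl:value-of separator=", "
        select="$paradigms[encode-for-uri(translate(., '/', '|')) = $module-names]"/>
    <!-- Output: a1, 1a/7 -->
  </xsl:template>
</xsl:stylesheet>
```

The general comparison (=) is true when the encoded paradigm equals any item
in $module-names, which is why 'b2', with no matching module, drops out.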

Thank you in advance for any guidance or suggestions. And if the approach I
am considering, and that I have been able to implement in XSLT, is simply
wrong-headed in an XProc context, "do it this way instead of the way you're
trying to do it" would also be a welcome answer.

Sincerely,

David
djbpitt@gmail.com

Received on Friday, 6 November 2020 15:55:33 UTC