Re: Handling Configuration File Parameters -- Historical Unrefined Approach Just for Discussion

From: Alex Muir <alex.g.muir@gmail.com>
Date: Tue, 23 Feb 2010 19:06:31 +0000
Message-ID: <88b533b91002231106k717c7442na9511f684598e5f4@mail.gmail.com>
To: James Sulak <jsulak@gmail.com>
Cc: XProc Dev <xproc-dev@w3.org>

Well, here are the pieces created thus far to use placeholders rather than
parameters. I'll want to create a more dynamic process than what is here, but
I'll be working on other priorities for now because it works well enough.

1. xproc-Template.xpl: XProc template with placeholders between three tildes,
e.g. ~~~endingFileNumber~~~, which might look like this in the attributes:

 <p:variable name="XXX" select="~~~YYY~~~"/>

A problem with the approach is that the template becomes littered with error
messages, such as at the href attributes, which are not expecting ~~~
placeholders, so I'll have to work on an instance and then update the
template.

2. Configuration file, which includes a placeholderRegex used to configure
how the process identifies placeholders (allowing for different placeholder
styles in the XProc template attributes) and uses groupings to organize the
parameters:

<configuration placeholderRegex="~~~([^~]*?)~~~">

  <group name="InputOutputFolders" doc="'source and output folder ...">
    <param name="XSLT-source-folder" value="'../../Source/2009/'"/>
    <param name="source-folder" value="'../../../Source/2009/'"/>
    <param name="output-folder" value="'../../../Output/2009/'"/>
    <param name="completed-folder" value="'Completed/'"/>
    <param name="error-folder" value="'Error/'"/>
    <param name="exception-folder" value="'Exception/'"/>
    <param name="XSLTDirectory" value="../../XSLT/"/>
  </group>
  ...
</configuration>
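
As a rough illustration of how such a file could be read, here is a minimal
Python sketch that loads the grouped parameters into a flat lookup table keyed
by parameter name. The embedded sample is a trimmed, well-formed stand-in for
the configuration above, and load_config is a hypothetical helper name, not
part of the actual process.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the configuration file (assumption:
# the real file has more groups and params than shown here).
SAMPLE_CONFIG = """\
<configuration placeholderRegex="~~~([^~]*?)~~~">
  <group name="InputOutputFolders">
    <param name="source-folder" value="'../../../Source/2009/'"/>
    <param name="output-folder" value="'../../../Output/2009/'"/>
    <param name="XSLTDirectory" value="../../XSLT/"/>
  </group>
</configuration>
"""


def load_config(xml_text):
    """Return (placeholder_regex, {param name: value}) from a config document.

    Groups only organize the file for readers, so the params of all
    groups are flattened into a single dictionary.
    """
    root = ET.fromstring(xml_text)
    regex = root.get("placeholderRegex")
    params = {p.get("name"): p.get("value") for p in root.iter("param")}
    return regex, params


regex, params = load_config(SAMPLE_CONFIG)
print(regex)
print(params["source-folder"])
```

Flattening the groups assumes parameter names are unique across the whole
file; if two groups reused a name, the later one would win.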


3. ConvertXprocTemplateToExecutionFile.xsl, which takes the template and
configuration files as input and replaces any placeholders within the
document to create a running instance. BTW, our template reads in multiple
files and has a starting and ending file number to dictate which files to
process from the directory, as follows:

<p:for-each name="forEachFile">
   <p:iteration-source
      select="//c:file[position() ge number($startingFileNumber) and
              position() le number($endingFileNumber)]"/>
   ...
</p:for-each>

The following XSL creates a working xproc instance.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    <xsl:param name="ConfigFile"
    <xsl:output indent="yes" method="xml"/>
    <xsl:param name="startingFileNumber" select="1"/>
    <xsl:param name="endingFileNumber" select="2"/>

    <xsl:template match="*">
            <xsl:apply-templates select="@* | node()"/>

    <!-- replace any attribute value text that matches placeholderRegex -->
    <xsl:template match="@*">
        <xsl:variable name="value" select="."/>
        <xsl:variable name="name" select="name(.)"/>

            <xsl:when test="$value = '~~~startingFileNumber~~~'">
                <xsl:attribute name="{$name}" select="$startingFileNumber"/>
            <xsl:when test="$value = '~~~endingFileNumber~~~'">
                <xsl:attribute name="{$name}" select="$endingFileNumber"/>
                <xsl:variable name="PlaceHolderReplacement">
                    <xsl:analyze-string select="."

                            <xsl:variable name="placeHolderName">
                                <xsl:value-of select="regex-group(1)"/>
                                select="doc($ConfigFile)//param[@name =
                            <xsl:value-of select="."/>

                <xsl:attribute name="{$name}"
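
Since the stylesheet listing is cut short by the archive, here is a minimal
Python sketch of the same idea: walk every attribute in the template and
replace each ~~~name~~~ placeholder with a looked-up value. It is not a
translation of the XSLT. It assumes all values (the file numbers as well as
the configuration params) arrive in one dictionary, and fill_placeholders and
the sample template are hypothetical names made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the placeholderRegex attribute in the configuration file.
PLACEHOLDER = re.compile(r"~~~([^~]*?)~~~")

# Keep the "p" prefix when the result is serialized again.
ET.register_namespace("p", "http://www.w3.org/ns/xproc")

# Tiny stand-in for xproc-Template.xpl (assumption, not the real template).
TEMPLATE = """\
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:variable name="endingFileNumber" select="~~~endingFileNumber~~~"/>
  <p:variable name="source-folder" select="~~~source-folder~~~"/>
</p:declare-step>
"""


def fill_placeholders(template_xml, values):
    """Replace ~~~name~~~ in every attribute value of the template.

    Unknown placeholder names are left untouched so that they stay
    visible in the generated file.
    """
    root = ET.fromstring(template_xml)
    for element in root.iter():
        for attr, text in list(element.attrib.items()):
            replaced = PLACEHOLDER.sub(
                lambda m: str(values.get(m.group(1), m.group(0))), text)
            element.set(attr, replaced)
    return ET.tostring(root, encoding="unicode")


result = fill_placeholders(
    TEMPLATE,
    {"endingFileNumber": 2, "source-folder": "'../../../Source/2009/'"},
)
print(result)
```

Leaving unknown placeholders in place is a design choice of this sketch; a
stricter version could raise an error instead.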


4. Main.xpl

I ended up doing a bit of a strange workaround for the memory
consumption/leak issue regarding multiple file input and output
(http://code.google.com/p/xmlcalabash/issues/detail?id=94): creating an
instance to run 10 files at a time with a 256MB JVM. At some point I'll create
a Java app to generate the list of <instance> configurations and batch files
dynamically based on directory contents, if the memory issue continues to be
a problem. For now we just want to run a thousand files with 3 XProc
instances running at the same time, 10 files per instance, simply using 3
batch files.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="
   xmlns:cx="http://xmlcalabash.com/ns/extensions" name="Main">

   <p:documentation> Xproc Script runs the execution

   <p:input port="source">
            <instance startingFileNumber="1" endingFileNumber="9" name="1"/>
            <instance startingFileNumber="10" endingFileNumber="19"
            <instance startingFileNumber="20" endingFileNumber="29"
            <instance startingFileNumber="30" endingFileNumber="39"
            <instance startingFileNumber="40" endingFileNumber="49"
            <instance startingFileNumber="50" endingFileNumber="59"
            <instance startingFileNumber="60" endingFileNumber="69"
            <instance startingFileNumber="70" endingFileNumber="79"
            <instance startingFileNumber="80" endingFileNumber="89"
            <instance startingFileNumber="90" endingFileNumber="100"
            <instance startingFileNumber="101" endingFileNumber="109"


            <instance startingFileNumber="870" endingFileNumber="879"
            <instance startingFileNumber="880" endingFileNumber="889"
            <instance startingFileNumber="890" endingFileNumber="900"
            <instance startingFileNumber="901" endingFileNumber="909"
            <instance startingFileNumber="910" endingFileNumber="919"
            <instance startingFileNumber="920" endingFileNumber="929"
            <instance startingFileNumber="930" endingFileNumber="939"
            <instance startingFileNumber="940" endingFileNumber="949"
            <instance startingFileNumber="950" endingFileNumber="959"
            <instance startingFileNumber="960" endingFileNumber="969"
            <instance startingFileNumber="970" endingFileNumber="979"
            <instance startingFileNumber="980" endingFileNumber="989"
            <instance startingFileNumber="990" endingFileNumber="1000"

   <p:output port="result" sequence="true"/>

   <p:declare-step type="cx:message">
      <p:input port="source"/>
      <p:output port="result"/>
      <p:option name="message" required="true"/>

   <p:variable name="fileName" select="'createi4EnrichInstance'"/>
   <p:variable name="output-folder" select="'../Executions/'"/>

   <p:for-each name="forEachFile">

      <p:iteration-source select="//instance"/>

      <p:variable name="startingFileNumber"
      <p:variable name="endingFileNumber"
      <p:variable name="name" select="instance/@name"/>

         <p:with-option name="message"
            select="concat('-----------------------', 'startingFileNumber',
$startingFileNumber, '  endingFileNumber: ',

         <p:with-option name="message" select="'######
      <p:load href="../ProcessTemplates/xproc-Template.xpl"/>
      <p:xslt name="ConvertXprocTemplateToExecutionFile">
         <p:input port="source"/>

         <p:input port="stylesheet">
            <p:document href="ConvertXprocTemplateToExecutionFile.xsl"/>
         <p:with-param name="startingFileNumber"
         <p:with-param name="endingFileNumber" select="$endingFileNumber"/>
         <p:input port="parameters">

      <p:identity name="out_file"/>
      <p:store name="store">
         <p:with-option name="href" select="concat($output-folder,
$fileName, $name, '.xpl' )">
            <p:pipe step="out_file" port="result"/>
      <p:documentation> Create result XML </p:documentation>
         <p:input port="source">
            <p:pipe step="store" port="result"/>

         <p:with-option name="message" select="'######   Launch Process'"/>


   <p:documentation>Wrap result XML </p:documentation>
   <p:wrap-sequence wrapper="forEachFile"/>
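
The hand-written <instance> list could also be generated. Below is a minimal
sketch in Python (not the Java app mentioned above) that emits one <instance>
element per batch of files. It assumes uniform batches of 10 starting at file
1, which differs slightly from the ranges typed in by hand above, and both
function names are hypothetical.

```python
def instance_ranges(file_count, batch_size=10):
    """Yield (name, start, end) for consecutive 1-based file ranges."""
    starts = range(1, file_count + 1, batch_size)
    for index, start in enumerate(starts, start=1):
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size - 1, file_count)
        yield str(index), start, end


def instance_elements(file_count, batch_size=10):
    """Return the <instance> elements as a list of strings."""
    return [
        f'<instance startingFileNumber="{start}" '
        f'endingFileNumber="{end}" name="{name}"/>'
        for name, start, end in instance_ranges(file_count, batch_size)
    ]


for line in instance_elements(25):
    print(line)
```

In practice file_count would come from counting the files in the source
directory rather than being hard-coded.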

5. Three RunProcess.bat files, each of which runs ~33 instances (about 333
files), 10 files at a time.

start /wait run-calabash.bat createi4EnrichInstance1.xpl
start /wait run-calabash.bat createi4EnrichInstance2.xpl
start /wait run-calabash.bat createi4EnrichInstance3.xpl
start /wait run-calabash.bat createi4EnrichInstance4.xpl
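
The contents of the batch files could be generated the same way. This is a
minimal Python sketch that splits the generated instances into contiguous
chunks, one chunk per batch file. How the instances are actually divided
among the three files is an assumption here, and batch_file_contents is a
hypothetical helper name.

```python
def batch_file_contents(instance_count, batch_files=3,
                        prefix="createi4EnrichInstance"):
    """Return one string per batch file, each listing its instances.

    Instances are numbered from 1 and split into contiguous chunks;
    any remainder is spread over the first batch files.
    """
    base, extra = divmod(instance_count, batch_files)
    contents = []
    next_instance = 1
    for i in range(batch_files):
        size = base + (1 if i < extra else 0)
        lines = [
            f"start /wait run-calabash.bat {prefix}{n}.xpl"
            for n in range(next_instance, next_instance + size)
        ]
        contents.append("\n".join(lines))
        next_instance += size
    return contents


files = batch_file_contents(100)
print(files[0].splitlines()[0])
```

With 100 instances and 3 batch files this gives chunks of 34, 33 and 33
instances.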

6. BeginProcess.bat runs the whole process; in this case, three batch files
run at the same time.

start /wait run-calabash.bat ..\ExecutionManager\Main.xpl
start RunProcess1.bat
start RunProcess2.bat
start RunProcess3.bat


On Fri, Feb 19, 2010 at 3:35 PM, Alex Muir <alex.g.muir@gmail.com> wrote:

> Hi James,
> That's definitely interesting and useful.
> I'm a bit inspired here and in the process of creating a simple and dirty
> process that has a
>    - xproc template with placeholders rather than variables
>    - Config files with groups of name value pair parameters
>       - Grouping for readability/organization
>    - XSLT to merge template and config files
>    - Means of execution with multiple instances running the compiled xproc
>      pipelines concurrently
>       - xproc | batch | java app
> I'll share my findings.
> Regards
> Alex
> On Fri, Feb 19, 2010 at 2:51 PM, James Sulak <jsulak@gmail.com> wrote:
>> Hi Alex,
>> An eval-pipeline step has been mentioned before, but as far as I know
>> no one's implemented it as an extension function.
>> I don't know if this is what you're looking for, but I've done
>> something similar at runtime (not preprocessing). I had a problem
>> where I needed to edit the contents of an XQuery dynamically.  I
>> settled on using a step that took parameters and replaced any
>> instances of ${varname} with its string value.  For example, I would
>> construct an xquery this way:
>>    <p:identity>
>>      <p:input port="source">
>>        <p:inline>
>>          <c:query xmlns="http://exist.sourceforge.net/NS/exist"
>>                   start="1" max="20" cache="no">
>>            <c:text>
>>              declare namespace c="http://www.w3.org/ns/xproc-step";
>>              let $login := xmldb:login("xmldb:exist:///db",
>>                                        "${user}", "${password}")
>>              let $response :=
>>                xmldb:create-collection("${parent-collection}", "${collection}")
>>              return (element c:result { concat(request:get-url(),
>>                                                $response) })
>>            </c:text>
>>          </c:query>
>>        </p:inline>
>>      </p:input>
>>    </p:identity>
>>    <wxp:resolve-placeholders>
>>      <p:input port="parameters">
>>        <p:empty />
>>      </p:input>
>>      <p:with-param name="user" select="$user" />
>>      <p:with-param name="password" select="$password" />
>>      <p:with-param name="parent-collection" select="$parent-collection" />
>>      <p:with-param name="collection" select="$collection" />
>>    </wxp:resolve-placeholders>
>> The <wxp:resolve-placeholders/> step uses <p:parameters/> to create an
>> XML document out of the parameters, which is then passed to a transform
>> that replaces the variable names with their values.
>> -James
>> On Fri, Feb 19, 2010 at 6:37 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
>> > Hi,
>> >
>> > I was reading posts about configuration file parameters in the xproc list
>> > archives and, having my own issues using them, it led me to recall the
>> > solution I used when creating a simple XSLT pipeline, as probably all on
>> > this list have done.
>> >
>> > Regarding handling the configuration file:
>> >
>> > We started with name-value pair configuration declarations at the top of
>> > the pipe, which were referenced below using XPath. That became cumbersome
>> > to use over time, and at some point the idea came to use a simpler, perhaps
>> > unrefined, solution that worked well.
>> > We had to externalize the name-value pair configuration XML file to have
>> > multiple configuration files, some for end users, some for more technical
>> > people...
>> >
>> > Given the need to have multiple configuration files, we preprocessed to
>> > combine the configuration files and pass only one config file through the
>> > pipe, as passing more than one would have been more work.
>> > PERHAPS THE KEY POINT: Rather than reference the configuration file using
>> > XPath, and have the pipeline processor pass the configuration file as a
>> > DOM through the whole process to find config values dynamically as they
>> > were needed, we replaced all the XPath with '##VariableName##',
>> > referencing the same variable name from the config file as the XPath did.
>> > Then, in preprocessing, we compiled the new pipeline XML document, finding
>> > and replacing '##VariableName##' with the correct value for each
>> > configuration file. We no longer combined config files into one, as there
>> > was no need.
>> > The simplification saved us development time in the future.
>> >
>> > From what I gather this type of script preprocessing is a fairly common
>> > practice.
>> >
>> >
>> > Questions for discussion:
>> >
>> > Are others doing this with their xproc scripts? Why or why not?
>> >
>> > I wonder, would it be better to use the parameters configuration file as
>> > it is currently designed in xproc rather than create a small script to
>> > implement the ## configuration version?
>> >
>> > Is it possible to have a small xproc pipe which executes this process and
>> > then executes the regular process, without running the process twice from
>> > the command line? (just thinking out loud here)
>> >
>> > Would that just require that I use the "exec" step, for example if I wanted
>> > to launch 4 java processes of the same pipe compiled with different
>> > configurations?
>> > I think that will work, no?
>> >
>> > Thanks Much
>> >
>> > --
>> > Alex
>> > https://sites.google.com/a/utg.edu.gm/alex
>> >
>> > Some Good Music -- mix of western and African relaxing acoustic styles
>> > http://sites.google.com/site/greigconteh/
>> >
> --
> Alex
> https://sites.google.com/a/utg.edu.gm/alex
> Some Good Music
> http://sites.google.com/site/greigconteh/


Received on Tuesday, 23 February 2010 19:07:00 GMT
