- From: Alex Muir <alex.g.muir@gmail.com>
- Date: Tue, 23 Feb 2010 19:06:31 +0000
- To: James Sulak <jsulak@gmail.com>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <88b533b91002231106k717c7442na9511f684598e5f4@mail.gmail.com>
Hi,
Here are the pieces created so far to use placeholders rather than
parameters. I'd like a more dynamic process than this, but it works well
enough for now, so I'll be moving on to other priorities.
1. xproc-Template.xpl: an XProc template with placeholders wrapped in three
tildes, e.g. ~~~endingFileNumber~~~, which might look like this in the attributes:
<p:variable name="XXX" select="~~~YYY~~~"/>
One problem with the approach is that the template becomes littered with error
messages, for example at href attributes that are not expecting ~~~
placeholders, so I'll have to work on an instance and then update the
template.
2. A configuration file. Its placeholderRegex attribute tells the process how
to identify placeholders, which allows for different placeholder styles in the
xproc template attributes, and it uses groups to organize the parameters:
...
<configuration placeholderRegex="~~~([^~]*?)~~~">
<group name="InputOutputFolders"
doc="'source and output folder variables'"
type="TemplateParameters">
<param name="XSLT-source-folder" value="'../../Source/2009/'"/>
<param name="source-folder" value="'../../../Source/2009/'"/>
<param name="output-folder" value="'../../../Output/2009/'"/>
<param name="completed-folder" value="'Completed/'"/>
<param name="error-folder" value="'Error/'"/>
<param name="exception-folder" value="'Exception/'"/>
<param name="XSLTDirectory" value="../../XSLT/"/>
</group>
...
</configuration>
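As a rough illustration of how the placeholderRegex and the param values fit
together, here is a minimal Python sketch of the lookup. It is not part of the
setup described here, which does this in XSLT; the trimmed config and the file
name MyTransform.xsl are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down copy of the configuration shown above (two params only).
CONFIG_XML = """\
<configuration placeholderRegex="~~~([^~]*?)~~~">
  <group name="InputOutputFolders" type="TemplateParameters">
    <param name="source-folder" value="'../../../Source/2009/'"/>
    <param name="XSLTDirectory" value="../../XSLT/"/>
  </group>
</configuration>
"""

def load_params(config_xml):
    """Return (compiled placeholder regex, {param name: value})."""
    root = ET.fromstring(config_xml)
    pattern = re.compile(root.get("placeholderRegex"))
    params = {p.get("name"): p.get("value") for p in root.iter("param")}
    return pattern, params

def resolve(text, pattern, params):
    """Replace each placeholder in text with its configured value.

    Unknown names become an empty string, which is an assumption chosen to
    mirror an XPath lookup that selects nothing.
    """
    return pattern.sub(lambda m: params.get(m.group(1), ""), text)

pattern, params = load_params(CONFIG_XML)
print(resolve("~~~source-folder~~~", pattern, params))
# MyTransform.xsl is a hypothetical file name.
print(resolve("~~~XSLTDirectory~~~MyTransform.xsl", pattern, params))
```

Group 1 of the regex is the placeholder name, so switching to another
placeholder style only means changing placeholderRegex, as long as the name
stays in the first capture group.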
3. ConvertXprocTemplateToExecutionFile.xsl, which takes the template and
configuration files as input and replaces any placeholders within the
document to create a running instance. BTW our template reads in multiple
files and uses a starting and an ending file number to dictate which files to
process from the directory, as follows:
<p:for-each name="forEachFile">
<p:iteration-source
select="//c:file[position() ge number($startingFileNumber) and
position() le number($endingFileNumber)]"/>
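To make the selection concrete: the predicate keeps the files whose 1-based
position falls in the inclusive range from startingFileNumber to
endingFileNumber. A small Python sketch of the same rule, with hypothetical
file names:

```python
def select_files(files, starting_file_number, ending_file_number):
    """Return the files whose 1-based position lies in the inclusive range."""
    return [
        name
        for position, name in enumerate(files, start=1)
        if starting_file_number <= position <= ending_file_number
    ]

# Hypothetical directory listing: file001.xml .. file025.xml
listing = [f"file{n:03d}.xml" for n in range(1, 26)]
print(select_files(listing, 10, 12))
```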
The following XSL creates a working xproc instance.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:param name="ConfigFile"
select="'../ProcessConfigurations/ConfigMaster.xml'"/>
<xsl:output indent="yes" method="xml"/>
<xsl:param name="startingFileNumber" select="1"/>
<xsl:param name="endingFileNumber" select="2"/>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- replace any attribute value text that matches placeholderRegex -->
<xsl:template match="@*">
<xsl:variable name="value" select="."/>
<xsl:variable name="name" select="name(.)"/>
<xsl:choose>
<xsl:when test="$value = '~~~startingFileNumber~~~'">
<xsl:attribute name="{$name}" select="$startingFileNumber"/>
</xsl:when>
<xsl:when test="$value = '~~~endingFileNumber~~~'">
<xsl:attribute name="{$name}" select="$endingFileNumber"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="PlaceHolderReplacement">
<xsl:analyze-string select="."
regex="{doc($ConfigFile)/configuration/@placeholderRegex}">
<xsl:matching-substring>
<xsl:variable name="placeHolderName">
<xsl:value-of select="regex-group(1)"/>
</xsl:variable>
<xsl:value-of
select="doc($ConfigFile)//param[@name =
$placeHolderName]/@value"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:attribute name="{$name}"
select="$PlaceHolderReplacement"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
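For readers who do not use XSLT, the same pass over every attribute can be
sketched in Python. This is an illustrative equivalent, not the stylesheet
itself: it assumes the per-run numbers and the config values have already been
merged into one dictionary, and the two-variable template is invented for the
example.

```python
import re
import xml.etree.ElementTree as ET

# Keep the "p" prefix when the tree is serialized again.
ET.register_namespace("p", "http://www.w3.org/ns/xproc")

PLACEHOLDER = re.compile(r"~~~([^~]*?)~~~")

# Hypothetical, heavily shortened template.
TEMPLATE = """\
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc">
  <p:variable name="endingFileNumber" select="~~~endingFileNumber~~~"/>
  <p:variable name="output-folder" select="~~~output-folder~~~"/>
</p:declare-step>
"""

def fill_template(template_xml, values):
    """Replace placeholders in every attribute value of the template."""
    root = ET.fromstring(template_xml)
    for element in root.iter():
        # Copy the items so the attributes can be updated while looping.
        for name, value in list(element.attrib.items()):
            filled = PLACEHOLDER.sub(
                lambda m: str(values.get(m.group(1), "")), value
            )
            element.set(name, filled)
    return ET.tostring(root, encoding="unicode")

values = {"endingFileNumber": 19, "output-folder": "'../../../Output/2009/'"}
print(fill_template(TEMPLATE, values))
```

Like the stylesheet, this touches attribute values only and leaves element
names and text content alone.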
4. Main.xpl
I ended up with a slightly strange workaround for the memory
consumption/leak issue with multiple file input and output
(http://code.google.com/p/xmlcalabash/issues/detail?id=94): each generated
instance runs 10 files at a time in a 256MB JVM. If the memory issue continues
to be a problem, at some point I'll create a Java app that builds the list of
<instance> configurations and the batch files dynamically from the directory
contents. For now we just want to run a thousand files with 3 xproc instances
at the same time, 10 files processed per instance, simply using 3 batch
files.
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions" name="Main">
<p:documentation> Xproc Script runs the execution
Process</p:documentation>
<p:input port="source">
<p:inline>
<main>
<instance startingFileNumber="1" endingFileNumber="9" name="1"/>
<instance startingFileNumber="10" endingFileNumber="19" name="2"/>
<instance startingFileNumber="20" endingFileNumber="29" name="3"/>
<instance startingFileNumber="30" endingFileNumber="39" name="4"/>
<instance startingFileNumber="40" endingFileNumber="49" name="5"/>
<instance startingFileNumber="50" endingFileNumber="59" name="6"/>
<instance startingFileNumber="60" endingFileNumber="69" name="7"/>
<instance startingFileNumber="70" endingFileNumber="79" name="8"/>
<instance startingFileNumber="80" endingFileNumber="89" name="9"/>
<instance startingFileNumber="90" endingFileNumber="100" name="10"/>
<instance startingFileNumber="101" endingFileNumber="109" name="11"/>
...
<instance startingFileNumber="870" endingFileNumber="879" name="88"/>
<instance startingFileNumber="880" endingFileNumber="889" name="89"/>
<instance startingFileNumber="890" endingFileNumber="900" name="90"/>
<instance startingFileNumber="901" endingFileNumber="909" name="91"/>
<instance startingFileNumber="910" endingFileNumber="919" name="92"/>
<instance startingFileNumber="920" endingFileNumber="929" name="93"/>
<instance startingFileNumber="930" endingFileNumber="939" name="94"/>
<instance startingFileNumber="940" endingFileNumber="949" name="95"/>
<instance startingFileNumber="950" endingFileNumber="959" name="96"/>
<instance startingFileNumber="960" endingFileNumber="969" name="97"/>
<instance startingFileNumber="970" endingFileNumber="979" name="98"/>
<instance startingFileNumber="980" endingFileNumber="989" name="99"/>
<instance startingFileNumber="990" endingFileNumber="1000" name="100"/>
</main>
</p:inline>
</p:input>
<p:output port="result" sequence="true"/>
<p:declare-step type="cx:message">
<p:input port="source"/>
<p:output port="result"/>
<p:option name="message" required="true"/>
</p:declare-step>
<p:variable name="fileName" select="'createi4EnrichInstance'"/>
<p:variable name="output-folder" select="'../Executions/'"/>
<p:for-each name="forEachFile">
<p:iteration-source select="//instance"/>
<p:variable name="startingFileNumber"
select="instance/@startingFileNumber"/>
<p:variable name="endingFileNumber"
select="instance/@endingFileNumber"/>
<p:variable name="name" select="instance/@name"/>
<cx:message>
<p:with-option name="message"
select="concat('-----------------------', ' startingFileNumber: ',
$startingFileNumber, ' endingFileNumber: ',
$endingFileNumber, ' -----------------------------')"/>
</cx:message>
<cx:message>
<p:with-option name="message"
select="'###### ConvertXprocTemplateToExecutionFile'"/>
</cx:message>
<p:load href="../ProcessTemplates/xproc-Template.xpl"/>
<p:xslt name="ConvertXprocTemplateToExecutionFile">
<p:input port="source"/>
<p:input port="stylesheet">
<p:document href="ConvertXprocTemplateToExecutionFile.xsl"/>
</p:input>
<p:with-param name="startingFileNumber"
select="$startingFileNumber"/>
<p:with-param name="endingFileNumber" select="$endingFileNumber"/>
<p:input port="parameters">
<p:empty/>
</p:input>
</p:xslt>
<p:identity name="out_file"/>
<p:store name="store">
<p:with-option name="href" select="concat($output-folder,
$fileName, $name, '.xpl' )">
<p:pipe step="out_file" port="result"/>
</p:with-option>
</p:store>
<p:documentation> Create result XML </p:documentation>
<p:identity>
<p:input port="source">
<p:pipe step="store" port="result"/>
</p:input>
</p:identity>
<cx:message>
<p:with-option name="message" select="'###### Launch Process'"/>
</cx:message>
</p:for-each>
<p:documentation>Wrap result XML </p:documentation>
<p:wrap-sequence wrapper="forEachFile"/>
<p:identity/>
</p:declare-step>
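The <instance> list above was written by hand. A minimal Python sketch of
generating it instead is shown here (the post mentions a Java app for this;
Python is used only for brevity). It assumes even batches of 10 starting at 1,
so the ranges differ slightly from the hand-written ones (1-10, 11-20, and so
on).

```python
import xml.etree.ElementTree as ET

def build_instances(total_files, batch_size=10):
    """Build a <main> element with one <instance> per batch of files."""
    main = ET.Element("main")
    starts = range(1, total_files + 1, batch_size)
    for index, start in enumerate(starts, start=1):
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size - 1, total_files)
        ET.SubElement(
            main,
            "instance",
            startingFileNumber=str(start),
            endingFileNumber=str(end),
            name=str(index),
        )
    return main

main = build_instances(total_files=1000)
print(len(main))  # number of <instance> elements
print(ET.tostring(main[0], encoding="unicode"))
```

In practice total_files would come from counting the files in the source
directory, which is left out of this sketch.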
5. Three RunProcess.bat files, each of which runs ~33 instances (about 333
files), 10 files at a time.
start /wait run-calabash.bat createi4EnrichInstance1.xpl
start /wait run-calabash.bat createi4EnrichInstance2.xpl
start /wait run-calabash.bat createi4EnrichInstance3.xpl
start /wait run-calabash.bat createi4EnrichInstance4.xpl
...
exit
6. BeginProcess.bat runs the whole process: it generates the instances with
Main.xpl, then starts the three batch files so that they run at the same time.
start /wait run-calabash.bat ..\ExecutionManager\Main.xpl
start RunProcess1.bat
start RunProcess2.bat
start RunProcess3.bat
exit
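A sketch of generating the RunProcess batch files rather than editing them by
hand. How the instances are divided among the three files is not stated here,
so the contiguous split below is an assumption:

```python
def batch_file_contents(instance_names, workers=3,
                        prefix="createi4EnrichInstance"):
    """Split instance names into contiguous chunks, one batch file per chunk.

    Returns a dict mapping batch file name to its text content.
    """
    chunk = -(-len(instance_names) // workers)  # ceiling division
    files = {}
    for worker in range(workers):
        names = instance_names[worker * chunk:(worker + 1) * chunk]
        lines = [
            f"start /wait run-calabash.bat {prefix}{name}.xpl"
            for name in names
        ]
        files[f"RunProcess{worker + 1}.bat"] = "\n".join(lines + ["exit"])
    return files

files = batch_file_contents([str(n) for n in range(1, 101)])
for file_name, text in files.items():
    # Subtract one for the trailing "exit" line.
    print(file_name, len(text.splitlines()) - 1, "instances")
```

Writing each entry of the returned dict to disk gives the files that
BeginProcess.bat starts.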
Regards
Alex
On Fri, Feb 19, 2010 at 3:35 PM, Alex Muir <alex.g.muir@gmail.com> wrote:
> Hi James,
>
> That's definitely interesting and useful.
>
> I'm a bit inspired here and in the process of creating a simple and dirty
> process that has a
>
> - xproc template with placeholders rather than variables
> - Config files with groups of name value pair parameters
> - Grouping for readability/organization
> - XSLT to merge template and config files
> - Means of execution with multiple instances running the compiled xproc
> pipelines concurrently
> - xproc | batch | java app
>
> I'll share my findings.
>
> Regards
> Alex
>
>
> On Fri, Feb 19, 2010 at 2:51 PM, James Sulak <jsulak@gmail.com> wrote:
>
>> Hi Alex,
>>
>> An eval-pipeline step has been mentioned before, but as far as I know
>> no one's implemented it as an extension function.
>>
>> I don't know if this is what you're looking for, but I've done
>> something similar at runtime (not preprocessing). I had a problem
>> where I needed to edit the contents of an XQuery dynamically. I
>> settled on using a step that took parameters and replaced any
>> instances of ${varname} with its string value. For example, I would
>> construct an xquery this way:
>>
>> <p:identity>
>> <p:input port="source">
>> <p:inline>
>> <c:query xmlns="http://exist.sourceforge.net/NS/exist"
>> start="1" max="20" cache="no">
>> <c:text>
>> declare namespace c="http://www.w3.org/ns/xproc-step";
>> let $login := xmldb:login("xmldb:exist:///db",
>> "${user}", "${password}")
>> let $response :=
>> xmldb:create-collection("${parent-collection}", "${collection}")
>> return (element c:result { concat(request:get-url(),
>> $response) })
>> </c:text>
>> </c:query>
>> </p:inline>
>> </p:input>
>> </p:identity>
>>
>> <wxp:resolve-placeholders>
>> <p:input port="parameters">
>> <p:empty />
>> </p:input>
>> <p:with-param name="user" select="$user" />
>> <p:with-param name="password" select="$password" />
>> <p:with-param name="parent-collection" select="$parent-collection" />
>> <p:with-param name="collection" select="$collection" />
>> </wxp:resolve-placeholders>
>>
>>
>> The <wxp:resolve-placeholders/> step uses <p:parameters/> to create an
>> XML out of the parameters, which is then passed to a transform which
>> replaces the variable names with their values.
>>
>> -James
>>
>>
>> On Fri, Feb 19, 2010 at 6:37 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
>> > Hi,
>> >
>> > I was reading posts about configuration file parameters in the xproc list
>> > archives, and my own issues using them led me to recall my solution when
>> > creating a simple xslt pipeline, as probably all on this list have done.
>> >
>> > Regarding handling the configuration file:
>> >
>> > We started with name-value pair configuration declarations at the top of
>> > the pipe, which were referenced below using xpath. That became cumbersome
>> > over time, and at some point the idea came to use a simpler, perhaps
>> > unrefined, solution that worked well.
>> > We had to externalize the name-value pair configuration xml file to have
>> > multiple configuration files, some for end users, some for more technical
>> > people...
>> >
>> > Given the need to have multiple configuration files, we preprocessed to
>> > combine them so that only one config file passed through the pipe, as
>> > passing more than one would have been more work.
>> > PERHAPS THE KEY POINT: Rather than reference the configuration file using
>> > xpath, with the pipeline processor passing the configuration file as a DOM
>> > through the whole process to find config values dynamically as they were
>> > needed, we replaced all the xpath with '##VariableName##', referencing the
>> > same variable name from the config file as the xpath did.
>> > Then, when preprocessing, we compiled the new pipeline xml document by
>> > finding and replacing '##VariableName##' with the correct value for each
>> > configuration file. We no longer combined config files into one, as there
>> > was no need.
>> >
>> > The simplification saved us development time in the future.
>> >
>> > From what I gather this type of script preprocessing is a fairly common
>> > practice.
>> >
>> >
>> > Questions for discussion:
>> >
>> > Are others doing this with their xproc scripts? Why or why not?
>> >
>> > I wonder whether it would be better to use the parameters configuration
>> > file as it is currently designed in xproc, rather than create a small
>> > script to implement the ## configuration version?
>> >
>> > Is it possible to have a small xproc pipe which executes this process and
>> > then executes the regular process, without running the process twice from
>> > the command line? (just thinking out loud here)
>> >
>> > Would that just require that I use the "exec" step, for example if I
>> > wanted to launch 4 java processes of the same pipe compiled with different
>> > configurations?
>> > I think that will work, no?
>> >
>> > Thanks Much
>> >
>> > --
>> > Alex
>> > https://sites.google.com/a/utg.edu.gm/alex
>> >
>> > Some Good Music -- mix of western and African relaxing acoustic styles
>> > http://sites.google.com/site/greigconteh/
>> >
>>
>
>
>
>
Received on Tuesday, 23 February 2010 19:07:00 UTC