- From: Alex Muir <alex.g.muir@gmail.com>
- Date: Tue, 23 Feb 2010 19:06:31 +0000
- To: James Sulak <jsulak@gmail.com>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <88b533b91002231106k717c7442na9511f684598e5f4@mail.gmail.com>
Hi,

Well, here are the pieces created so far to use placeholders rather than parameters. I'll want to create a more dynamic process than what is here, but I will be working on other priorities because it works well enough for now.

1. xproc-Template.xpl

Xproc template with placeholders between three tildes, e.g. ~~~endingFileNumber~~~, which might look like this in the attributes:

<p:variable name="XXX" select="~~~YYY~~~"/>

A problem with the approach is that the template becomes littered with error messages, such as at the href=" attributes, which are not expecting ~~~ placeholders, so I'll have to work on an instance and then update the template.

2. Configuration file

It includes a placeholderRegex attribute, which configures how the process identifies placeholders (allowing for different placeholder styles in the xproc template attributes), and uses groupings to organize ...

<configuration placeholderRegex="~~~([^~]*?)~~~">
  <group name="InputOutputFolders" doc="'source and output folder variables'" type="TemplateParameters">
    <param name="XSLT-source-folder" value="'../../Source/2009/'"/>
    <param name="source-folder" value="'../../../Source/2009/'"/>
    <param name="output-folder" value="'../../../Output/2009/'"/>
    <param name="completed-folder" value="'Completed/'"/>
    <param name="error-folder" value="'Error/'"/>
    <param name="exception-folder" value="'Exception/'"/>
    <param name="XSLTDirectory" value="../../XSLT/"/>
  </group>
  ...
</configuration>

3. ConvertXprocTemplateToExecutionFile.xsl

It takes the template and configuration files as input and replaces any placeholders within the document to create a running instance.

BTW our template reads in multiple files and has a starting and ending file number to dictate which files to process from the directory, as follows:

<p:for-each name="forEachFile">
  <p:iteration-source select="//c:file[position() ge number($startingFileNumber) and position() le number($endingFileNumber)]"/>

The following XSL creates a working xproc instance.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:param name="ConfigFile" select="'../ProcessConfigurations/ConfigMaster.xml'"/>
  <xsl:output indent="yes" method="xml"/>
  <xsl:param name="startingFileNumber" select="1"/>
  <xsl:param name="endingFileNumber" select="2"/>

  <!-- copy every element, then process its attributes and children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- replace any attribute value text that matches placeholderRegex -->
  <xsl:template match="@*">
    <xsl:variable name="value" select="."/>
    <xsl:variable name="name" select="name(.)"/>
    <xsl:choose>
      <!-- the two file-number placeholders come from stylesheet parameters -->
      <xsl:when test="$value = '~~~startingFileNumber~~~'">
        <xsl:attribute name="{$name}" select="$startingFileNumber"/>
      </xsl:when>
      <xsl:when test="$value = '~~~endingFileNumber~~~'">
        <xsl:attribute name="{$name}" select="$endingFileNumber"/>
      </xsl:when>
      <!-- every other placeholder is looked up by name in the config file -->
      <xsl:otherwise>
        <xsl:variable name="PlaceHolderReplacement">
          <xsl:analyze-string select="." regex="{doc($ConfigFile)/configuration/@placeholderRegex}">
            <xsl:matching-substring>
              <xsl:variable name="placeHolderName">
                <xsl:value-of select="regex-group(1)"/>
              </xsl:variable>
              <xsl:value-of select="doc($ConfigFile)//param[@name = $placeHolderName]/@value"/>
            </xsl:matching-substring>
            <!-- text outside a placeholder is kept unchanged -->
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:variable>
        <xsl:attribute name="{$name}" select="$PlaceHolderReplacement"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

4. Main.xpl

I ended up doing a bit of a strange workaround for the memory consumption/leak issue regarding multiple file input and output (http://code.google.com/p/xmlcalabash/issues/detail?id=94) by creating an instance to run 10 files at a time with a 256MB JVM.
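
A minimal sketch of how the <instance> list for Main.xpl could be generated rather than typed by hand, assuming a fixed batch size of 10 files per instance. This is only an illustration in Python, not part of the pipeline: the function names instance_ranges and build_main_input are made up here, and the ranges it produces are uniform (1-10, 11-20, ...), which differs slightly from a hand-written list.

```python
from xml.sax.saxutils import quoteattr


def instance_ranges(total_files, batch_size=10):
    """Yield (start, end, name) tuples covering files 1..total_files.

    Hypothetical helper: each tuple describes one xproc instance that
    processes at most batch_size files.
    """
    name = 0
    for start in range(1, total_files + 1, batch_size):
        name += 1
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size - 1, total_files)
        yield start, end, name


def build_main_input(total_files, batch_size=10):
    """Return a <main> document as a string, one <instance> per batch."""
    lines = ["<main>"]
    for start, end, name in instance_ranges(total_files, batch_size):
        lines.append(
            "  <instance startingFileNumber=%s endingFileNumber=%s name=%s/>"
            % (quoteattr(str(start)), quoteattr(str(end)), quoteattr(str(name)))
        )
    lines.append("</main>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: 25 files give three instances, the last one covering 21-25.
    print(build_main_input(25))
```

The output could be pasted into the p:inline of the source port, or written to a file and passed to the pipeline as its source document.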

I'll at some point create a Java app to create the list of <instance> configurations and batch files dynamically, based on directory contents, if the memory issue continues to be a problem. For now we just want to run a thousand files with 3 xproc instances, 10 files processed per instance at the same time, simply using 3 batch files.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                name="Main">
  <p:documentation>Xproc Script runs the execution Process</p:documentation>
  <p:input port="source">
    <p:inline>
      <main>
        <instance startingFileNumber="1" endingFileNumber="9" name="1"/>
        <instance startingFileNumber="10" endingFileNumber="19" name="2"/>
        <instance startingFileNumber="20" endingFileNumber="29" name="3"/>
        <instance startingFileNumber="30" endingFileNumber="39" name="4"/>
        <instance startingFileNumber="40" endingFileNumber="49" name="5"/>
        <instance startingFileNumber="50" endingFileNumber="59" name="6"/>
        <instance startingFileNumber="60" endingFileNumber="69" name="7"/>
        <instance startingFileNumber="70" endingFileNumber="79" name="8"/>
        <instance startingFileNumber="80" endingFileNumber="89" name="9"/>
        <instance startingFileNumber="90" endingFileNumber="100" name="10"/>
        <instance startingFileNumber="101" endingFileNumber="109" name="11"/>
        ...
        <instance startingFileNumber="870" endingFileNumber="879" name="88"/>
        <instance startingFileNumber="880" endingFileNumber="889" name="89"/>
        <instance startingFileNumber="890" endingFileNumber="900" name="90"/>
        <instance startingFileNumber="901" endingFileNumber="909" name="91"/>
        <instance startingFileNumber="910" endingFileNumber="919" name="92"/>
        <instance startingFileNumber="920" endingFileNumber="929" name="93"/>
        <instance startingFileNumber="930" endingFileNumber="939" name="94"/>
        <instance startingFileNumber="940" endingFileNumber="949" name="95"/>
        <instance startingFileNumber="950" endingFileNumber="959" name="96"/>
        <instance startingFileNumber="960" endingFileNumber="969" name="97"/>
        <instance startingFileNumber="970" endingFileNumber="979" name="98"/>
        <instance startingFileNumber="980" endingFileNumber="989" name="99"/>
        <instance startingFileNumber="990" endingFileNumber="1000" name="100"/>
      </main>
    </p:inline>
  </p:input>
  <p:output port="result" sequence="true"/>

  <p:declare-step type="cx:message">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="message" required="true"/>
  </p:declare-step>

  <p:variable name="fileName" select="'createi4EnrichInstance'"/>
  <p:variable name="output-folder" select="'../Executions/'"/>

  <p:for-each name="forEachFile">
    <p:iteration-source select="//instance"/>
    <p:variable name="startingFileNumber" select="instance/@startingFileNumber"/>
    <p:variable name="endingFileNumber" select="instance/@endingFileNumber"/>
    <p:variable name="name" select="instance/@name"/>

    <cx:message>
      <p:with-option name="message" select="concat('-----------------------', 'startingFileNumber', $startingFileNumber, ' endingFileNumber: ', $endingFileNumber,'-----------------------------')" />
    </cx:message>
    <cx:message>
      <p:with-option name="message" select="'###### ConvertXprocTemplateToExecutionFile'"/>
    </cx:message>

    <p:load href="../ProcessTemplates/xproc-Template.xpl"/>

    <p:xslt name="ConvertXprocTemplateToExecutionFile">
      <p:input port="source"/>
      <p:input port="stylesheet">
        <p:document href="ConvertXprocTemplateToExecutionFile.xsl"/>
      </p:input>
      <p:with-param name="startingFileNumber" select="$startingFileNumber"/>
      <p:with-param name="endingFileNumber" select="$endingFileNumber"/>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>

    <p:identity name="out_file"/>

    <p:store name="store">
      <p:with-option name="href" select="concat($output-folder, $fileName, $name, '.xpl' )">
        <p:pipe step="out_file" port="result"/>
      </p:with-option>
    </p:store>

    <p:documentation> Create result XML </p:documentation>
    <p:identity>
      <p:input port="source">
        <p:pipe step="store" port="result"/>
      </p:input>
    </p:identity>

    <cx:message>
      <p:with-option name="message" select="'###### Launch Process'"/>
    </cx:message>
  </p:for-each>

  <p:documentation>Wrap result XML </p:documentation>
  <p:wrap-sequence wrapper="forEachFile"/>
  <p:identity/>
</p:declare-step>

5. Three RunProcess.bat files, each of which runs ~33 instances (about 333 files), 10 files at a time.

start /wait run-calabash.bat createi4EnrichInstance1.xpl
start /wait run-calabash.bat createi4EnrichInstance2.xpl
start /wait run-calabash.bat createi4EnrichInstance3.xpl
start /wait run-calabash.bat createi4EnrichInstance4.xpl
...
exit

6. BeginProcess.bat runs the whole process, in this case three batch files run at the same time.

start /wait run-calabash.bat ..\ExecutionManager\Main.xpl
start RunProcess1.bat
start RunProcess2.bat
start RunProcess3.bat
exit

Regards
Alex

On Fri, Feb 19, 2010 at 3:35 PM, Alex Muir <alex.g.muir@gmail.com> wrote:
> Hi James,
>
> That's definitely interesting and useful.
>
> I'm a bit inspired here and in the process of creating a simple and dirty
> process that has a
>
> - xproc template with placeholders rather than variables
> - Config files with groups of name value pair parameters
> - Grouping for readability/organization
> - XSLT to merge template and config files
> - Means of execution with multiple instances running the compiled xproc
>   pipelines concurrently
> - xproc | batch | java app
>
> I'll share my findings.
>
> Regards
> Alex
>
>
> On Fri, Feb 19, 2010 at 2:51 PM, James Sulak <jsulak@gmail.com> wrote:
>
>> Hi Alex,
>>
>> An eval-pipeline step has been mentioned before, but as far as I know
>> no one's implemented it as an extension function.
>>
>> I don't know if this is what you're looking for, but I've done
>> something similar at runtime (not preprocessing). I had a problem
>> where I needed to edit the contents of an XQuery dynamically. I
>> settled on using a step that took parameters and replaced any
>> instances of ${varname} with its string value.
>> For example, I would construct an xquery this way:
>>
>> <p:identity>
>>   <p:input port="source">
>>     <p:inline>
>>       <c:query xmlns="http://exist.sourceforge.net/NS/exist"
>>                start="1" max="20" cache="no">
>>         <c:text>
>>           declare namespace c="http://www.w3.org/ns/xproc-step";
>>           let $login := xmldb:login("xmldb:exist:///db",
>>               "${user}", "${password}")
>>           let $response :=
>>               xmldb:create-collection("${parent-collection}", "${collection}")
>>           return (element c:result { concat(request:get-url(),
>>               $response) })
>>         </c:text>
>>       </c:query>
>>     </p:inline>
>>   </p:input>
>> </p:identity>
>>
>> <wxp:resolve-placeholders>
>>   <p:input port="parameters">
>>     <p:empty />
>>   </p:input>
>>   <p:with-param name="user" select="$user" />
>>   <p:with-param name="password" select="$password" />
>>   <p:with-param name="parent-collection" select="$parent-collection" />
>>   <p:with-param name="collection" select="$collection" />
>> </wxp:resolve-placeholders>
>>
>>
>> The <wxp:resolve-placeholders/> step uses <p:parameters/> to create an
>> XML out of the parameters, which is then passed to a transform which
>> replaces the variable names with their values.
>>
>> -James
>>
>>
>> On Fri, Feb 19, 2010 at 6:37 AM, Alex Muir <alex.g.muir@gmail.com> wrote:
>> > Hi,
>> >
>> > I was reading posts about configuration file parameters in the xproc list
>> > archives and having my own issues using them that it led me to recall my
>> > solution when creating a simple xslt pipe line as probably all on this list
>> > have done.
>> >
>> > Regarding handling the configuration file:
>> >
>> > We started with name value pair configuration declarations in the top of the
>> > pipe which were referenced below using xpath which became cumbersome to use
>> > over time and at some point the idea came to use a simpler perhaps unrefined
>> > solution that worked well.
>> >
>> > We had to externalize the name value pair configuration xml file to have
>> > multiple configuration files, some for end users, some for more technical
>> > people...
>> >
>> > Given the need to have multiple configuration files we preprocessed to
>> > combine the configuration files to pass only one config file through the
>> > pipe as passing more than one would have been more work.
>> > PERHAPS THE KEY POINT: Rather than reference the configuration file using
>> > xpath and having the pipeline processor pass the configuration file as a
>> > DOM through the whole process to find config values dynamically as they were
>> > needed using xpath, we replaced all the xpath with '##VariableName##'
>> > referencing the same variable name from the config file as the xpath was.
>> > Then preprocessing we compiled the new pipeline xml document finding and
>> > replacing '##VariableName##' with the correct value for each configuration
>> > file as we no longer combined config files into one as there was no need.
>> >
>> > The simplification saved us development time in the future.
>> >
>> > From what I gather this type of script preprocessing is a fairly common
>> > practice.
>> >
>> >
>> > Questions for discussion:
>> >
>> > Are others doing this with their xproc scripts? Why or why not?
>> >
>> > I wonder would it be better that I use the parameters configuration file as
>> > it is currently designed in xproc rather than I create a small script to
>> > implement the ## Configuration version?
>> >
>> > Is it possible to have a small xproc pipe which executes this process and
>> > then executes the regular process without running the process twice from the
>> > command line? ( just thinking out loud here)
>> >
>> > Would that just require I use the "exec" step for example if I wanted to
>> > launch 4 java processes of the same pipe compiled with different
>> > configurations?
>> > I think that will work, no?
>> >
>> > Thanks Much
>> >
>> > --
>> > Alex
>> > https://sites.google.com/a/utg.edu.gm/alex
>> >
>> > Some Good Music -- mix of western and African relaxing acoustic styles
>> > http://sites.google.com/site/greigconteh/
>> >
>> >
>
>
>
> --
> Alex
> https://sites.google.com/a/utg.edu.gm/alex
>
> Some Good Music
> http://sites.google.com/site/greigconteh/
>

--
Alex
https://sites.google.com/a/utg.edu.gm/alex

Some Good Music
http://sites.google.com/site/greigconteh/
Received on Tuesday, 23 February 2010 19:07:00 UTC