p:list step

Hello.

More often than not, we work with a large set of XML files and aggregate
them into a single document containing all or some of the data from all or
some of those files. Well... at least I do :D

XProc provides a p:for-each step, which is great for invoking a pipeline on a
set of documents, and other steps provide great facilities for bulk
manipulation as well. One thing that is missing, though, is a way to find
those documents in the first place.

The pipeline author could create some sort of XML sitemap and pass it to
XProc for further manipulation, but whenever a file is removed, added, or
moved, this sitemap must also be updated.

To solve this type of issue, I suggest a new (atomic?) step, possibly
called p:list (or p:index?) declared like so:
<p:declare-step type="p:list">
	<p:output port="result"/>
	<p:option name="base" required="yes"/>
	<p:option name="deep" value="yes"/>
</p:declare-step>

When invoked, this step will list all files in the folder specified by the
"base" option and in all of its subfolders. If "base" points to a file, the
folder containing that file is used instead. Setting the "deep" option to
"no" will limit the listing to the direct contents of the "base" folder.

The result of this step would be a c:folder (?) element containing zero or
more c:file elements or other c:folder elements. The c:folder and c:file
elements would have a "name" attribute which must contain the name of the
file/folder. The base folder (root element of the result document) should
not have such an attribute (or if it does, its value should be "/").
Implementations may find it useful to add additional information about
files/folders in other attributes. For example, "read-only" with a value of
yes or no, saying whether a file is read-only. Whether such information is
generated could be tweaked with additional options. I'll leave it to the WG
to decide those kinds of details.
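
To make the intended behaviour concrete, here is a rough Python sketch of
what an implementation might do. It is only an illustration under some
assumptions of mine: the namespace URI for the c: prefix is a placeholder,
the function names are made up, the ordering (folders first, then files,
case-insensitive by name) simply mirrors the examples, and symbolic links to
folders are listed but not followed.

```python
import os
import xml.etree.ElementTree as ET

# Placeholder namespace for the c: prefix; the proposal does not name one.
C_NS = "http://example.org/ns/p-list"


def list_folder(base, deep=True):
    """Return a c:folder element describing the contents of `base`."""
    # If a file is given, the folder containing that file is used instead.
    if os.path.isfile(base):
        base = os.path.dirname(os.path.abspath(base))
    # The root element carries no "name" attribute.
    root = ET.Element(f"{{{C_NS}}}folder")
    _fill(root, base, deep)
    return root


def _fill(parent, path, deep):
    """Append c:folder / c:file children for the entries found in `path`."""
    # Folders first, then files, each sorted case-insensitively by name.
    # This ordering is an assumption; the proposal does not define one.
    names = sorted(
        os.listdir(path),
        key=lambda n: (not os.path.isdir(os.path.join(path, n)), n.lower()),
    )
    for name in names:
        full = os.path.join(path, name)
        if os.path.isdir(full):
            child = ET.SubElement(parent, f"{{{C_NS}}}folder", name=name)
            # With deep="no", subfolders are listed but left empty.
            # Symbolic links are not followed, to avoid cycles.
            if deep and not os.path.islink(full):
                _fill(child, full, deep)
        else:
            ET.SubElement(parent, f"{{{C_NS}}}file", name=name)


if __name__ == "__main__":
    ET.register_namespace("c", C_NS)
    print(ET.tostring(list_folder(".", deep=False), encoding="unicode"))
```

Run as a script, it prints a shallow listing of the current folder as a
single c:folder element.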

Example: 
<p:list base="foo" />

May produce
<c:folder>
	<c:folder name="bar">
		<c:folder name="foobar"/>
		<c:file name="bar.xml"/>
	</c:folder>
	<c:file name="foo.xml"/>
</c:folder>

And
<p:list base="file:///C|/" deep="no"/>

May produce
<c:folder>
	<c:folder name="Documents and Settings"/>
	<c:folder name="Program Files"/>
	<c:folder name="Temp"/>
	<c:folder name="Windows"/>
	<c:file name="AUTOEXEC.BAT"/>
	<c:file name="boot.ini"/>
	<c:file name="CONFIG.SYS"/>
	<c:file name="IO.SYS"/>
	<c:file name="MSDOS.SYS"/>
	<c:file name="ntldr"/>
	<c:file name="pagefile.sys"/>
</c:folder>

The following pipeline demonstrates a sample use case. It indexes all files
on the server and passes the result as input to XSLT, which could then use
this information to perform all sorts of further manipulation:
<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/2007/03/xproc">
	<p:input port="stylesheet" primary="yes"/>
	<p:output port="result" primary="yes"/>

	<p:xinclude/>

	<p:list base="/" name="contents"/>

	<p:xslt>
		<p:input port="document">
			<p:pipe step="contents" port="result"/>
		</p:input>
		<p:input port="stylesheet">
			<p:pipe step="pipeline" port="stylesheet"/>
		</p:input>
	</p:xslt>
</p:pipeline>
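
As a rough idea of the kind of processing a consumer of the listing could
do, the following Python sketch walks a c:folder document and turns it into
a flat list of relative file paths. It stands in for the stylesheet only as
an illustration; the namespace URI and the function name are placeholders I
made up, and the sample document is the first example result from above.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the c: prefix; the proposal does not name one.
C_NS = "http://example.org/ns/p-list"

# Sample listing, as p:list might produce it for base="foo".
LISTING = f"""
<c:folder xmlns:c="{C_NS}">
  <c:folder name="bar">
    <c:folder name="foobar"/>
    <c:file name="bar.xml"/>
  </c:folder>
  <c:file name="foo.xml"/>
</c:folder>
"""


def file_paths(folder, prefix=""):
    """Yield the path of every c:file below `folder`, relative to the base."""
    for child in folder:
        path = prefix + child.get("name")
        if child.tag == f"{{{C_NS}}}folder":
            # Descend into the subfolder, extending the path prefix.
            yield from file_paths(child, path + "/")
        elif child.tag == f"{{{C_NS}}}file":
            yield path


paths = list(file_paths(ET.fromstring(LISTING)))
print(paths)
```

Empty folders such as "foobar" contribute nothing to the list, since only
c:file elements are reported.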

Regards,
Vasil Rangelov

Received on Sunday, 22 July 2007 14:01:06 UTC