Re: p:directory step

Revised slightly:

Vasil Rangelov proposes[1] an atomic step to read a directory listing
and return it as a document. Jeni and I chatted about it a bit and it
seems like a good idea. Here's my (slightly revised) proposal:

<p:declare-step type="p:directory-list">
  <p:output port="result"/>
  <p:option name="path" value="."/>
  <p:option name="recursive" value="no"/>
  <p:option name="filter"/>
</p:declare-step>

The p:directory-list step reads the contents of the directory named by
the "path" option and returns a c:directory element:

  <c:directory path="abs-path-specified">
    <c:directory path="abs-path-specified/dirname"/>
    <c:file path="abs-path-specified/filename"/>
    ...
  </c:directory>

If the "recursive" option is "yes", then you get the whole, recursive
listing:

  <c:directory path="abs-path-specified">
    <c:directory path="abs-path-specified/dirname">
      <c:file path="abs-path-specified/dirname/othername"/>
      ...
    </c:directory>
    <c:file path="abs-path-specified/filename"/>
    ...
  </c:directory>

The significant change here is that the path names are returned as
fully qualified paths. The path originally specified is made absolute
before returning it.
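
To make the shape of the result concrete, here is a rough Python sketch
of how an implementation might build the listing. It is only an
illustration under assumptions: the function name directory_list is
made up, the namespace URI bound to "c:" is a placeholder (the proposal
does not name one), and sorting the entries is just one possible
choice of order.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI for the "c:" prefix.
C_NS = "http://www.w3.org/ns/xproc-step"
ET.register_namespace("c", C_NS)


def directory_list(path=".", recursive=False):
    """Return a c:directory element describing `path` (hypothetical helper)."""
    # The path originally specified is made absolute before it is returned.
    abs_path = os.path.abspath(path)
    root = ET.Element(f"{{{C_NS}}}directory", {"path": abs_path})
    # Sorting is this sketch's choice only; it is not required by the proposal.
    for name in sorted(os.listdir(abs_path)):
        child = os.path.join(abs_path, name)
        if os.path.isdir(child):
            if recursive:
                # Nested listing. Note: this sketch does not guard against
                # symbolic-link cycles.
                root.append(directory_list(child, recursive=True))
            else:
                # Non-recursive: the subdirectory appears as an empty element.
                ET.SubElement(root, f"{{{C_NS}}}directory", {"path": child})
        else:
            ET.SubElement(root, f"{{{C_NS}}}file", {"path": child})
    return root
```

Calling ET.tostring(directory_list(".", recursive=True)) serializes the
result for inspection.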

The "filter" option specifies a command-line style pattern. So

  <p:directory-list path="." recursive="yes" filter="*.xml"/>

returns only the files that match "*.xml" in the current directory
or any directory under the current directory.

There are a few different ways that we could go on the whole
recursive/filter business. I suggest that filters only apply to the
names of files, not directories.
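
A small Python sketch of that suggestion, for illustration only: the
walk never tests directory names against the filter, so "*.xml" still
finds files inside a directory called "docs". The function name
matching_files is made up, and Python's fnmatch module stands in for
the filter syntax (it is close to, but not identical with, the syntax
described in this message).

```python
import fnmatch
import os


def matching_files(path, pattern):
    """Absolute paths of files under `path` whose names match `pattern`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(path)):
        # dirnames is deliberately left untouched: directories are always
        # descended into, whether or not their own names match the filter.
        for name in filenames:
            if fnmatch.fnmatchcase(name, pattern):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

One consequence of this choice is that a directory which happens to be
named "old.xml" is neither selected by the filter nor skipped by it.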

The order of c:file and c:directory elements within a directory is
implementation defined. The current working directory is
implementation defined.

I don't know exactly what to point to for the syntax for filters. We
could use regexp, but that seems like overkill (and filenames often
contain periods so it's tedious for users). I cribbed the following
text from the csh manpage (and massaged it to fit this context):

     The filter is regarded as a pattern and treats the characters
     '*', '?', and '[' specially. If a filter is specified, only files
     which have names that match the filter pattern are returned.
     For the purpose of determining whether a filename matches or not,
     only the filename part (and not any of the path components of
     its absolute name) is considered.

     In matching filenames, the character '*' matches any string of
     characters, including the null string. The character '?' matches
     any single character. The sequence [...] matches any one of the
     characters enclosed. Within [...], a pair of characters separated
     by '-' matches any character lexically between the two in Unicode
     codepoint order (inclusive). All other characters match exactly
     the same character.
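
For anyone who wants to try these rules out, Python's standard fnmatch
module implements nearly the same syntax, so a quick sketch is
possible. Two caveats: fnmatch also accepts [!...] for negation, which
the text above does not define, and the helper name name_matches is
made up for this example.

```python
import fnmatch
import os


def name_matches(path, pattern):
    """Match only the filename part of `path` against `pattern`."""
    # Path components are ignored; only the final name is considered.
    filename = os.path.basename(path)
    # fnmatchcase compares case-sensitively on every platform.
    return fnmatch.fnmatchcase(filename, pattern)


# '*' matches any string, including the empty string.
print(name_matches("/tmp/report.xml", "*.xml"))
print(name_matches("/tmp/.xml", "*.xml"))
# '?' matches exactly one character.
print(name_matches("/tmp/ab.xml", "?.xml"))
# '[0-9]' matches one character in the given range.
print(name_matches("/tmp/ch3.xml", "ch[0-9].xml"))
# A directory named "xml" does not make its files match.
print(name_matches("/tmp/xml/notes.txt", "*xml*"))
```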

Implementations can throw a dynamic error if the requested path is not
available to the user running the pipeline. The set of paths that are
available is implementation-defined. In environments where security is
paramount, there may be no accessible paths.
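
As one possible reading of this, an implementation might keep a list of
permitted root directories and refuse anything outside them. The Python
sketch below is an assumption-laden illustration: the allow-list
policy, the function name check_available, and the exception name
PathNotAvailable are all invented here and are not part of the
proposal.

```python
import os


class PathNotAvailable(Exception):
    """Stand-in for the dynamic error (hypothetical name)."""


def check_available(path, allowed_roots):
    """Return the resolved path if it lies under an allowed root, else raise."""
    resolved = os.path.realpath(path)
    for root in allowed_roots:
        root = os.path.realpath(root)
        try:
            if os.path.commonpath([resolved, root]) == root:
                return resolved
        except ValueError:
            # Paths on different drives (Windows) share no common path.
            continue
    raise PathNotAvailable(resolved)
```

With an empty allowed_roots list every request fails, which corresponds
to the case where no paths are accessible.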

I propose that this be a required step.

                                        Be seeing you,
                                          norm

[1] http://lists.w3.org/Archives/Public/public-xml-processing-model-comments/2007Jul/0002.html

-- 
Norman Walsh <ndw@nwalsh.com> | Everything should be made as simple as
http://nwalsh.com/            | possible, but no simpler.

Received on Wednesday, 1 August 2007 15:52:07 UTC