Re: [expath] EXPath File Module, Update


Thanks for the extensive list of comments. It may take me a while to
give feedback on each individual issue (as there was no further feedback
apart from mhkay's thorough suggestions, we decided to finalize our
implementation and freeze the spec).

Talking about 1/, I'd just like to point out that there are several
reasons why we decided not to give these functions return values.
The most important one is that these values would have to be
explicitly suppressed if they are not to be passed on to other
functions. Another one is that, depending on the compilation strategy
of the processor, it could become impossible to mix file and XQuery
Update expressions.

If we decide to specify file operations with return values, it would
(in my opinion) make more sense to introduce real file handles, and
make them mandatory for most file operations. The resulting calls
would probably look similar to the following expression:

  let $id := file:open('.....')
  return ( file:append($id, ...), file:append($id, ...), ... )

Due to the functional nature of XQuery, however, a processor could
still decide to execute the second file operation before the first.
The "sequential" function property from the XQuery Scripting Extension
may be a better choice than the "non-deterministic" keyword, but, as
far as I know, Zorba is currently the only processor that fully
supports XQSE.
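
To make the ordering issue concrete, here is a minimal sketch in Python
rather than XQuery. The helper name append and the file name are
hypothetical and not part of the spec. It models the chaining variant
from point 1/, where each call returns the path it wrote to: the outer
call consumes the result of the inner one, so it cannot run first. In
the handle-based expression above, both calls depend only on $id, so
nothing ties one to the other.

```python
import os
import tempfile


def append(path: str, text: str) -> str:
    """Append text to the file at path and return the path (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)
    return path


def chained_demo() -> str:
    """Write two fragments via nested calls and return the file content."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.xml")
        # The inner call must finish first: its return value is the
        # outer call's argument, which creates a data dependency.
        append(append(out, "<elem1/>"), "<elem2/>")
        with open(out, encoding="utf-8") as handle:
            return handle.read()
```

Python evaluates eagerly, so this only shows the dependency itself; an
XQuery processor would still be free to reorder calls that do not feed
into each other.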

Feedback is always welcome,

On Tue, Jul 24, 2012 at 11:28 PM, Florent Georges wrote:
> On 14 June 2012 02:09, Christian Grün wrote:
>   Hi Christian,
>> a new version of the File Module is online:
>   And as you said before, it is now officially published at:
>   I am sorry I could not respond before...  Thank you for this
> new draft!  Here are a few comments, feel free to respond here,
> or to create new issues for the points you want to keep track of
> at (with Module-File and
> Kind-Specification tags).  Unfortunately there is no issue-
> tracking system for the CGs (yet?).
>   1/ I think we already discussed that, but I could not find in
> the archive any consensus.  Given the side-effect nature of some
> functions, wouldn't it be worth making them return a value as
> well as taking it as a param, in order to be able to chain them?
> Something like a file handler or file descriptor.  That would
> have also the benefit of not having to open the file and resolve
> its path each time we want to append to a file.
>   At least, if they returned the path of the file it would then
> be possible to chain them like that:
>     append(append('out.xml', elem1), elem2)
> instead of:
>     append('out.xml', elem1), append('out.xml', elem2)
> which is very unpredictable.
>   2/ §1.2: "An implementation must accept absolute and relative
> UNIX/Linux and Windows paths".  Is it really the intent that
> every implementation MUST support Linux AND Windows paths?
>   3/ §1.2: Where is the "current working directory" defined?  Is
> it the static base URI? (or the parent dir of the static base URI
> if it designates a file)
>   4/ §1.2: "all paths must first be normalized to an
> implementation-defined representation" Why?  What is that
> representation?  Does that mean one has to always call
> file:path-to-native() before calling another function?  Then we
> could probably rather introduce file:open() returning a black-box
> item representing a file handler (see point 1/ above).
>   5/ §1.3: "Query Execution" I'd rather say "Expression
> Evaluation" in order to stick to XPath vocabulary rather than
> XQuery (then change also "query" in the last sentence to
> "expression").
>   6/ §1.4, errors: should we also create specific errors for
> permissions (right to read, write, etc.)?  Those are common
> cases, aren't they?  I find it very frustrating when a Java delete()
> fails, because it just says "success/failure", not giving any
> details in case of failure...
>   7/ instead of having [err:FILE9999] listed every time, why not
> mention it here and then remove it from every single function
> definition?
>   8/ §2.2: what are "volume roots" on UNIX?
>   9/ §2.2: what if $path does not exist?
>   10/ §3.2: what if the string contains a newline char?  Should
> it be translated to the platform-dependent newline char?
>   11/ §3.3: why not file:append-lines() instead of
> file:append-text-lines()?  just for the sake of brevity and
> clarity.  I think that carries the same semantics in a nicer way.
> Same for file:read-lines() and file:write-lines().
>   12/ §3.4: why taking only xs:base64Binary into account?  Why
> not xs:hexBinary? (that applies to other functions as well)
> Actually, that makes me think we should introduce
> xs:binary (or expath:binary), as a union type of both.  Or can we
> take the liberty to have both signatures (even if it is not
> possible in an XSLT or XQuery function declaration, this is a
> useful specification tool):
>     file:append-binary(..., $value as xs:base64Binary) as ...
>     file:append-binary(..., $value as xs:hexBinary) as ...
>   13/ §3.5 and §3.9, second "b.": "if $target is a directory, all
> files are copied from the source into the target directory."  I
> would expect a new subdirectory to be created with the same local
> name as $source.  This is more consistent with the case of a file
> (see "c." also in §3.5).  I think that's also what `cp(1)'
> does in the UNIX shell, doesn't it?
>   14/ §3.5 and §3.9, last sentence before the errors: "no
> rollback to the original state will be possible" I would rather
> say "the state of the file store is undefined", because the
> original state might have persisted, or even an implementation
> can provide automatic rollback in case of error...
>   15/ §3.5 and §3.9: why the special case of err:FILE0003, this
> is err:FILE0001, isn't it?
>   16/ §3.6: why not file:mkdir() and file:mkdirs(), in order to
> be able to control the desired behaviour in case the parent dir
> does not exist (that is, either raise an error or create all
> parents)?  This is also the same wording as in a lot of programming
> languages.
>   17/ §3.7: the default value of $recursive is false(), I guess?
> It also says "sub-directories will be deleted as well."  I guess
> that means subdirs AND FILES?  I would also begin the sentence by
> "If $path points to a directory".
>   18/ §3.7, err:FILE0004: typo s/$file/$path/.  I would also add
> "and $recursive is false()".
>   19/ §3.8: "The '.' and '..' items are never returned."  I would
> rather say: "The target dir and its parent are never returned in
> the result (e.g. '.' and '..' on UNIX-like systems)."  I am not
> sure '.' and '..' exist on all systems.
>   20/ §3.8: What is the format of the returned paths?  URIs, paths,
> platform-dependent, implem-dependent?  When $recursive is true(),
> are files in subdirs returned as "a/b/c.txt"?
>   21/ §3.8: What if $recursive is true and $pattern is provided?
> Does it match against subdirs?  E.g. "a/b/*.txt".  Or only
> against the "local name"?
>   21/ §3.8: "An implementation must support at least the
> following glob syntax"  At least?  Isn't there an
> interoperability issue here?
>   22/ §3.8: "* for matching any number of unknown characters"
> Not including the path separator, right?
>   23/ §3.11: "in its string representation" sounds a bit weird,
> I would have said "as a string", but that's maybe just me.
>   24/ §3.11 and §3.12: The default value of $encoding is UTF-8.
> I would rather say it is implementation-defined, e.g. if the
> system has more info about it (I am sure in some cases an
> implementation might know or infer the encoding).  What about the
> following?:
>     "The default value of $encoding is UTF-8, unless the
>     implementation can determine the encoding by any other
>     means."
>   25/ §3.11: What about newline chars?  Are they transformed from
> the platform-dependent newline to #x0A?
>   26/ §3.12: I guess the newline char itself is stripped from the
> end of every line?  So having 2 consecutive newlines in the file
> would result in an empty string in the result.  What if the file
> ends with a newline?  fn:unparsed-text-lines() does not return it
> (in F&O 3.0).
>   27/ §4.1: so "/" returns the empty string?  I would return "/",
> as basename does on UNIX.  For the empty string, I would rather
> throw an error than returning ".".  I expect "" to be passed
> rather because of a logic error (like a missing element used as
> the path).  More examples would be worth here...
>   28/ §4.3: how to do it the other way around?  Do we really want
> to resolve symbolic links here?  Why?
>   29/ §4.3: what if we pass a URI with the escaped char '*'?
>   30/ §4.4: sounds strange to have both path-to-native and
> path-to-uri where the former can take a URI.  I still fail to see
> the exact difference between "URI", "path" and "native".
>   31/ §4.4: how to do it the other way around?
>   32/ §4.5: how is it different from path-to-native?  What's the
> "current working directory"?
>   33/ §5.3: "&#13; on Mac systems."  Is it still the case?
>   34/ §B: Is it really the same error (namely err:FILE0003) when
> $path is not a dir, and when $path 's parent is not a dir?  See
> e.g. §3.14.
>   Regards,
> --
> Florent Georges

Received on Saturday, 28 July 2012 16:16:32 UTC