Re: [expath] EXPath File Module, Update

On 14 June 2012 02:09, Christian Grün wrote:

  Hi Christian,

> a new version of the File Module is online:
>    http://files.basex.org/modules/expath/file/file-120614.html

  And as you said before, it is now officially published at:

    http://expath.org/spec/file

  I am sorry I could not respond before...  Thank you for this
new draft!  Here are a few comments, feel free to respond here,
or to create new issues for the points you want to keep track of
at http://code.google.com/p/expath/issues (with Module-File and
Kind-Specification tags).  Unfortunately there is no issue-
tracking system for the CGs (yet?).

  1/ I think we already discussed that, but I could not find any
consensus in the archive.  Given the side-effect nature of some
functions, wouldn't it be worth making them return a value as
well as taking it as a param, in order to be able to chain them?
Something like a file handler or file descriptor.  That would
also have the benefit of not having to open the file and resolve
its path each time we want to append to a file.

  At least, if they returned the path of the file it would then
be possible to chain them like that:

    append(append('out.xml', elem1), elem2)

instead of:

    append('out.xml', elem1), append('out.xml', elem2)

which is very unpredictable.
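
  To make the idea concrete, here is a rough sketch in Python
(not XPath, and not a proposal for the actual signature): a
hypothetical append() that returns the path it was given, so
that nested calls are forced to run inner-first.

```python
import os
import tempfile

def append(path, text):
    """Hypothetical helper: append text to the file and return its path."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)
    return path

# The inner call must complete before the outer one can start,
# so the order of the two writes is fixed by the nesting.
out = os.path.join(tempfile.mkdtemp(), "out.xml")
append(append(out, "<elem1/>"), "<elem2/>")
```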

  2/ §1.2: "An implementation must accept absolute and relative
UNIX/Linux and Windows paths".  Is it really the intent that
every implementation MUST support Linux AND Windows paths?

  3/ §1.2: Where is the "current working directory" defined?  Is
it the static base URI? (or the parent dir of the static base URI
if it designates a file)

  4/ §1.2: "all paths must first be normalized to an
implementation-defined representation" Why?  What is that
representation?  Does that mean one has to always call
file:path-to-native() before calling another function?  Then we
could probably rather introduce file:open() returning a black-box
item representing a file handler (see point 1/ above).

  5/ §1.3: "Query Execution" I'd rather say "Expression
Evaluation" in order to stick to XPath vocabulary rather than
XQuery (then change also "query" in the last sentence to
"expression").

  6/ §1.4, errors: should we also create specific errors for
permissions (right to read, write, etc.)?  Those are common
cases, aren't they?  I find it very frustrating when a Java
delete() fails, because it just says "success/failure", not
giving any details in case of failure...
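
  As an illustration of what I mean by specific errors, here is
a minimal Python sketch.  The delete() wrapper and its string
labels are made up for the example; they are not error codes
from the spec.

```python
import os
import tempfile

def delete(path):
    """Hypothetical wrapper: report why a delete failed, not just that it did."""
    try:
        os.remove(path)
        return "ok"
    except FileNotFoundError:
        return "not-found"
    except PermissionError:
        return "permission-denied"
    except OSError:
        # Anything else (I/O error, path is a directory, ...).
        return "other-error"

tmp_dir = tempfile.mkdtemp()
existing = os.path.join(tmp_dir, "a.txt")
open(existing, "w").close()

first = delete(existing)                         # file exists
second = delete(os.path.join(tmp_dir, "nope"))   # file does not exist
```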

  7/ instead of having [err:FILE9999] listed every time, why not
mention it here and then remove it from every single function
definition?

  8/ §2.2: what are "volume roots" on UNIX?

  9/ §2.2: what if $path does not exist?

  10/ §3.2: what if the string contains a newline char?  Should
it be translated to the platform-dependent newline char?

  11/ §3.3: why not file:append-lines() instead of
file:append-text-lines()?  just for the sake of brevity and
clarity.  I think that carries the same semantics in a nicer way.
Same for file:read-lines() and file:write-lines().

  12/ §3.4: why take only xs:base64Binary into account?  Why
not xs:hexBinary? (that applies to other functions as well)
Actually, that makes me think we should introduce xs:binary (or
expath:binary), as a union type of both.  Or can we take the
liberty to have both signatures (even if it is not possible in
an XSLT or XQuery function declaration, this is a useful
specification tool):

    file:append-binary(..., $value as xs:base64Binary) as ...
    file:append-binary(..., $value as xs:hexBinary) as ...

  13/ §3.5 and §3.9, second "b.": "if $target is a directory, all
files are copied from the source into the target directory."  I
would expect a new subdirectory to be created with the same local
name as $source.  This is more consistent with the case of a file
(see "c." also in §3.5).  I think that's also what `cp(1)'
does in the UNIX shell, doesn't it?
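
  Here is a small Python sketch of the behaviour I would expect.
The copy_dir_into() helper is hypothetical, and the directory
names are only for the example.

```python
import os
import shutil
import tempfile

def copy_dir_into(source, target):
    """Copy directory `source` into existing directory `target`,
    as a new subdirectory named after `source`."""
    name = os.path.basename(os.path.normpath(source))
    dest = os.path.join(target, name)
    shutil.copytree(source, dest)
    return dest

root = tempfile.mkdtemp()
src = os.path.join(root, "src")
tgt = os.path.join(root, "tgt")
os.mkdir(src)
os.mkdir(tgt)
open(os.path.join(src, "a.txt"), "w").close()

dest = copy_dir_into(src, tgt)   # creates tgt/src/a.txt, not tgt/a.txt
```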

  14/ §3.5 and §3.9, last sentence before the errors: "no
rollback to the original state will be possible" I would rather
say "the state of the file store is undefined", because the
original state might have persisted, or an implementation might
even provide automatic rollback in case of error...

  15/ §3.5 and §3.9: why the special case of err:FILE0003, this
is err:FILE0001, isn't it?

  16/ §3.6: why not file:mkdir() and file:mkdirs(), in order to
be able to control the desired behaviour in case the parent dir
does not exist (that is, either raise an error or create all
parents)?  This is also the same wording as in a lot of
programming languages.
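
  Python is one example of that split (os.mkdir versus
os.makedirs).  A short sketch of the difference, using a
temporary directory made up for the example:

```python
import os
import tempfile

root = tempfile.mkdtemp()
nested = os.path.join(root, "a", "b")

try:
    os.mkdir(nested)        # parent "a" does not exist, so this raises
    mkdir_failed = False
except FileNotFoundError:
    mkdir_failed = True

os.makedirs(nested)         # creates "a" first, then "a/b"
```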

  17/ §3.7: the default value of $recursive is false(), I guess?
It also says "sub-directories will be deleted as well."  I guess
that means subdirs AND FILES?  I would also begin the sentence
with "If $path points to a directory".

  18/ §3.7, err:FILE0004: typo s/$file/$path/.  I would also add
"and $recursive is false()".

  19/ §3.8: "The '.' and '..' items are never returned."  I would
rather say: "The target dir and its parent are never returned in
the result (e.g. '.' and '..' on UNIX-like systems)."  I am not
sure '.' and '..' exist on all systems.

  20/ §3.8: What is the format of the returned paths?  URIs, paths,
platform-dependent, implem-dependent?  When $recursive is true(),
are files in subdirs returned as "a/b/c.txt"?

  21/ §3.8: What if $recursive is true and $pattern is provided?
Does it match against subdirs?  E.g. "a/b/*.txt".  Or only
against the "local name"?

  21b/ §3.8: "An implementation must support at least the
following glob syntax"  At least?  Isn't there an
interoperability issue here?

  22/ §3.8: "* for matching any number of unknown characters"
Not including the path separator, right?
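
  The question is not academic, because existing glob
implementations disagree.  As a sketch in Python: fnmatch lets
"*" match the path separator, while the glob_to_regex() function
below (hypothetical, written only for this example) does not.

```python
import fnmatch
import re

# fnmatch treats "*" as "any characters", including "/".
crosses = fnmatch.fnmatchcase("a/b/c.txt", "*.txt")

def glob_to_regex(pattern):
    """Hypothetical stricter translation: '*' and '?' never match '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

strict = glob_to_regex("*.txt")
matches_nested = strict.fullmatch("a/b/c.txt") is not None
matches_local = strict.fullmatch("c.txt") is not None
```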

  23/ §3.11: "in its string representation" sounds a bit weird,
I would have said "as a string", but that's maybe just me.

  24/ §3.11 and §3.12: The default value of $encoding is UTF-8.
I would rather say it is implementation-defined, e.g. if the
system has more info about it (I am sure in some cases an
implementation might know or infer the encoding).  What about the
following?:

    "The default value of $encoding is UTF-8, unless the
    implementation can determine the encoding by any other
    means."

  25/ §3.11: What about newline chars?  Are they transformed from
the platform-dependent newline to #x0A?
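
  For comparison, Python's text mode does such a translation by
default, and lets the caller switch it off.  This is only a
sketch of one possible behaviour, not a claim about what the
module should do.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")

# Default text mode: "\r\n" and "\r" are both translated to "\n".
with open(path, encoding="utf-8") as f:
    normalized = f.read()

# newline="" keeps the line endings exactly as stored in the file.
with open(path, encoding="utf-8", newline="") as f:
    raw = f.read()
```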

  26/ §3.12: I guess the newline char itself is stripped from the
end of every line?  So having two consecutive newlines in the
file would result in an empty string in the result.  What if the
file ends with a newline?  fn:unparsed-text-lines() does not
return a trailing empty string in that case (in F&O 3.0).
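
  The two possible answers, sketched in Python on a made-up
three-line input: splitlines() drops the final empty string,
while a naive split on the newline char keeps it.

```python
text = "a\n\nb\n"

# Line-oriented split: the trailing newline does not produce an extra item.
lines = text.splitlines()

# Naive split on the separator: the trailing newline yields a final "".
naive = text.split("\n")
```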

  27/ §4.1: so "/" returns the empty string?  I would return "/",
as basename does on UNIX.  For the empty string, I would rather
throw an error than return ".".  I expect "" to be passed
rather because of a logic error (like a missing element used as
the path).  More examples would be worthwhile here...
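
  Here are the kind of examples I have in mind, as a Python
sketch.  The base_name() function is hypothetical and only
handles "/"-separated paths; it shows the behaviour I am
suggesting, not what the draft currently says.

```python
import posixpath

def base_name(path):
    """Hypothetical: last path step, "/" for the root, error for ""."""
    if path == "":
        raise ValueError("empty path")
    stripped = path.rstrip("/")
    if stripped == "":
        return "/"            # the path consisted only of slashes
    return posixpath.basename(stripped)

root_name = base_name("/")          # "/"
dir_name = base_name("/a/b/")       # "b"
file_name = base_name("a/c.txt")    # "c.txt"
```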

  28/ §4.3: how to do it the other way around?  Do we really want
to resolve symbolic links here?  Why?

  29/ §4.3: what if we pass a URI with the escaped char '*'?

  30/ §4.4: sounds strange to have both path-to-native and
path-to-uri where the former can take a URI.  I still fail to see
the exact difference between "URI", "path" and "native".

  31/ §4.4: how to do it the other way around?

  32/ §4.5: how is it different from path-to-native?  What's the
"current working directory"?

  33/ §5.3: "&#x0D; on Mac systems."  Is it still the case?

  34/ §B: Is it really the same error (namely err:FILE0003) when
$path is not a dir, and when $path's parent is not a dir?  See
e.g. §3.14.

  Regards,

-- 
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/

Received on Tuesday, 24 July 2012 21:28:57 UTC