- From: Florent Georges <fgeorges@fgeorges.org>
- Date: Tue, 24 Jul 2012 23:28:08 +0200
- To: EXPath CG <public-expath@w3.org>
- Cc: expath@googlegroups.com
On 14 June 2012 02:09, Christian Grün wrote:

Hi Christian,

> a new version of the File Module is online:
> http://files.basex.org/modules/expath/file/file-120614.html

And as you said before, it is now officially published at:

http://expath.org/spec/file

I am sorry I could not respond earlier... Thank you for this new draft! Here are a few comments; feel free to respond here, or to create new issues for the points you want to keep track of at http://code.google.com/p/expath/issues (with the Module-File and Kind-Specification tags). Unfortunately there is no issue-tracking system for the CGs (yet?).

1/ I think we already discussed this, but I could not find any consensus in the archive. Given the side-effect nature of some functions, wouldn't it be worth making them return a value as well as taking it as a param, in order to be able to chain them? Something like a file handle or file descriptor. That would also have the benefit of not having to open the file and resolve its path each time we want to append to it. At the very least, if they returned the path of the file, it would then be possible to chain them like this:

    append(append('out.xml', elem1), elem2)

instead of:

    append('out.xml', elem1),
    append('out.xml', elem2)

which is very unpredictable.

2/ §1.2: "An implementation must accept absolute and relative UNIX/Linux and Windows paths". Is it really the intent that every implementation MUST support Linux AND Windows paths?

3/ §1.2: Where is the "current working directory" defined? Is it the static base URI? (or the parent dir of the static base URI if it designates a file)

4/ §1.2: "all paths must first be normalized to an implementation-defined representation" Why? What is that representation? Does that mean one always has to call file:path-to-native() before calling another function? If so, we could rather introduce file:open(), returning a black-box item representing a file handle (see point 1/ above).
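Points 1/ and 4/ both lean on the idea of a handle that is resolved once and then passed along. A minimal sketch of that idea follows, written in Python rather than XQuery only so that it can be run as-is. `FileHandle` and `open_file` are hypothetical names used for illustration; they are not part of the File Module draft.

```python
import os
import tempfile


class FileHandle:
    """Hypothetical stand-in for the black-box item a file:open() could return."""

    def __init__(self, path):
        # Resolve the path once, up front, instead of on every call.
        self.path = os.path.abspath(path)

    def append(self, text):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(text)
        # Returning the handle is what makes chaining possible.
        return self


def open_file(path):
    """Hypothetical counterpart of file:open()."""
    return FileHandle(path)


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.xml")
    # Each call consumes the result of the previous one, so the order is explicit.
    open_file(out).append("<elem1/>").append("<elem2/>")
    with open(out, encoding="utf-8") as f:
        content = f.read()

print(content)  # prints <elem1/><elem2/>
```

Because the second append takes the result of the first as its input, the order of the two writes follows from the data flow rather than from the order in which a sequence happens to be evaluated.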
5/ §1.3: "Query Execution" I'd rather say "Expression Evaluation", in order to stick to XPath vocabulary rather than XQuery (then also change "query" in the last sentence to "expression").

6/ §1.4, errors: should we also create specific errors for permissions (right to read, write, etc.)? Those are common cases, aren't they? I find it very frustrating when a Java delete() fails, because it just says "success/failure", without giving any details in case of failure...

7/ Instead of having [err:FILE9999] listed every time, why not mention it here once, then remove it from every single function definition?

8/ §2.2: what are "volume roots" on UNIX?

9/ §2.2: what if $path does not exist?

10/ §3.2: what if the string contains a newline char? Should it be translated to the platform-dependent newline char?

11/ §3.3: why not file:append-lines() instead of file:append-text-lines()? Just for the sake of brevity and clarity. I think that carries the same semantics in a nicer way. Same for file:read-lines() and file:write-lines().

12/ §3.4: why take only xs:base64Binary into account? Why not xs:hexBinary? (that applies to other functions as well) Actually, that makes me think we should introduce xs:binary (or expath:binary), as a union type of both. Or can we take the liberty of having both signatures (even if that is not possible in an XSLT or XQuery function declaration, it is a useful specification tool):

    file:append-binary(..., $value as xs:base64Binary) as ...
    file:append-binary(..., $value as xs:hexBinary) as ...

13/ §3.5 and §3.9, second "b.": "if $target is a directory, all files are copied from the source into the target directory." I would expect a new subdirectory to be created with the same local name as $source. This is more consistent with the case of a file (see "c." also in §3.5). I think that's also what `cp(1)' does in the UNIX shell, doesn't it?
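To make the behaviour suggested in 13/ concrete, here is a minimal sketch in Python (not XQuery, only so that it can be run as-is). `copy_into` is a hypothetical helper, not a function of the File Module; it mirrors what `cp -r src dst` does when `dst` already exists, namely creating `dst/src`.

```python
import os
import shutil
import tempfile


def copy_into(source, target):
    """Copy `source` to `target`, cp -r style (hypothetical helper)."""
    if os.path.isdir(target):
        # Existing target directory: keep the local name of the source,
        # for files and directories alike.
        local_name = os.path.basename(os.path.normpath(source))
        target = os.path.join(target, local_name)
    if os.path.isdir(source):
        shutil.copytree(source, target)
    else:
        shutil.copy(source, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    os.makedirs(dst)
    with open(os.path.join(src, "a.txt"), "w", encoding="utf-8") as f:
        f.write("hello")

    created = copy_into(src, dst)
    copied_names = sorted(os.listdir(dst))
    nested_exists = os.path.isfile(os.path.join(dst, "src", "a.txt"))

print(copied_names)  # prints ['src']
```

With this rule the file case and the directory case behave the same way: the thing being copied always shows up under its own local name inside an existing target directory.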
14/ §3.5 and §3.9, last sentence before the errors: "no rollback to the original state will be possible" I would rather say "the state of the file store is undefined", because the original state might have persisted, or an implementation could even provide automatic rollback in case of error...

15/ §3.5 and §3.9: why the special case of err:FILE0003? This is err:FILE0001, isn't it?

16/ §3.6: why not file:mkdir() and file:mkdirs(), in order to be able to control the desired behaviour in case the parent dir does not exist (that is, either raise an error or create all parents)? This is also the same wording as in a lot of programming languages.

17/ §3.7: the default value of $recursive is false(), I guess? It also says "sub-directories will be deleted as well." I guess that means subdirs AND FILES? I would also begin the sentence with "If $path points to a directory".

18/ §3.7, err:FILE0004: typo s/$file/$path/. I would also add "and $recursive is false()".

19/ §3.8: "The '.' and '..' items are never returned." I would rather say: "The target dir and its parent are never returned in the result (e.g. '.' and '..' on UNIX-like systems)." I am not sure '.' and '..' exist on all systems.

20/ §3.8: What is the format of the returned paths? URIs, paths, platform-dependent, implem-dependent? When $recursive is true(), are files in subdirs returned as "a/b/c.txt"?

21/ §3.8: What if $recursive is true and $pattern is provided? Does it match against subdirs? E.g. "a/b/*.txt". Or only against the "local name"?

21/ §3.8: "An implementation must support at least the following glob syntax" At least? Isn't there an interoperability issue here?

22/ §3.8: "* for matching any number of unknown characters" Not including the path separator, right?

23/ §3.11: "in its string representation" sounds a bit weird; I would have said "as a string", but maybe that's just me.

24/ §3.11 and §3.12: The default value of $encoding is UTF-8. I would rather say it is implementation-defined, e.g. if the system has more info about it (I am sure in some cases an implementation might know or infer the encoding). What about the following?: "The default value of $encoding is UTF-8, unless the implementation can determine the encoding by any other means."

25/ §3.11: What about newline chars? Are they transformed from the platform-dependent newline to #x0A?

26/ §3.12: I guess the newline char itself is stripped from the end of every line? So having two consecutive newlines in the file would result in an empty string in the result. What if the file ends with a newline? fn:unparsed-text-lines() does not return it (in F&O 3.0).

27/ §4.1: so "/" returns the empty string? I would return "/", as basename does on UNIX. For the empty string, I would rather throw an error than return ".". I expect "" to be passed because of a logic error (like a missing element used as the path). More examples would be worthwhile here...

28/ §4.3: how to do it the other way around? Do we really want to resolve symbolic links here? Why?

29/ §4.3: what if we pass a URI with the escaped char '*'?

30/ §4.4: it sounds strange to have both path-to-native and path-to-uri when the former can take a URI. I still fail to see the exact difference between "URI", "path" and "native".

31/ §4.4: how to do it the other way around?

32/ §4.5: how is it different from path-to-native? What's the "current working directory"?

33/ §5.3: " on Mac systems." Is it still the case?

34/ §B: Is it really the same error (namely err:FILE0003) when $path is not a dir, and when $path's parent is not a dir? See e.g. §3.14.

Regards,

--
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/
Received on Tuesday, 24 July 2012 21:28:57 UTC