- From: XQuery <xquery@attbi.com>
- Date: Mon, 9 Dec 2002 00:51:27 -0500 (EST)
- To: <public-qt-comments@w3.org>
I went through the mail archives since October and removed any feedback I had that was previously raised and answered (one item below was raised but not answered in the archives, so I left it in). *** *** Questions/comments on 6.4.17: *** - What is the result of replace("xxy", "x.", "z") ? The spec says "non-overlapping substrings", so I assume this does not result in "zz", but does it result in "xz" or "zy" ? This should be made clear. - What is the result of replace("xxx", "x(xx)|(x)xx", "y$1") ? Is it "yxx" or "yx" ? Perhaps a simpler example is replace("xx", "(x)|x", "$1"). Does it result in "", "x", or "xx"? - So, an error is raised for replace("xxx", ".*?", "") because the reluctant quantifier causes .* to match the "shortest possible substring" which in this case is the empty string? If so, I think it's worth mentioning that the reluctant quantifiers can cause patterns that would normally succeed to error. If this was not intended, then the definitions need reworking. - Is an error raised only if the entire pattern matches the zero-length string? What about captured substrings, like replace("xxx", "()x*", "$1") or replace("xxx", (^).*($)", "$1$2")? Are these allowed (resulting in the empty string?) or are they errors? - If the replacement pattern is invalid, is it an error? (This is not stated.) For example, replace("x", "(x)", "$"). What if the replacement pattern refers to a non-existent match, such as replace("x", "(x)", "$5") ? - If $ must be escaped as \$, then clearly \ must also be escaped (probably \\). Otherwise, it would be impossible to insert a backslash followed by a captured substring. For example, replace($anything, "(.*)", "\$1") needs to be replace($anything, "(.*)", "\\$1") *** *** Questions/comments on 6.4.16.1: *** - What's considered a "newline character" for the purpose of ^$. matching? \r? \n? \r\n? (which isn't a character, but a sequence) - The additional meta-characters change what is considered a "normal character" in the regular expression. So in addition to modifying the XML Schema quantifier production (4), you also want to modify the Char production (10). I note in passing that the XML Schema spec appendix F contains two errors: The definition of metacharacter omits the vertical bar | (which is properly accounted for in the Char production), while the Char production omits the curly brace metacharacters { and } (which are properly accounted for in the metacharacter definition). Oops. Furthermore, the XML Schema regexp grammar allows for expressions like "|" and "()|()". This is possibly an error. (Both branches to the choice are allowed to be empty, because branch ::= piece*. Similarly, parentheses can wrap the empty string.) - Because the XML Schema grammar for regexps is flawed, and you're using only a small part of it unmodified anyway, it's probably best to completely define your (corrected, modified) regexp grammar here. - "The effect of [reluctant quantifiers] is that the regular expression matches the shortest possible substring (consistent with the match as a whole succeeding)." I think this parenthetical statement should not be parenthetical, because it significantly affects the behavior of the reluctant quantifier. *** *** Questions/comments on 6.4.19 *** - I suppose you know that the escaping rules differ for each URI part? Section 2.4.2 of RFC 2396 might be illuminating. I'm not sure escape-uri() is useful as-is. Should probably be something more along the lines of construct-uri(part1 string?, part2 string?, ...). - Consider adding functions for XML entitization/de-entitization (suggested: entitize(string?) string? and unentitize(string?) string?). I suppose I can cobble together the same functionality by going through a dummy element constructor, but I think this functionality is more central to XQuery than URI escaping.
Received on Monday, 9 December 2002 10:53:46 UTC