- From: XQuery <xquery@attbi.com>
- Date: Mon, 9 Dec 2002 00:51:27 -0500 (EST)
- To: <public-qt-comments@w3.org>
I went through the mailing-list archives back to October and removed any of
my feedback that had already been raised and answered. (One item below was
raised but never answered in the archives, so I left it in.)
***
*** Questions/comments on 6.4.17:
***
- What is the result of replace("xxy", "x.", "z")? The spec says
"non-overlapping substrings", so I assume the result is not "zz", but is it
"xz" or "zy"? The spec should say which of two overlapping candidate
matches gets replaced.
- What is the result of replace("xxx", "x(xx)|(x)xx", "y$1")? Is it "yxx"
or "yx"? A simpler example is replace("xx", "(x)|x", "$1"): does it result
in "", "x", or "xx"?
- So an error is raised for replace("xxx", ".*?", "") because the reluctant
quantifier makes .* match the "shortest possible substring", which in this
case is the empty string? If so, I think it's worth stating explicitly that
reluctant quantifiers can cause patterns that would otherwise succeed to
raise an error. If this was not intended, the definitions need reworking.
- Is an error raised only if the entire pattern matches the zero-length
string? What about captured substrings, as in replace("xxx", "()x*", "$1")
or replace("xxx", "(^).*($)", "$1$2")? Are these allowed (resulting in the
empty string), or are they errors?
- If the replacement string is invalid, is that an error? (The spec does not
say.) For example, replace("x", "(x)", "$"). And what if the replacement
refers to a non-existent captured substring, as in replace("x", "(x)",
"$5")?
- If $ must be escaped as \$, then \ must clearly also be escaped (probably
as \\). Otherwise it would be impossible to insert a backslash followed by
a captured substring. For example, to insert a backslash followed by $1,
replace($anything, "(.*)", "\$1") would need to become
replace($anything, "(.*)", "\\$1").
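To make these questions concrete, here is a minimal Python sketch of one possible set of answers. It uses Python's re engine as a stand-in for the spec's regex dialect (the two are not identical), and every name in it (xq_replace, parse_replacement, ReplaceError) is hypothetical. The rules it implements are assumptions, not what the draft says: a pattern that can match the empty string as a whole is rejected, $N takes a single digit, a reference to a non-existent group is an error, and only \\ and \$ are valid escapes in the replacement, which is validated before any matching happens.

```python
import re
from typing import List, Union

# A token is either literal text or a capture-group number.
Token = Union[str, int]
DIGITS = frozenset("0123456789")


class ReplaceError(ValueError):
    """Hypothetical stand-in for the dynamic error the spec would raise."""


def parse_replacement(template: str, group_count: int) -> List[Token]:
    # Assumed rules: a backslash may only precede "\" or "$" (yielding that
    # character literally); "$" must be followed by one digit N <= group_count.
    tokens: List[Token] = []
    literal: List[str] = []
    i = 0
    while i < len(template):
        c = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if c == "\\":
            if nxt not in ("\\", "$"):
                raise ReplaceError(f"bad escape at offset {i} in {template!r}")
            literal.append(nxt)
            i += 2
        elif c == "$":
            if nxt not in DIGITS:
                raise ReplaceError(f"'$' without a group number at offset {i}")
            if int(nxt) > group_count:
                raise ReplaceError(f"${nxt} refers to a non-existent group")
            if literal:
                tokens.append("".join(literal))
                literal = []
            tokens.append(int(nxt))
            i += 2
        else:
            literal.append(c)
            i += 1
    if literal:
        tokens.append("".join(literal))
    return tokens


def xq_replace(text: str, pattern: str, replacement: str) -> str:
    regex = re.compile(pattern)
    # Assumption: the error applies whenever the pattern as a whole can match
    # "", so ".*?", "()x*" and "(^).*($)" are all rejected before matching.
    if regex.fullmatch("") is not None:
        raise ReplaceError(f"pattern {pattern!r} matches a zero-length string")
    tokens = parse_replacement(replacement, regex.groups)

    def expand(m: "re.Match[str]") -> str:
        # A group that did not take part in the match contributes "".
        return "".join(t if isinstance(t, str) else (m.group(t) or "") for t in tokens)

    # re.sub scans left to right, replaces leftmost non-overlapping matches,
    # and tries alternatives in order (the first branch that succeeds wins).
    return regex.sub(expand, text)
```

Under these choices replace("xxy", "x.", "z") gives "zy" (leftmost match first), replace("xxx", "x(xx)|(x)xx", "y$1") gives "yxx" and replace("xx", "(x)|x", "$1") gives "xx" (first alternative wins), and all three zero-length examples are errors. An engine with different matching rules could reasonably answer otherwise, which is why the spec needs to pin these points down.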
***
*** Questions/comments on 6.4.16.1:
***
- What is considered a "newline character" for the purposes of ^, $, and .
matching? \r? \n? \r\n? (The last is a two-character sequence, not a
character.)
- The additional metacharacters change what counts as a "normal character"
in a regular expression. So in addition to modifying the XML Schema
quantifier production (4), you also need to modify the Char production
(10).
I note in passing that Appendix F of the XML Schema spec contains two
errors: the definition of metacharacter omits the vertical bar | (which the
Char production properly accounts for), while the Char production omits the
curly-brace metacharacters { and } (which the metacharacter definition
properly accounts for). Oops.
Furthermore, the XML Schema regexp grammar allows expressions like "|" and
"()|()". This may be an error: both branches of the choice may be empty,
because branch ::= piece*, and parentheses can likewise wrap the empty
string.
- Because the XML Schema grammar for regexps is flawed, and you use only a
small part of it unmodified anyway, it's probably best to define your
(corrected, modified) regexp grammar completely here.
- "The effect of [reluctant quantifiers] is that the regular expression
matches the shortest possible substring (consistent with the match as a
whole succeeding)." I think this qualification should not be relegated to a
parenthetical, because it significantly affects the behavior of the
reluctant quantifier.
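The newline question is not academic: even within one language, different facilities disagree. The following Python sketch (illustrative only; Python's re is not the spec's regex dialect) shows re's MULTILINE anchors recognizing only LF, while str.splitlines() also splits on CR, CR LF and U+2028. It also writes out the two Appendix F character sets as characterized above, plus a hypothetical corrected metacharacter set that assumes the added metacharacters are ^ and $.

```python
import re
from typing import List


# Lines consisting of exactly "x", found with ^ and $ in MULTILINE mode.
# Python's re treats only LF (U+000A) as a line boundary for ^ and $ here.
def anchored_lines(text: str) -> List[str]:
    return re.findall(r"^x$", text, re.MULTILINE)


# str.splitlines() also breaks on CR, CR LF, U+2028 and several others.
def split_lines(text: str) -> List[str]:
    return text.splitlines()


for sample in ["x\nx", "x\r\nx", "x\rx", "x\u2028x"]:
    print(repr(sample), anchored_lines(sample), split_lines(sample))

# The two Appendix F definitions, as characterized above.
META_IN_PROSE = set(".\\?*+{}()[]")      # metacharacter list: no "|"
EXCLUDED_FROM_CHAR = set(".\\?*+()|[]")  # Char production: no "{" or "}"
print(sorted(META_IN_PROSE ^ EXCLUDED_FROM_CHAR))  # ['{', '|', '}']

# Hypothetical corrected set, assuming "^" and "$" are the added metacharacters.
META = META_IN_PROSE | EXCLUDED_FROM_CHAR | {"^", "$"}


def is_normal_char(c: str) -> bool:
    return c not in META
```

For "x\r\nx" the anchored search finds one line where splitlines() finds two, and for a lone CR or U+2028 it finds none, so the spec has to choose a definition explicitly.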
***
*** Questions/comments on 6.4.19:
***
- I suppose you know that the escaping rules differ for each URI part?
Section 2.4.2 of RFC 2396 may be illuminating. I'm not sure escape-uri() is
useful as it stands; it should probably be something more along the lines
of construct-uri(part1 string?, part2 string?, ...).
- Consider adding functions for XML entitization/de-entitization
(suggested: entitize(string?) string? and unentitize(string?) string?).
I suppose I can cobble together the same functionality by going through
a dummy element constructor, but I think this functionality is more
central to XQuery than URI escaping.
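A rough Python illustration of both points, using only standard-library calls: urllib.parse.quote and quote_plus escape the same value differently for a path, a single path segment, and a query value, and xml.sax.saxutils provides the pieces for an entitize/unentitize pair. Python's quoting follows RFC 3986 rather than RFC 2396, and all the function names below are hypothetical sketches, not a proposal for the exact XQuery signatures.

```python
from urllib.parse import quote, quote_plus
from xml.sax.saxutils import escape, unescape


# The same value needs different escaping depending on which URI part it fills.
def as_path(value: str) -> str:
    return quote(value)              # "/" left alone: treated as a delimiter


def as_path_segment(value: str) -> str:
    return quote(value, safe="")     # "/" escaped: data inside one segment


def as_query_value(value: str) -> str:
    return quote_plus(value)         # form-style: space -> "+", "/" escaped


# Hypothetical entitize/unentitize built on the stdlib XML helpers.
def entitize(s: str) -> str:
    # escape() handles &, < and >; the quote characters are added explicitly.
    return escape(s, {'"': "&quot;", "'": "&apos;"})


def unentitize(s: str) -> str:
    # Reverses entitize(). Numeric references such as &#38; are not decoded.
    return unescape(s, {"&quot;": '"', "&apos;": "'"})


print(as_path("a/b c"), as_path_segment("a/b c"), as_query_value("a/b c"))
# a/b%20c a%2Fb%20c a%2Fb+c
print(entitize('a<b & "c"'))  # a&lt;b &amp; &quot;c&quot;
```

One input, "a/b c", comes out three different ways, which is the crux of the objection to a single escape-uri().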
Received on Monday, 9 December 2002 10:53:46 UTC