- From: Priscilla Walmsley <priscilla@walmsley.com>
- Date: Mon, 18 Aug 2003 19:21:06 -0400
- To: "'Jim Melton'" <jim.melton@acm.org>, "'Tobias Reif'" <tobiasreif@pinkjuice.com>
- Cc: <public-qt-comments@w3.org>, "'Jeni Tennison'" <jeni@jenitennison.com>
Hi,
Just to pick nits,
Jim Melton wrote:
> I disagree. We state early in the F&O specification that the rules are
> to be applied *in the order in which they are written*. If you do that,
> and read the rule in question properly (that is, without adding the
> incorrect "...and nothing else" in your mind), then the spec is
> unambiguous (in this respect, that is).
The rule about matching a zero-length string appears *after* the
sentence:
"This function breaks the $input string into a sequence of strings,
treating any substring that matches $pattern as a separator. The
separators themselves are not returned."
Perhaps this sentence is not an official "rule", just a general
description of the function. However, the sentence is false in the case
we are talking about. The letters "a" and "b" _do_ match the pattern
".*", and therefore should be treated as separators according to this
particular sentence. Maybe the sentence should start with "If $pattern
does not match a zero-length string, ..."
Or did I misunderstand what you mean by applying the rules in the order
in which they are written?
Anyway, back to the real issue, I think this behavior is particularly
confusing in the case of:
fn:tokenize("abba", "b?")
which apparently would also return ("a", "b", "b", "a"), since the
pattern matches a zero-length string. I think the user would expect "b"
to be treated as a separator in this case. I know they could just use
the pattern "b" if that's what they want, but it still seems like the
function violates the principle of least surprise, particularly since
the ? in this case should be greedy.
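
To make the contrast concrete, here is a rough Python sketch of the behavior
as described in this thread. It is only an illustration, not the F&O
definition: the helper name tokenize_draft is hypothetical, Python's re
module stands in for the XPath regex dialect, and the zero-length rule is
modeled simply as "one token per character".

```python
import re

def tokenize_draft(input_str, pattern):
    """Hypothetical model of the behavior discussed above (an assumption,
    not spec text)."""
    if re.fullmatch(pattern, "") is not None:
        # The pattern can match a zero-length string, so the draft rule
        # as read in this thread applies: one token per character.
        return list(input_str)
    # Otherwise every match is treated as a separator and is not returned.
    return re.split(pattern, input_str)

print(tokenize_draft("abba", "b?"))  # ['a', 'b', 'b', 'a']
print(tokenize_draft("abba", "b"))   # ['a', '', 'a']
```

Under this model, adding the ? to the pattern changes "b" from a separator
into a returned token, which is the surprise described above.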
Thanks,
Priscilla
Received on Monday, 18 August 2003 19:21:16 UTC