RE: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")

Hi,

Just to pick nits, 

Jim Melton wrote:
 
> I disagree. We state early in the F&O specification that the rules are
> to be applied *in the order in which they are written*. If you do that,
> and read the rule in question properly (that is, without adding the
> incorrect "...and nothing else" in your mind), then the spec is
> unambiguous (in this respect, that is).

The rule about matching a zero-length string appears *after* the
sentence:

"This function breaks the $input string into a sequence of strings, 
treating any substring that matches $pattern as a separator. The 
separators themselves are not returned."

Perhaps this sentence is not an official "rule", just a general
description of the function.  However, the sentence is false in the case
we are talking about.  The letters "a" and "b" _do_ match the pattern
.*, and therefore should be treated as separators according to this
particular sentence.  Maybe the sentence should start with "If $pattern
does not match a zero-length string, ..." 

Or did I misunderstand what you mean by applying the rules in the order
in which they are written?

Anyway, back to the real issue, I think this behavior is particularly
confusing in the case of:

fn:tokenize("abba", "b?")

which apparently would also return ("a", "b", "b", "a"), since the
pattern matches a zero-length string. I think the user would expect "b"
to be treated like a separator in this case.  I know they could just use
the pattern "b" if that's what they want, but it still seems like the
function violates the principle of least surprise, particularly since
the ? in this case should be greedy and so match each "b" rather than
the empty string.
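
To make the contrast concrete, here is a minimal sketch in Python of the
behavior as I understand it from this thread. It is not the F&O
definition: draft_tokenize is a hypothetical helper, and Python's re
syntax stands in for the XML Schema regular expressions the spec
actually uses, which is only safe for trivial patterns like these.

```python
import re


def draft_tokenize(text, pattern):
    """Model the draft rule discussed in this thread (an assumption,
    not the normative fn:tokenize definition)."""
    # If the pattern can match a zero-length string, the draft rule
    # returns every character of the input as its own token.
    if re.fullmatch(pattern, "") is not None:
        return list(text)
    # Otherwise each match of the pattern is a separator, and the
    # separators themselves are not returned.
    return re.split(pattern, text)


print(draft_tokenize("abba", "b?"))  # ['a', 'b', 'b', 'a']
print(draft_tokenize("abba", "b"))   # ['a', '', 'a']

# Greedy "?" consumes the "b" when one is available at that position.
print(re.compile("b?").match("abba", 1).group())  # b
```

The last line is the source of the surprise: at every position where a
"b" is present, the greedy pattern does match it, yet the zero-length
rule means it is never treated as a separator.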

Thanks,
Priscilla

Received on Monday, 18 August 2003 19:21:16 UTC