
RE: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")

From: Priscilla Walmsley <priscilla@walmsley.com>
Date: Mon, 18 Aug 2003 19:21:06 -0400
To: "'Jim Melton'" <jim.melton@acm.org>, "'Tobias Reif'" <tobiasreif@pinkjuice.com>
Cc: <public-qt-comments@w3.org>, "'Jeni Tennison'" <jeni@jenitennison.com>
Message-ID: <006a01c365df$66f01bb0$ef2efea9@WALMSLEYPH>


Just to pick nits, 

Jim Melton wrote:
> I disagree. We state early in the F&O specification that the 
> rules are 
> to be applied *in the order in which they are written*. If 
> you do that, 
> and read the rule in question properly (that is, without adding the 
> incorrect "...and nothing else" in your mind), then the spec is 
> unambiguous (in this respect, that is). 

The rule about matching a zero-length string appears *after* the
following sentence:

"This function breaks the $input string into a sequence of strings, 
treating any substring that matches $pattern as a separator. The 
separators themselves are not returned."

Perhaps this sentence is not an official "rule", just a general
description of the function.  However, the sentence is false in the case
we are talking about.  The letters "a" and "b" _do_ match the pattern
".*", and therefore should be treated as separators according to this
particular sentence.  Maybe the sentence should start with "If $pattern
does not match a zero-length string, ..." 

Or did I misunderstand what you mean by applying the rules in the order
in which they are written?

Anyway, back to the real issue, I think this behavior is particularly
confusing in the case of:

fn:tokenize("abba", "b?")

which apparently would also return ("a", "b", "b", "a"), since the
pattern matches a zero-length string. I think the user would expect "b"
to be treated as a separator in this case.  I know they could just use
the pattern "b" if that's what they want, but it still seems like the
function violates the principle of least surprise, particularly since
the ? in this case should be greedy.
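
To make the surprise concrete, here is a rough sketch in Python of the
rule as I understand it from this thread. It is not taken from the F&O
text: the name tokenize_draft is hypothetical, and Python's re module
stands in for the XML Schema regex dialect, which is fine for these
simple patterns but is an assumption in general.

```python
import re

def tokenize_draft(input_str, pattern):
    """Hypothetical emulation of the draft rule discussed in this thread."""
    # Assumed rule: if the pattern can match a zero-length string,
    # every character of the input becomes its own token.
    if re.fullmatch(pattern, "") is not None:
        return list(input_str)
    # Otherwise, any substring matching the pattern acts as a separator
    # and is not returned.
    return re.split(pattern, input_str)

print(tokenize_draft("abba", "b?"))  # zero-length match possible
print(tokenize_draft("abba", "b"))   # "b" acts as a separator
```

Under this reading, merely adding the ? changes "b" from a separator
into an ordinary token, which is the behavior I find surprising.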

Received on Monday, 18 August 2003 19:21:16 UTC
