RE: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Tue, 23 Sep 2003 11:54:58 +0200
Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DD140@daemsg02.software-ag.de>
To: Tobias Reif <tobiasreif@pinkjuice.com>, public-qt-comments@w3.org
Cc: Ashok Malhotra <ashokma@microsoft.com>, Jeni Tennison <jeni@jenitennison.com>
> Ashok Malhotra wrote:
> > The WGs discussed this issue in the meeting on 9/16/2003. We agreed
> > that the description of fn:tokenize was ambiguous and decided to
> > clarify it
> 
> Great, thanks.
> 
> > by making it an error for the pattern to match the zero-length string.
> 
> I'm not sure if I understand you correctly. In order to split a string
> into its characters, the pattern that specifies the separator must
> match the zero-length string (those that are inside the word), no?
> 
> My suggestion was to add
> 
>   fn:tokenize("abba", "") returns ("a", "b", "b", "a")
> 
> ... or would that return ("", "a", "b", "b", "a", "") ?
> 

We decided that fn:tokenize("abba", "") should be an error; more
specifically, fn:tokenize($in, $regex) is an error if fn:matches("", $regex)
is true.
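The rule can be sketched in Python. This is a minimal illustration, assuming Python's `re` syntax as a stand-in for XPath regular expressions (the two differ in details such as flags and character-class syntax). `tokenize` and `TokenizeError` are hypothetical names, not part of any spec or library, and the treatment of a zero-length input is an assumption rather than something stated in this message.

```python
import re


class TokenizeError(ValueError):
    """Hypothetical stand-in for the error the WGs decided to raise."""


def tokenize(input_str: str, pattern: str) -> list[str]:
    regex = re.compile(pattern)
    # The rule from the message: fn:tokenize($in, $regex) is an error
    # if fn:matches("", $regex) is true, i.e. if the pattern can match
    # the zero-length string.
    if regex.search("") is not None:
        raise TokenizeError(f"pattern {pattern!r} matches the zero-length string")
    # Assumption (not stated in the message): a zero-length input
    # yields no tokens at all.
    if input_str == "":
        return []
    tokens = []
    start = 0
    # Every match is non-empty here, so each separator consumes at least
    # one character and the separators never overlap.
    for m in regex.finditer(input_str):
        tokens.append(input_str[start:m.start()])
        start = m.end()
    tokens.append(input_str[start:])
    return tokens


print(tokenize("a,b,,c", ","))  # ['a', 'b', '', 'c']
try:
    tokenize("abba", "")
except TokenizeError as err:
    print("error:", err)
```

Checking the pattern against "" up front makes the error depend only on the regex, not on the particular input string, which matches the way the rule is phrased above.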

This means we are removing the functionality for fn:tokenize to split a
string into its individual characters. There are other ways of doing this.
We looked at the specs (and actual behavior) for Perl and Java, with
different settings of the "limit" parameter, and decided that choosing any
one of the available behaviors was likely to be confusing to a significant
number of our users. Making it an error keeps our options open for the
future, whereas if we get it wrong we are stuck with it for ever.
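Python (3.7 and later) gives one concrete instance of the ambiguity: its `re.split` accepts an empty pattern and returns the second of the two results asked about above, with empty strings at both ends. A pattern-free alternative for splitting into characters is also shown. The helper names below are hypothetical, and the XPath 2.0 analogue mentioned in the comment (mapping fn:codepoints-to-string over fn:string-to-codepoints) is offered only as one possible "other way", not necessarily the one the WGs had in mind.

```python
import re


def split_on_empty_pattern(s: str) -> list[str]:
    # re.split splits at every zero-width match, including the ones
    # before the first and after the last character, so the result
    # begins and ends with an empty string.
    return re.split("", s)


def characters(s: str) -> list[str]:
    # No regex involved: one string per code point, similar in spirit to
    # "for $c in string-to-codepoints($s) return codepoints-to-string($c)".
    return [ch for ch in s]


print(split_on_empty_pattern("abba"))  # ['', 'a', 'b', 'b', 'a', '']
print(characters("abba"))              # ['a', 'b', 'b', 'a']
```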

Michael Kay
Received on Tuesday, 23 September 2003 05:55:50 UTC