RE: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a") from Kay, Michael on 2003-09-23 (public-qt-comments@w3.org from September 2003)

From: Kay, Michael <Michael.Kay@softwareag.com>
Date: Tue, 23 Sep 2003 11:54:58 +0200
To: Tobias Reif <tobiasreif@pinkjuice.com>, public-qt-comments@w3.org
Cc: Ashok Malhotra <ashokma@microsoft.com>, Jeni Tennison <jeni@jenitennison.com>
Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DD140@daemsg02.software-ag.de>

> Ashok Malhotra wrote:
> > The WGs discussed this issue in the meeting on 9/16/2003. We agreed
> > that the description of fn:tokenize was ambiguous and decided to
> > clarify it
> 
> Great, thanks.
> 
> > by making it an error for the pattern to match the zero-length string.
> 
> I'm not sure if I understand you correctly. In order to split a string
> into its characters, the pattern that specifies the separator must match
> the zero-length string (those that are inside the word), no?
> 
> My suggestion was to add
> 
>   fn:tokenize("abba", "") returns ("a", "b", "b", "a")
> 
> ... or would that return ("", "a", "b", "b", "a", "") ?
> 

We decided that fn:tokenize("abba", "") should be an error; more
specifically, fn:tokenize($in, $regex) is an error if fn:matches("", $regex)
is true.
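
As a rough illustration only (not spec text), the rule can be sketched in
Python. The function name tokenize below is a hypothetical stand-in for
fn:tokenize, and Python's regex dialect is assumed in place of the XPath
one, so the two will not agree on every pattern.

```python
import re


def tokenize(text, pattern):
    """Split text on pattern; reject patterns that match the empty string.

    Hypothetical stand-in for fn:tokenize, using Python regex syntax.
    """
    regex = re.compile(pattern)
    # Counterpart of the fn:matches("", $regex) check described above.
    if regex.search("") is not None:
        raise ValueError("pattern matches the zero-length string")
    tokens = []
    start = 0
    for match in regex.finditer(text):
        tokens.append(text[start:match.start()])
        start = match.end()
    # Whatever follows the last separator is the final token.
    tokens.append(text[start:])
    return tokens
```

With this sketch, tokenize("a,b,,c", ",") gives ["a", "b", "", "c"], while
tokenize("abba", "") and tokenize("abba", "x*") both raise an error.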

This means we are removing the ability of fn:tokenize to split a string
into its individual characters. There are other ways of doing this.
We looked at the specs (and actual behavior) for Perl and Java, with
different settings of the "limit" parameter, and decided that choosing any
one of the available behaviors was likely to confuse a significant
number of our users. Making it an error keeps our options open for the
future, whereas if we chose a behavior now and got it wrong, we would be
stuck with it forever.
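
To make the ambiguity concrete, here is a small Python sketch. It is only
an analogy: Python is not one of the languages we examined, and the
behavior of re.split on an empty pattern shown here assumes Python 3.7 or
later. It shows that splitting into characters does not need a tokenizer,
and that one regex engine's empty-pattern split yields the second of the
two results asked about above.

```python
import re

word = "abba"

# Direct iteration yields one token per character, with no empty strings.
chars = list(word)

# Going through code points and back gives the same result.
via_codepoints = [chr(cp) for cp in map(ord, word)]

# Splitting on an empty pattern (Python 3.7+) also reports the empty
# strings before the first character and after the last one.
split_on_empty = re.split("", word)
```

Here chars and via_codepoints are both ["a", "b", "b", "a"], whereas
split_on_empty is ["", "a", "b", "b", "a", ""].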

Michael Kay

Received on Tuesday, 23 September 2003 05:55:50 UTC