- From: Priscilla Walmsley <priscilla@walmsley.com>
- Date: Tue, 19 Aug 2003 08:20:11 -0400
- To: "'David Carlisle'" <davidc@nag.co.uk>, <tobiasreif@pinkjuice.com>
- Cc: <public-qt-comments@w3.org>, <ashokma@microsoft.com>
I agree that .? should use each character as a separator. But, in that case shouldn't the result be a sequence of 5 zero-length strings, not the empty sequence? That would be the behavior of tokenize() if you had any other 4 separators in a row. The Java result makes far more sense to me than what is currently written in the F&O. I think the rule should be that if the pattern IS (rather than MATCHES) the zero-length string, it splits it up into individual characters. It should never "match" a zero-length string even if it _could_. (with the possible exception of the case where the $input string is itself a zero-length string.) Priscilla > -----Original Message----- > From: public-qt-comments-request@w3.org > [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle > Sent: Tuesday, August 19, 2003 4:57 AM > To: tobiasreif@pinkjuice.com > Cc: public-qt-comments@w3.org; ashokma@microsoft.com > Subject: Re: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a") > > > > > My reading of the F&O spec agress with Tobi's that .? should > not use the > empty string as a separator, but rather each character should be taken > as a separator, and so the result should be the empty sequence. > > The sentence > > If the supplied $pattern matches a zero length string... > > should anyway (as this thread shows) be clarified, but as it stands I > think the natural interpretation of "matches" here is the > interprestation of matches used in replace(), and in this case > that is a greedy match as .? is greedy, so since it is the > entire regexp > it is equivalent to . and will match each character. > While .? _could_ match an empty string, it doesn't here so the empty > string should not be used as separator. > > This sentence should not be changing the matching rule of the regexp, > just specifying that if the effective separator is "" that > the behaviour > is to split each character rather than error, or take the whole string > as one token. > > > David > > ______________________________________________________________ > __________ > This e-mail has been scanned for all viruses by Star Internet. The > service is powered by MessageLabs. For more information on a proactive > anti-virus service working around the clock, around the globe, visit: > http://www.star.net.uk > ______________________________________________________________ > __________ > >
Received on Tuesday, 19 August 2003 08:20:19 UTC