
RE: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")

From: Priscilla Walmsley <priscilla@walmsley.com>
Date: Tue, 19 Aug 2003 08:20:11 -0400
To: "'David Carlisle'" <davidc@nag.co.uk>, <tobiasreif@pinkjuice.com>
Cc: <public-qt-comments@w3.org>, <ashokma@microsoft.com>
Message-ID: <008301c3664c$3d392a10$ef2efea9@WALMSLEYPH>

I agree that .? should use each character as a separator.  But, in that
case shouldn't the result be a sequence of 5 zero-length strings, not
the empty sequence?  That would be the behavior of tokenize() if you had
any other 4 separators in a row.
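
As a rough illustration of the "4 separators in a row" case, here is a minimal sketch that uses Java's String.split as a stand-in. It is not the F&O tokenize() function; the comma separator, the class name and the helper name are assumptions chosen only for the example. The limit of -1 tells split to keep trailing empty tokens.

```java
import java.util.Arrays;

public class SeparatorRun {
    // Hypothetical helper: split on a literal comma and keep
    // trailing empty tokens (that is what the -1 limit does).
    static String[] splitKeepingEmpties(String input) {
        return input.split(",", -1);
    }

    public static void main(String[] args) {
        // Four separators in a row and nothing else.
        String[] tokens = splitKeepingEmpties(",,,,");
        System.out.println(tokens.length);
        System.out.println(Arrays.toString(tokens));
    }
}
```

Four adjacent separators delimit five zero-length tokens here, which is the count argued for above. With Java's default limit of 0 the trailing empty tokens would be dropped instead.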

The Java result makes far more sense to me than what is currently
written in the F&O.  I think the rule should be that if the pattern IS
(rather than MATCHES) the zero-length string, it splits it up into
individual characters.  It should never "match" a zero-length string
even if it _could_ (with the possible exception of the case where the
$input string is itself a zero-length string).
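
To make the IS-versus-MATCHES distinction concrete, here is a small sketch using Java's String.split. Which Java call produced the result discussed in this thread is not stated here, so String.split is an assumption, as are the class and helper names. The behaviour described assumes Java 8 or later; older releases handle a zero-width match at the start of the input differently.

```java
import java.util.Arrays;

public class EmptyPatternSplit {
    // Hypothetical helper: plain String.split with the default limit,
    // which discards trailing empty tokens.
    static String[] splitOn(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // The pattern IS the zero-length string: split between characters.
        System.out.println(Arrays.toString(splitOn("abba", "")));

        // The pattern merely CAN match the zero-length string: .? is greedy,
        // so it consumes each character as a separator.
        System.out.println(splitOn("abba", ".?").length);
    }
}
```

On Java 8 and later the empty pattern yields the four single characters, while .? leaves only empty tokens, all of which the default limit discards.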

Priscilla


> -----Original Message-----
> From: public-qt-comments-request@w3.org 
> [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle
> Sent: Tuesday, August 19, 2003 4:57 AM
> To: tobiasreif@pinkjuice.com
> Cc: public-qt-comments@w3.org; ashokma@microsoft.com
> Subject: Re: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")
> 
> 
> 
> 
> My reading of the F&O spec agrees with Tobi's that .? should not use the
> empty string as a separator, but rather each character should be taken
> as a separator, and so the result should be the empty sequence.
> 
> The sentence 
> > If the supplied $pattern matches a zero length string...
> 
> should anyway (as this thread shows) be clarified, but as it stands I
> think the natural interpretation of "matches" here is the
> interpretation of matches used in replace(), and in this case
> that is a greedy match as .? is greedy, so since it is the entire regexp
> it is equivalent to . and will match each character.
> While .? _could_ match an empty string, it doesn't here so the empty
> string should not be used as separator.
> 
> This sentence should not be changing the matching rule of the regexp,
> just specifying that if the effective separator is "" the behaviour
> is to split each character rather than error, or take the whole string
> as one token.
> 
> 
> David
> 
Received on Tuesday, 19 August 2003 08:20:19 UTC
