- From: Priscilla Walmsley <priscilla@walmsley.com>
- Date: Tue, 19 Aug 2003 08:20:11 -0400
- To: "'David Carlisle'" <davidc@nag.co.uk>, <tobiasreif@pinkjuice.com>
- Cc: <public-qt-comments@w3.org>, <ashokma@microsoft.com>
I agree that .? should use each character as a separator. But in that
case, shouldn't the result be a sequence of 5 zero-length strings rather
than the empty sequence? That is what tokenize() would return if you had
any other 4 separators in a row.
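
To make the "4 separators in a row" comparison concrete, here is a minimal sketch in Python. Python's re module is only a stand-in here, an assumption for illustration; it is not the F&O regex flavor and says nothing about what tokenize() is specified to do. The variable names are hypothetical.

```python
import re

# Four literal separators in a row leave five zero-length tokens:
# one before the first comma, three between commas, one after the last.
four_commas = re.split(",", ",,,,")

# If each character of "abba" is consumed as a separator (pattern "."),
# the same counting applies: four separators, five zero-length tokens.
four_chars = re.split(".", "abba")

print(len(four_commas), len(four_chars))
```

Both calls yield a list of zero-length strings rather than an empty list, which is the behavior being argued for above.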
The Java result makes far more sense to me than what is currently
written in the F&O. I think the rule should be that if the pattern IS
(rather than MATCHES) the zero-length string, the input is split into
individual characters. A pattern should never "match" a zero-length
string even if it _could_ (with the possible exception of the case where
the $input string is itself a zero-length string).
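
The rule proposed above could be sketched roughly as follows. This is only an illustration in Python under stated assumptions: tokenize_sketch is a hypothetical helper, Python's re module stands in for the F&O regex flavor, and the zero-length $input case is left open (here it simply falls out of the code).

```python
import re

def tokenize_sketch(text, pattern):
    """Hypothetical sketch of the proposed rule, not the F&O definition."""
    # Case 1: the pattern IS the zero-length string:
    # split the input into individual characters.
    if pattern == "":
        return list(text)

    # Case 2: any other pattern. Zero-length matches are ignored, so a
    # pattern such as ".?" only ever acts through its non-empty matches.
    tokens, start = [], 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # never treat a zero-length match as a separator
        tokens.append(text[start:m.start()])
        start = m.end()
    tokens.append(text[start:])
    return tokens

print(tokenize_sketch("abba", ""))
print(tokenize_sketch("abba", ".?"))
```

Under this sketch, the pattern "" yields the four characters of "abba", while ".?" consumes each character as a separator and leaves five zero-length strings.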
Priscilla
> -----Original Message-----
> From: public-qt-comments-request@w3.org
> [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle
> Sent: Tuesday, August 19, 2003 4:57 AM
> To: tobiasreif@pinkjuice.com
> Cc: public-qt-comments@w3.org; ashokma@microsoft.com
> Subject: Re: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")
>
> My reading of the F&O spec agrees with Tobi's that .? should not use the
> empty string as a separator, but rather each character should be taken
> as a separator, and so the result should be the empty sequence.
>
> The sentence
> > If the supplied $pattern matches a zero length string...
>
> should anyway (as this thread shows) be clarified, but as it stands I
> think the natural interpretation of "matches" here is the
> interpretation of matches used in replace(), and in this case
> that is a greedy match as .? is greedy, so since it is the entire regexp
> it is equivalent to . and will match each character.
> While .? _could_ match an empty string, it doesn't here so the empty
> string should not be used as separator.
>
> This sentence should not be changing the matching rule of the regexp,
> just specifying that if the effective separator is "" that the behaviour
> is to split each character rather than error, or take the whole string
> as one token.
>
>
> David
Received on Tuesday, 19 August 2003 08:20:19 UTC