Re: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a") from Tobias Reif on 2003-08-19 (public-qt-comments@w3.org from August 2003)

From: Tobias Reif <tobiasreif@pinkjuice.com>
Date: Tue, 19 Aug 2003 16:34:03 +0200
To: public-qt-comments@w3.org
CC: "Kay, Michael" <Michael.Kay@softwareag.com>
Message-ID: <3F42355B.7030407@pinkjuice.com>

Kay, Michael wrote:
 > You seem to be arguing for a different spec based on what ruby does.

Not really. I just want to understand the semantics F/O chose, and I 
expect them to be extremely coherent and orthogonal, and I want them to 
be useful. If they prove to be less useful than they could be, then I 
will indeed argue for different behaviour/semantics.

As I said, I'm not sure, for example, about the usefulness of all those 
zero-length strings in the result sequence.

 > That would be a valid argument if all existing languages were
 > consistent. But they aren't.

F/O should choose a system that's useful and coherent, and doesn't cause 
too much headache for users (and implementers); that's all I want.

 > We have to make some kind of decision about what to do when the
 > pattern matches a zero length string. Any decisions are going to be
 > arbitrary,

Although I agree that it can very well be different from the decision 
other language designers made, I don't think it should be "arbitrary" in 
any way.

 > as the ruby and Java examples illustrate. In my view, the rule that
 > the string is split into its individual characters is a usable
 > specification and is clearly explained. We could have defined it
 > differently, but you  haven't convinced me that a different
 > specification would be better.

I don't know for sure if a different spec (behaviour/semantics) would be 
better for everyone, but I think it's worth discussing.

Returning ("a","b","b","a") for regex "" is useful and probably meets 
expectations of most coders, and is provided by the current draft AFAICS.

The other cases (where the pattern matches more than just the empty 
string, etc.) are more of a concern; returning the empty sequence might 
prove more useful in everyday coding. (I don't argue it does, I say it 
might.)
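
To make the concern about zero-length strings concrete, here is a rough 
sketch in Python (3.7 or later, where re.split accepts patterns that can 
match the empty string). It is not the F/O semantics and not a proposal; 
the helper name tokenize_nonempty is made up for illustration only. It 
just shows one regex system producing zero-length tokens, and what the 
result looks like once they are filtered out.

```python
import re

def tokenize_nonempty(pattern, text):
    """Split text on pattern and drop zero-length tokens (hypothetical helper)."""
    # Assumes Python 3.7+, where re.split supports empty-matching patterns.
    return [tok for tok in re.split(pattern, text) if tok != ""]

# Raw split on the empty pattern: the result contains zero-length strings.
raw = re.split("", "abba")
print(raw)

# After filtering, only the individual characters remain.
print(tokenize_nonempty("", "abba"))    # ['a', 'b', 'b', 'a']

# A pattern that matches the empty string and more.
print(tokenize_nonempty("b*", "abba"))  # ['a', 'a']
```

Whether such filtering should be the default or left to the caller is 
exactly the kind of design choice under discussion here.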

The Ruby examples weren't really meant as arguments saying that F/O 
should do the same, but were provided in order to show a possible 
alternative which works for many coders.

I actually don't think it's enough to simply look at my posts with the 
Ruby examples (I didn't label them as exhaustive research or perfect 
solution) and check the Java behaviour. Instead the WG might also want 
to investigate and compare other popular regex systems such as Perl 
(including Perl 6), Python, sed, Emacs, Vim, etc., in order to see 
what others do, but I'm sure the WG already did that.

Tobi

-- 
http://www.pinkjuice.com/

Received on Friday, 22 August 2003 02:50:19 UTC