- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Tue, 19 Aug 2003 12:25:28 +0200
- To: Tobias Reif <tobiasreif@pinkjuice.com>, public-qt-comments@w3.org
- Cc: Jeni Tennison <jeni@jenitennison.com>
- Message-ID: <DFF2AC9E3583D511A21F0008C7E62106073DD083@daemsg02.software-ag.de>
> > The definition of the fn:tokenize() function says:
> >
> >   "If the supplied $pattern matches a zero-length string, the
> >   fn:tokenize() function breaks the string into its component
> >   characters. The nth character in the $input string becomes the nth
> >   string in the result sequence; each string in the result sequence
> >   has a string length of one."
>
> Exactly.
>
> > In the example above, the pattern ".?" is a pattern that matches a
> > zero-length string;
>
> But it matches more than a zero-length string AFAICS. It is not
> explicitly specified what happens when the pattern matches more than the
> zero-length string. IMHO returning an empty sequence is the only
> consistent behaviour; anything else is hard to specify unambiguously.

We say what happens when it matches a zero length string. This pattern matches a zero length string. I can't see any ambiguity here. You can ask why we decided to specify it this way, but I don't think you can claim that the spec is ambiguous. The sentence "If the supplied $pattern matches a zero-length string, the fn:tokenize() function breaks the string into its component characters." seems about as clear as you can get.

You seem to be arguing for a different spec based on what ruby does. That would be a valid argument if all existing languages were consistent. But they aren't. The Java split() method, for example, produces the sequence ("", "a", "b", "b", "a", "") when the pattern is "" and the sequence ("", "", "", "", "", "") when the pattern is ".?"

We have to make some kind of decision about what to do when the pattern matches a zero length string. Any decisions are going to be arbitrary, as the ruby and Java examples illustrate. In my view, the rule that the string is split into its individual characters is a usable specification and is clearly explained. We could have defined it differently, but you haven't convinced me that a different specification would be better.

Michael Kay
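For illustration only, the rule quoted above can be sketched in Python. This is a minimal sketch, not the specification's algorithm: the helper name tokenize_draft is hypothetical, Python's re module stands in for the regular expression flavour of the draft (the two differ in details), and "matches a zero-length string" is read here as "the pattern matches the empty string".

```python
import re


def tokenize_draft(input_string, pattern):
    """Hypothetical helper sketching the quoted draft rule for fn:tokenize().

    Assumption: "the pattern matches a zero-length string" is tested by
    asking whether the whole pattern can match the empty string.
    """
    if re.fullmatch(pattern, "") is not None:
        # Draft rule: break the input into its component characters,
        # so the nth character becomes the nth string in the result.
        return list(input_string)
    # Otherwise treat every match of the pattern as a separator.
    return re.split(pattern, input_string)


print(tokenize_draft("abba", ".?"))  # ['a', 'b', 'b', 'a']
print(tokenize_draft("abba", ""))    # ['a', 'b', 'b', 'a']
print(tokenize_draft("a,b,c", ","))  # ['a', 'b', 'c']
```

Under this reading, ".?" and "" are handled identically, because both can match the empty string; no separate rule is needed for a pattern that also matches longer strings.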
Received on Tuesday, 19 August 2003 06:26:29 UTC