W3C home > Mailing lists > Public > public-qt-comments@w3.org > August 2003

Re: [xslt2 func/op] tokenizing "abba" to ("a","b","b","a")

From: Jim Melton <jim.melton@acm.org>
Date: Mon, 18 Aug 2003 16:28:22 -0600
Message-ID: <3F415306.2040603@acm.org>
To: Tobias Reif <tobiasreif@pinkjuice.com>
CC: public-qt-comments@w3.org, Jeni Tennison <jeni@jenitennison.com>


Tobias Reif wrote:

> Hi Jeni
> > The definition of the fn:tokenize() function says:
> >
> >  "If the supplied $pattern matches a zero-length string, the
> >   fn:tokenize() function breaks the string into its component
> >   characters. The nth character in the $input string becomes the nth
> >   string in the result sequence; each string in the result sequence
> >   has a string length of one."
> Exactly.
> > In the example above, the pattern ".?" is a pattern that matches a
> > zero-length string;
> But it matches more than a zero-length string AFAICS. It is not 
> explicitly specified what happens when the pattern matches more than 
> the zero-length string. IMHO returning an empty sequence is the only 
> consistent behaviour; anything else is hard to specify unambiguously. 

My reading of the sentence in question is that it is irrelevant whether 
or not the pattern matches anything other than a zero-length string. The 
only important aspect of the pattern in *this* *particular* rule is 
whether it matches the zero-length string. If it does, then that's the 
end of the story, and there's no reason to check whether it matches 
anything else...the fact that it matches the zero-length string clearly 
specifies the semantics: break the string into its component characters.

If, and only if, the pattern does not match the zero-length string does 
it become meaningful and necessary to find out what it does match and 
perform accordingly.

That is why the rule was not written to read "If the supplied $pattern 
matches a zero-length string and nothing else,...". While the chosen 
semantics might be different from your (apparent) favorite language, 
ruby, they are consistent with XML Schema (which uses similar wording) 
and they are the semantics chosen for this language.

> As I described it is not specific enough, leaving room for different 
> interpretations and thus differing implementations. 

I disagree. We state early in the F&O specification that the rules are 
to be applied *in the order in which they are written*. If you do that, 
and read the rule in question properly (that is, without adding the 
incorrect "...and nothing else" in your mind), then the spec is 
unambiguous (in this respect, that is). Frankly, if we start adding 
"...and nothing else..." to rules like this, the spec will become even 
more difficult to read than it already is!

Nonetheless, we appreciate comments and suggested improvements!

Hope this helps,
Received on Monday, 18 August 2003 18:25:19 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:45:13 UTC