- From: Tobias Reif <tobiasreif@pinkjuice.com>
- Date: Mon, 18 Aug 2003 18:33:05 +0200
- To: public-qt-comments@w3.org
Hi,

I need to get a sequence of the characters in a string.

The current draft
http://www.w3.org/TR/xquery-operators/#func-tokenize
says
'fn:tokenize("abba", ".?") returns ("a", "b", "b", "a")'

I don't understand why it should. After days of confusion (caused by various factors), I really would appreciate a friendly, helpful, and clear explanation.

Here's what I would expect (and what Ruby does):

  ruby -e "p('abba'.split(/.?/))"
  []
  ruby -e "p('abba'.split(/./))"
  []
  ruby -e "p('abba'.split(//))"
  ["a", "b", "b", "a"]

Ruby's [] is an empty array, and is equivalent to XSLT 2's (), the empty sequence. Ruby's ["a", "b", "b", "a"] is an array of all the characters in the string (and nothing else), and is equivalent to XSLT's ("a", "b", "b", "a").

The spec says
"This function breaks the $input string into a sequence of strings, treating any substring that matches $pattern as a separator."

.? matches the characters, thus the separators split the string into the empty strings between the characters. Either a sequence of empty strings should be returned, or, perhaps most sensibly, the empty sequence.

The empty regex matches all zero-length strings, thus the separators split the string into its characters. A sequence containing each character of the string (and nothing else) should be returned.

Unless I get an explanation convincing me that the current version of the example is correct, I suggest changing
'fn:tokenize("abba", ".?") returns ("a", "b", "b", "a")'
to
'fn:tokenize("abba", "") returns ("a", "b", "b", "a")'
... and perhaps adding
'fn:tokenize("abba", ".?") returns ()'
and
'fn:tokenize("abba", ".") returns ()'

In any case I need to be able to rely on an unambiguous spec.

If the example in the spec is correct from your POV, please add a clear and unambiguous explanatory specification of the behaviour.
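
To make the two readings above concrete ("a sequence of empty strings" versus "the empty sequence"), here is a minimal Ruby sketch. It only illustrates Ruby's String#split, used as a stand-in; it is an assumption that fn:tokenize could behave like either case, and the variable names are purely illustrative.

```ruby
# Sketch only: Ruby's String#split stands in for fn:tokenize here,
# which may well be specified differently.
input = 'abba'

# Default split: every field is empty, and Ruby drops trailing
# empty fields, so nothing is left (the "empty sequence" reading).
default_fields = input.split(/./)

# A negative limit keeps trailing empty fields, so the empty strings
# around the four separators survive (the "empty strings" reading).
all_fields = input.split(/./, -1)

# Splitting on the empty regex separates the string between
# characters, which yields the characters themselves.
characters = input.split(//)

p default_fields  # prints []
p all_fields      # prints ["", "", "", "", ""]
p characters      # prints ["a", "b", "b", "a"]
```

Whichever of the first two results the spec intends for a pattern like ".", neither is the sequence of characters; only the empty pattern produces that in this sketch.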
If it is incorrect, please consider changing it to
'fn:tokenize("abba", "") returns ("a", "b", "b", "a")'

If you say or decide that your spec describes both examples (the one in the spec and the above) as being correct, or if you think that tokenize("abba", "") should not return the sequence of characters, then please consider adding the above example (tokenize("abba", "")) plus a clear explanatory specification of the behaviour in each case.

Tobi

-- 
http://www.pinkjuice.com/
Received on Monday, 18 August 2003 12:34:42 UTC