- From: Jim Melton <jim.melton@acm.org>
- Date: Mon, 18 Aug 2003 17:59:53 -0600
- To: Priscilla Walmsley <priscilla@walmsley.com>
- CC: "'Tobias Reif'" <tobiasreif@pinkjuice.com>, public-qt-comments@w3.org, "'Jeni Tennison'" <jeni@jenitennison.com>
- Message-ID: <3F416879.8040608@acm.org>
Priscilla,

Priscilla Walmsley wrote:

>Hi,
>
>Just to pick nits,

Pick away! That's the only way we'll all understand things the same (and the proper) way!

>Jim Melton wrote:
>
>>I disagree. We state early in the F&O specification that the rules are
>>to be applied *in the order in which they are written*. If you do that,
>>and read the rule in question properly (that is, without adding the
>>incorrect "...and nothing else" in your mind), then the spec is
>>unambiguous (in this respect, that is).
>
>The rule about matching a zero-length string appears *after* the
>sentence:
>
>"This function breaks the $input string into a sequence of strings,
>treating any substring that matches $pattern as a separator. The
>separators themselves are not returned."
>
>Perhaps this sentence is not an official "rule", just a general
>description of the function.

That is my interpretation...that that first sentence was meant as a high-level summary of what the function is supposed to do, not a "rule" that gives the detailed semantics.

>However, the sentence is false in the case
>we are talking about. The letters "a", and "b" _do_ match the pattern
>.* , and therefore should be treated as separators according to this
>particular sentence. Maybe the sentence should start with "If $pattern
>does not match a zero-length string, ..."
>
>Or did I misunderstand what you mean by applying the rules in the order
>in which they are written?

I see your point. My interpretation --- that the sentence you quoted is not a "rule", but a "summary" --- would not dictate the sort of addition you propose. Other interpretations (that the sentence is a normative "rule") would require such an addition/clarification.

>Anyway, back to the real issue, I think this behavior is particularly
>confusing in the case of:
>
>fn:tokenize("abba", "b?")
>
>which apparently would also return ("a", "b", "b", "a"), since the
>pattern matches a zero-length string. I think the user would expect "b"
>to be treated like a separator in this case. I know they could just use
>the pattern "b" if that's what they want, but it still seems like the
>function violates the principle of least surprise. Particularly since
>the ? in this case should be greedy.

Now you're talking about something substantive! I think you said that the intended semantics are too confusing and that a pattern like "b?" should match "b" in preference to the zero-length string. (That is, if there is one or more instances of "b" in the input string, then match it/them and match the zero-length string only if there are no instances of "b".) Is that a proper interpretation of your statement?

I *personally* have no preference about this. I have a very strong preference of not violating the principle of least astonishment, though. But I must yield to the educated opinions of others who are familiar with various regular expression-using languages.

The issue is certainly on the table as a result of the comment and I'm positive that the F&O Task Force will discuss it in some depth.

Many thanks,
Jim
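
For readers following the thread, the two behaviours under discussion can be roughly modelled in Python. This is only a sketch under stated assumptions: Python's `re` syntax stands in for the specification's regular expressions, the function names `tokenize_draft` and `tokenize_nonempty_only` are hypothetical, and the per-character result for patterns that match a zero-length string is taken from the description in this thread rather than from the specification text. The second function is just one possible reading of the "prefer the non-empty match" alternative.

```python
import re


def tokenize_draft(text, pattern):
    """Hypothetical model of the behaviour described in this thread."""
    # Assumption: a pattern that can match "" splits the input into
    # single characters, as the thread reports for "b?" and ".*".
    if re.fullmatch(pattern, "") is not None:
        return list(text)
    # Otherwise every match is treated as a separator and dropped.
    # (re.split also returns capture groups; none are used here.)
    return re.split(pattern, text)


def tokenize_nonempty_only(text, pattern):
    """Hypothetical alternative: only non-empty matches separate tokens."""
    tokens, start = [], 0
    for m in re.finditer(pattern, text):
        if m.end() == m.start():
            continue  # ignore zero-length matches
        tokens.append(text[start:m.start()])
        start = m.end()
    tokens.append(text[start:])
    return tokens


print(tokenize_draft("abba", "b?"))
print(tokenize_nonempty_only("abba", "b?"))
print(tokenize_draft("abba", "b"))
```

Under the first model, "abba" with "b?" comes back as the four single letters; under the second, "b?" behaves like the plain pattern "b" on this input, which is the result the "least surprise" argument asks for.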
Received on Monday, 18 August 2003 19:56:52 UTC