- From: Jim Melton <jim.melton@acm.org>
- Date: Mon, 18 Aug 2003 17:59:53 -0600
- To: Priscilla Walmsley <priscilla@walmsley.com>
- CC: "'Tobias Reif'" <tobiasreif@pinkjuice.com>, public-qt-comments@w3.org, "'Jeni Tennison'" <jeni@jenitennison.com>
- Message-ID: <3F416879.8040608@acm.org>
Priscilla,
Priscilla Walmsley wrote:
>Hi,
>
>Just to pick nits,
>
Pick away! That's the only way we'll all understand things the same (and
the proper) way!
>
>
>Jim Melton wrote:
>
>
>>I disagree. We state early in the F&O specification that the
>>rules are
>>to be applied *in the order in which they are written*. If
>>you do that,
>>and read the rule in question properly (that is, without adding the
>>incorrect "...and nothing else" in your mind), then the spec is
>>unambiguous (in this respect, that is).
>>
>
>The rule about matching a zero-length string appears *after* the
>sentence:
>
>"This function breaks the $input string into a sequence of strings,
>treating any substring that matches $pattern as a separator. The
>separators themselves are not returned."
>
>Perhaps this sentence is not an official "rule", just a general
>description of the function.
>
That is my interpretation...that that first sentence was meant as a
high-level summary of what the function is supposed to do, not a "rule"
that gives the detailed semantics.
>However, the sentence is false in the case
>we are talking about. The letters "a", and "b" _do_ match the pattern
>.* , and therefore should be treated as separators according to this
>particular sentence. Maybe the sentence should start with "If $pattern
>does not match a zero-length string, ..."
>
>Or did I misunderstand what you mean by applying the rules in the order
>in which they are written?
>
I see your point. My interpretation --- that the sentence you quoted is
not a "rule", but a "summary" --- would not dictate the sort of addition
you propose. Other interpretations (that the sentence is a normative
"rule") would require such an addition/clarification.
>
>
>Anyway, back to the real issue, I think this behavior is particularly
>confusing in the case of:
>
>fn:tokenize("abba", "b?")
>
>which apparently would also return ("a", "b", "b", "a"), since the
>pattern matches a zero-length string. I think the user would expect "b"
>to be treated like a separator in this case. I know they could just use
>the pattern "b" if that's what they want, but it still seems like the
>function violates the principle of least surprise. Particularly since
>the ? in this case should be greedy.
>
Now you're talking about something substantive! I think you said that
the intended semantics are too confusing and that a pattern like "b?"
should match "b" in preference to the zero-length string. (That is, if
there is one or more instances of "b" in the input string, then match
it/them and match the zero-length string only if there are no instances
of "b".) Is that a proper interpretation of your statement?
I *personally* have no preference about this. I have a very strong
preference of not violating the principle of least astonishment, though.
But I must yield to the educated opinions of others who are familiar
with various regular expression-using languages. The issue is certainly
on the table as a result of the comment and I'm positive that the F&O
Task Force will discuss it in some depth.
Many thanks,
Jim
Received on Monday, 18 August 2003 19:56:52 UTC