[Bug 3931] [FT] Section 3.2.5: order of options

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3931





------- Comment #1 from pcase@crs.loc.gov  2007-01-31 18:48 -------
Section 3.2 FTMatchOptions says that FTMatchOptions are applied in the order
in which they are written in the query. So yes, the queries are different:
stemming and stop words are applied in the order written.

You stem the words in the query. You replace each stop word with a placeholder
that matches any word. So the user can stem the words in the query either
before or after the stop words are replaced.

So if we have
"to be or not to be" with stemming with stop words ("be")
--You would stem all the words in the query, including "be", then replace "be".
You might end up with "being" in your query, but it doesn't matter, because you
will find any word in the position of "be", which includes "being" anyway.

"to be or not to be" with stop words ("be") with stemming
--You would replace "be", then stem all the words left in the query. You will
get the same results as above.
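
To make the two orderings concrete, here is a minimal Python sketch. It is a toy
model and not anything the spec defines: the stem table, the ANY placeholder,
and the function names apply_stemming and apply_stop_words are all assumptions
made for illustration.

```python
# Toy model only: the stem table and the ANY placeholder are hypothetical,
# not taken from the spec or from any real stemmer.
ANY = "<any>"  # stands for "any word may appear at this position"

TOY_STEMS = {"being": "be", "been": "be"}  # every other word stems to itself


def apply_stemming(tokens):
    """Replace each query token with its toy stem; placeholders are left alone."""
    return [t if t == ANY else TOY_STEMS.get(t, t) for t in tokens]


def apply_stop_words(tokens, stop_words):
    """Replace each token that is a stop word with the ANY placeholder."""
    return [ANY if t in stop_words else t for t in tokens]


query = "to be or not to be".split()

# "to be or not to be" with stemming with stop words ("be")
stem_first = apply_stop_words(apply_stemming(query), {"be"})

# "to be or not to be" with stop words ("be") with stemming
stop_first = apply_stemming(apply_stop_words(query, {"be"}))

print(stem_first)
print(stop_first)
print(stem_first == stop_first)
```

Under these assumptions both orderings leave a placeholder in each "be"
position, so the two queries match the same text.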

Either way, it would be spitting in the wind to say "with stop word X" and then
search for X in your query, but you will get the same, possibly nasty, results.
And it appears that only when the stop word and the search word are the same
could there have been a difference in the results.

In 4.2.3 The evaluate function, we say:
Ordering among match options is necessary because match options are not always
commutative. For example, synonym(stem(word)) is not always the same as
stem(synonym(word)). Naturally, match options may be reordered when they
commute, but this is an optimization issue and is beyond the scope of this
document.
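
As an illustration of the non-commutative case quoted above, here is a small
Python sketch. The two lookup tables are hypothetical and were chosen only so
that the two orderings disagree; the functions stem and synonym are stand-ins,
not the spec's operations or any real thesaurus or stemmer.

```python
# Hypothetical lookup tables, chosen only to make the two orderings disagree.
TOY_STEMS = {"running": "run", "jogging": "jog"}
TOY_SYNONYMS = {"running": "jogging", "run": "sprint"}


def stem(word):
    """Return the toy stem of a word, or the word itself if it has none."""
    return TOY_STEMS.get(word, word)


def synonym(word):
    """Return the toy synonym of a word, or the word itself if it has none."""
    return TOY_SYNONYMS.get(word, word)


word = "running"
synonym_of_stem = synonym(stem(word))  # "running" -> "run" -> "sprint"
stem_of_synonym = stem(synonym(word))  # "running" -> "jogging" -> "jog"
print(synonym_of_stem, stem_of_synonym)

# A word that appears in neither table is unchanged by both functions,
# so for that word the two options commute and could be reordered.
print(synonym(stem("cat")) == stem(synonym("cat")))
```

With these tables the order matters for "running" but not for "cat", which is
the distinction the quoted text draws between options that must stay in written
order and options that may be reordered.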

So I don't think there is any need to protect users, and we have said that
implementations can reorder match options to optimize when they commute. I
think the spec is clear here, the results are satisfactory, and there is no
need for change. Might I mark this "won't fix"?

Received on Wednesday, 31 January 2007 18:48:34 UTC