[Bug 6667] [FT] Stemming

From: <bugzilla@wiggum.w3.org>
Date: Fri, 13 Mar 2009 14:22:22 +0000
To: public-qt-comments@w3.org
Message-Id: <E1Li8HG-0003Xw-4t@wiggum.w3.org>

--- Comment #2 from Christian Gruen <christian.gruen@gmail.com>  2009-03-13 14:22:21 ---
Thank you Pat,

just to be sure about the query "ft-3.4.7-examples-q2.xq": it performs the following:

[1] 'propagating few errors' ftcontains "propagation of errors"
     with stemming with stop words ("a", "the", "of")

If the terms "propagation" and "propagating" are not stemmed, this query
should be equivalent to the following one:

[2] 'propagating few errors' ftcontains "propagation of errors"
     with stop words ("a", "the", "of")

The text token "few" is matched by the query stop word "of", but I would still
expect the query to yield "false", because "propagating" and "propagation" are
different tokens without stemming. Do you agree?

In contrast, the following query should yield true:

[3] 'propagate few errors' ftcontains "propagate of errors"
     with stop words ("a", "the", "of")

In my opinion, this query should be simplified in the specification, as the
stemming option does not really help to explain the asymmetry in the stop
word semantics. What do you think about the following version?


3.4.7 Stop Word Option


The following expression returns true, because the document contains the phrase
"propagating few errors":

/books/book[@number="1"]//p ftcontains "propagating of errors"
  with stop words ("a", "the", "of") 

Note the asymmetry in the stop word semantics: the property of being a stop
word is only relevant to query terms, not to document terms. Hence, it is
irrelevant for the above-mentioned match whether "few" is a stop word or not,
and on the other hand we do not want the query above to match "propagating"
followed by 2 stop words, or even a sequence of 3 stop words in the document.
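
To make the asymmetry concrete, here is a minimal Python sketch of how I read
the semantics. It is only a model, not taken from the specification or from any
implementation: the function names tokenize and phrase_matches are hypothetical,
tokenization is a plain whitespace split, and I assume that a stop word in the
query matches any single document token while document tokens are never treated
as stop words.

```python
def tokenize(text):
    """Lower-case the text and split on whitespace (simplifying assumption)."""
    return text.lower().split()


def phrase_matches(document, query, stop_words=()):
    """Return True if the query phrase occurs in the document.

    Assumed semantics: a query token that is a stop word matches any
    single document token at that position; every other query token
    must equal the document token. Document tokens are never checked
    against the stop word list, which is the asymmetry in question.
    """
    stops = {word.lower() for word in stop_words}
    doc = tokenize(document)
    qry = tokenize(query)
    if not qry:
        return False
    # Slide a window of the query's length over the document tokens.
    for start in range(len(doc) - len(qry) + 1):
        window = doc[start:start + len(qry)]
        if all(q in stops or q == d for q, d in zip(qry, window)):
            return True
    return False


STOPS = ("a", "the", "of")

# Query [2]: no stemming, so "propagation" does not equal "propagating".
q2 = phrase_matches("propagating few errors", "propagation of errors", STOPS)

# Query [3]: "of" is a query stop word and matches the document token "few".
q3 = phrase_matches("propagate few errors", "propagate of errors", STOPS)

# Asymmetry: stop words in the document do not match the query term "errors".
q4 = phrase_matches("propagating of the", "propagating of errors", STOPS)
```

Under these assumptions q2 is False, q3 is True, and q4 is False, which is the
behaviour described above for queries [2] and [3] and for a document that
contains "propagating" followed by 2 stop words.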


I hope I didn't get it completely wrong; if I did, sorry for wasting your time.

Christian, BaseX Team 

Received on Friday, 13 March 2009 14:22:40 UTC