- From: <bugzilla@jessica.w3.org>
- Date: Thu, 17 Feb 2011 14:27:01 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12109

           Summary: [FT] StopWord Option
           Product: XPath / XQuery / XSLT
           Version: Proposed Recommendation
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 1.0
        AssignedTo: jim.melton@acm.org
        ReportedBy: tim@cbcl.co.uk
         QAContact: public-qt-comments@w3.org

The specification says very little about how stop words work, except as part of phrases or in the context of FTWindow/FTDistance.

In information retrieval systems based upon inverted indices, it is traditional to use stop words to remove high-frequency terms from the index in order to reduce its size. It is also traditional to ignore stop words during query processing to improve query performance (both speed and precision).

The text:

"Some implementations may apply stop word lists during indexing and be unable to comply with query-time requests to not apply those stop words."

implies that XQuery Full Text is amenable to the approach of inverted indices with stop words stripped at index time.

Consider the query:

    declare ft-option using stop words ("be", "not", "or", "to");
    "to be or not to be" contains text "to"

According to the specification:

"Stop words are tokens in the query that match any token in the text being searched"

This seems to suggest that the result should be identical to that of the following query, which uses wildcards:

    "to be or not to be" contains text ".+" using wildcards

However, "to be or not to be" is composed entirely of stop words. An implementation that applies the stop word list during indexing would therefore find no tokens in it at all, and the result would be "false" rather than "true".
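
To make the discrepancy concrete, here is a minimal Python sketch. It is not taken from the specification or from any implementation. The names `tokenize`, `index_tokens` and `contains_word` are hypothetical, the tokenizer (lowercased runs of word characters) is a simplifying assumption, and `contains_word` encodes only the reading quoted above, in which a stop word in the query matches any token in the searched text.

```python
import re

# Stop word list from the example query in this report.
STOP_WORDS = frozenset({"be", "not", "or", "to"})


def tokenize(text):
    """Assumed tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def index_tokens(text, stop_words=frozenset()):
    """Tokens an inverted index would retain for `text`.

    Passing a non-empty `stop_words` models stripping at index time.
    """
    return [t for t in tokenize(text) if t not in stop_words]


def contains_word(tokens, query_word, stop_words):
    """Hypothetical single-word 'contains text' check.

    Assumed reading: a query token that is a stop word matches any token,
    so the check succeeds whenever at least one token is available.
    """
    word = query_word.lower()
    if word in stop_words:
        return len(tokens) > 0
    return word in tokens


text = "to be or not to be"

# Stop words applied only at query time: every token is still indexed.
full = index_tokens(text)
query_time_result = contains_word(full, "to", STOP_WORDS)

# Stop words stripped at index time: nothing is left to match against.
stripped = index_tokens(text, STOP_WORDS)
index_time_result = contains_word(stripped, "to", STOP_WORDS)
```

Under these assumptions `query_time_result` is `True` and `index_time_result` is `False`, which is the disagreement between the two approaches described above.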
Received on Thursday, 17 February 2011 14:27:02 UTC