- From: <bugzilla@jessica.w3.org>
- Date: Thu, 17 Feb 2011 14:27:01 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12109
Summary: [FT] StopWord Option
Product: XPath / XQuery / XSLT
Version: Proposed Recommendation
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: normal
Priority: P2
Component: Full Text 1.0
AssignedTo: jim.melton@acm.org
ReportedBy: tim@cbcl.co.uk
QAContact: public-qt-comments@w3.org
There is very little information given regarding how stop words work except as
part of phrases or in the context of FTWindow/FTDistance.
In information retrieval systems based upon inverted indices, it is traditional
to use stop words to remove high frequency terms from the index to reduce the
size of the inverted index. It is also traditional to ignore stop words during
query processing to improve query performance (both speed and precision).
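As a rough illustration of the index-time approach described above, the
following Python sketch builds a toy inverted index and drops stop words before
any postings are recorded. The function name build_index, the whitespace
tokenizer, and the sample documents are assumptions made for this example; none
of them come from the specification.

```python
from collections import defaultdict

def build_index(docs, stop_words=frozenset()):
    """Map each token that is not a stop word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumed tokenizer: lower-case, split on whitespace.
        for token in text.lower().split():
            if token not in stop_words:
                index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

# Hypothetical sample data for the sketch.
docs = ["to be or not to be", "the question is to be answered"]
stops = {"be", "not", "or", "to"}

full = build_index(docs)             # no stop words applied
stripped = build_index(docs, stops)  # stop words removed at index time
```

With these sample documents the stripped index has fewer entries than the full
one, and the first document, which consists only of stop words, no longer
appears in any posting list.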
The text:
"Some implementations may apply stop word lists during indexing and be unable
to comply with query-time requests to not apply those stop words."
implies that XQuery Full Text is amenable to the approach of inverted indices
with stop words stripped at index time.
Consider the query:
declare ft-option using stop words ("be", "not", "or", "to");
"to be or not to be" contains text "to"
According to the specification:
"Stop words are tokens in the query that match any token in the text being
searched"
This seems to suggest that the result should be identical to
"to be or not to be" contains text ".+" using wildcards
Since "to be or not to be" is entirely composed of stop words, any application
of stop word lists during indexing means that it contains no tokens and thus
the result would be "false" rather than "true".
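The discrepancy can be sketched in Python under two simplified models for a
single-token query. The first treats a query stop word as matching any token in
the text (the reading of the specification quoted above). The second strips
stop words from the text before matching, as an index built with a stop word
list would. The helper names and the whitespace tokenizer are hypothetical, and
neither function attempts to reproduce the full XQuery Full Text semantics.

```python
STOP_WORDS = {"be", "not", "or", "to"}

def tokenize(text):
    # Assumed tokenizer: lower-case, split on whitespace.
    return text.lower().split()

def matches_query_time(text, query_token, stop_words):
    """Stop words applied at query time only: a stop word matches any token."""
    tokens = tokenize(text)
    if query_token in stop_words:
        return len(tokens) > 0
    return query_token in tokens

def matches_index_time(text, query_token, stop_words):
    """Stop words already stripped from the text when it was indexed."""
    indexed = [t for t in tokenize(text) if t not in stop_words]
    if query_token in stop_words:
        # Still "match any token", but only indexed tokens are visible.
        return len(indexed) > 0
    return query_token in indexed

text = "to be or not to be"
query_time_result = matches_query_time(text, "to", STOP_WORDS)
index_time_result = matches_index_time(text, "to", STOP_WORDS)
```

Under these assumptions query_time_result is True and index_time_result is
False, which is the inconsistency reported here.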
Received on Thursday, 17 February 2011 14:27:02 UTC