- From: Todd A. Mancini <todd.mancini@daxat.com>
- Date: Mon, 17 Feb 2003 10:58:19 -0500
- To: <public-qt-comments@w3.org>
In the current 'XQuery and XPath Full-Text Requirements' draft, section 6.1 Functionality, stopwords are labeled as MUST. I propose that this be changed to MAY.

Stop words arose mainly because older-generation full-text engines, with their inefficient indexing algorithms, could not handle terms that were extremely common in the source text. Stop words were typically NOT a feature; they were a band-aid. It seems unnatural to say that an engine without stop words is less powerful than an engine with stop words, assuming both perform adequately.

Consider a phrase such as "to be or not to be." If a student is researching the works of Shakespeare, would that student consider a full-text engine more powerful if it labeled all of these terms as stop words and therefore reduced the query to nothing or, as some of the use cases imply, to a search for ANY 6-word phrase?

If someone can develop a full-text search engine without stop words that performs on par with, or faster than, another engine that has stop words turned on, should that first vendor be required to add stop word support?

As the former Director of Professional Services for AltaVista Software, I speak from experience. My customers routinely replaced existing engines that mandated stop words with the AltaVista engine, which neither natively supports stop words nor requires such support to perform. Searching an index of 10 million documents for the word 'the' is possible with extremely modest hardware. A great percentage of real-world Full-Text scenarios will likely involve fewer than millions of XML nodes, so going without stop word support would be completely appropriate in the vast majority of uses of the technology.

Therefore, I argue that MAY is the more appropriate requirement level for stopword support.

[Disclaimer: I am no longer affiliated with the AltaVista Company, nor am I selling their software. This is not an advertisement for the AltaVista software, but rather a real-world example showing the value of NOT mandating stopword support.]

Thank you for your consideration.

-Todd Mancini
Received on Monday, 17 February 2003 11:06:01 UTC