- From: <bugzilla@jessica.w3.org>
- Date: Thu, 27 Jan 2011 08:59:00 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11885

           Summary: [XQFTTS] english-stems.txt stemming dictionary
           Product: XPath / XQuery / XSLT
           Version: Proposed Recommendation
          Platform: PC
        OS/Version: Windows NT
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 1.0
        AssignedTo: jim.melton@acm.org
        ReportedBy: tim@cbcl.co.uk
         QAContact: public-qt-comments@w3.org

The file "english-stems.txt" contains stemming rules only for lower-case text. However, the specification clearly states that the "Stemming Option must be applied before the Case Option and the Diacritics Option". So when tokenizing the string "Dogs and Cats" with stemming, the tokens presented to the tokenizer must be "Dogs", "and", "Cats".

The guidelines for running XQFTTS state that the "stemming-dictionary is a plain text file containing lines of whitespace-separated tokens. Each token on the line should stem to the first token on the line."

Note that it is conceivable that the stemming dictionary might stem "AIDS" to "AIDS" but "aids" to "aid". This would be a useful test of the order of application of stemming and case options. Presumably the test suite doesn't currently test this.

--
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
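
To illustrate the ordering issue described above, here is a minimal Python sketch. It is not taken from the test suite or from any implementation. The dictionary contents in SAMPLE are hypothetical, the function names are invented for illustration, and the Case Option is modelled simply as lower-casing, which is an assumption. The parser follows the quoted guideline: each whitespace-separated token on a line stems to the first token on that line.

```python
def load_stems(text):
    """Parse a stemming dictionary: every token on a line maps to the first token."""
    stems = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        for tok in tokens:
            stems[tok] = tokens[0]
    return stems


def stem_then_case(tokens, stems):
    """Stemming sees the original case; the case option (lower-casing) is applied afterwards."""
    return [stems.get(t, t).lower() for t in tokens]


def case_then_stem(tokens, stems):
    """The opposite order: lower-case first, then look up the stem."""
    return [stems.get(t.lower(), t.lower()) for t in tokens]


# Hypothetical dictionary, NOT the real english-stems.txt.
SAMPLE = "AIDS AIDS\naid aids\ndog dogs\ncat cats\n"
stems = load_stems(SAMPLE)

print(stem_then_case(["AIDS", "aids"], stems))         # ['aids', 'aid']
print(case_then_stem(["AIDS", "aids"], stems))         # ['aid', 'aid']
print(stem_then_case(["Dogs", "and", "Cats"], stems))  # ['dogs', 'and', 'cats']
print(case_then_stem(["Dogs", "and", "Cats"], stems))  # ['dog', 'and', 'cat']
```

With a lower-case-only dictionary, the stem-first order leaves "Dogs" and "Cats" unstemmed, while an "AIDS"/"aids" pair of entries gives different results for the two orders, so it could distinguish them in a test.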
Received on Thursday, 27 January 2011 08:59:02 UTC