- From: <bugzilla@wiggum.w3.org>
- Date: Thu, 16 Apr 2009 19:30:32 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6830
Summary: [FT] Thesaurus vs other Match Options
Product: XPath / XQuery / XSLT
Version: Candidate Recommendation
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Full Text 1.0
AssignedTo: jim.melton@acm.org
ReportedBy: christian.gruen@gmail.com
QAContact: public-qt-comments@w3.org
Hi again,
I noticed that combining several match options with the Thesaurus Option can
be interpreted in different ways. My main question is whether the other match
options influence the way the thesaurus works. An example:
"improving" ftcontains "improve" with stemming
This query should return true. If we add a thesaurus here...
"improving" ftcontains "optimizing" with stemming with thesaurus..
...and if the thesaurus resolves "optimize" to "improve", I am wondering whether
this query will return true, as the thesaurus entries would have to be stemmed
as well.
The same problem/question occurs with the default match options. E.g.: Are
diacritics to be removed in the thesaurus?
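To make the question concrete, here is a minimal Python sketch (not XQuery, and not taken from the specification). The stemmer `toy_stem`, the helpers `build_thesaurus` and `matches`, and the one-entry thesaurus are all hypothetical; the point is only that the lookup fails when the thesaurus holds unstemmed entries and succeeds when the entries were stemmed with the same function as the query.

```python
def toy_stem(token):
    """Hypothetical toy stemmer: strips a trailing 'ing' or 'e'."""
    for suffix in ("ing", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_thesaurus(raw, normalize):
    """Apply the same normalization to thesaurus keys and related terms."""
    built = {}
    for term, related in raw.items():
        built.setdefault(normalize(term), set()).update(
            normalize(r) for r in related
        )
    return built


def matches(doc_token, query_token, thesaurus, normalize):
    """True if the normalized document token equals the normalized query
    token or one of its thesaurus expansions."""
    query = normalize(query_token)
    candidates = {query} | thesaurus.get(query, set())
    return normalize(doc_token) in candidates


# Assumed thesaurus content: "optimize" is related to "improve".
RAW = {"optimize": {"improve"}}

unstemmed = build_thesaurus(RAW, lambda t: t)  # entries kept as written
stemmed = build_thesaurus(RAW, toy_stem)       # entries stemmed up front

print(matches("improving", "optimizing", unstemmed, toy_stem))  # False
print(matches("improving", "optimizing", stemmed, toy_stem))    # True
```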
As a Thesaurus can get pretty large, similar to index structures, I would
recommend applying all match options while building the Thesaurus and BEFORE
querying it; otherwise, Thesaurus requests could get pretty expensive. This is
why I would propose extending section 3.4 of the specification:
1. The Language Option must be applied first
2. The Stemming Option must be applied before the Case Option and the
Diacritics Option
-> 3. The Thesaurus Option must be applied after all other options
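The proposed order can be sketched in Python as follows. This is only an illustration under assumptions: `toy_stem` is a hypothetical stand-in for a real language-specific stemmer, the `language` parameter is carried along but not used, and the thesaurus entries are invented. The thesaurus is normalized once at build time, so the lookup itself happens after all other options have been applied.

```python
import unicodedata


def toy_stem(token, language="en"):
    """Hypothetical stemmer; a real one would depend on `language`."""
    for suffix in ("ing", "e"):
        if token.lower().endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def strip_diacritics(token):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(token, language="en"):
    token = toy_stem(token, language)  # 1./2. language, then stemming
    token = token.lower()              # case (assumed: case insensitive)
    return strip_diacritics(token)     # diacritics (assumed: insensitive)


def build_thesaurus(raw):
    """Normalize the thesaurus once, while building it."""
    return {
        normalize(term): {normalize(r) for r in related}
        for term, related in raw.items()
    }


def ft_match(doc_token, query_token, thesaurus):
    query = normalize(query_token)
    # 3. The thesaurus lookup comes last, on already normalized tokens.
    candidates = {query} | thesaurus.get(query, set())
    return normalize(doc_token) in candidates


# Invented entries, for illustration only.
THESAURUS = build_thesaurus({"optimize": ["improve"], "Überblick": ["summary"]})

print(ft_match("IMPROVE", "Optimizing", THESAURUS))
print(ft_match("Summary", "uberblick", THESAURUS))
```

Both calls are expected to match, because the query token, the document token and the thesaurus entries all pass through the same `normalize` function.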
This also makes sense because the Thesaurus might not need to be accessed at
all if the query and document terms are equal anyway...
"A" ftcontains "A" with thesaurus...
-> should yield true without even checking the thesaurus
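A small Python sketch of this shortcut, assuming the tokens have already been normalized by the other match options. `CountingThesaurus` and `ft_contains_token` are hypothetical names; the counter only exists to show that no lookup takes place when the tokens are equal.

```python
class CountingThesaurus:
    """Hypothetical thesaurus wrapper that counts how often it is queried."""

    def __init__(self, entries):
        self.entries = entries
        self.lookups = 0

    def lookup(self, term):
        self.lookups += 1
        return self.entries.get(term, set())


def ft_contains_token(doc_token, query_token, thesaurus):
    # Equal tokens match directly; the thesaurus is never consulted.
    if doc_token == query_token:
        return True
    # Only now pay for the (potentially expensive) thesaurus request.
    return doc_token in thesaurus.lookup(query_token)


thesaurus = CountingThesaurus({"optimize": {"improve"}})

print(ft_contains_token("A", "A", thesaurus), thesaurus.lookups)
print(ft_contains_token("improve", "optimize", thesaurus), thesaurus.lookups)
```

The first call returns true with zero lookups; only the second call, where the tokens differ, touches the thesaurus.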
I just discovered the following sentence in the first section of the
specification:
"The WGs particularly solicit feedback regarding how thesauri are to be used in
combination."
So I hope that my discussion here contributes a little to this issue.
Christian
Received on Thursday, 16 April 2009 19:30:45 UTC