- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 09 Apr 2007 12:02:26 +0000
- To: public-qt-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4454
Summary: [FT] match options order should be implementation-defined
Product: XPath / XQuery / XSLT
Version: Working drafts
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Full Text
AssignedTo: jim.melton@acm.org
ReportedBy: doerre@de.ibm.com
QAContact: public-qt-comments@w3.org
From an implementation point of view there are two types of match options,
and the type influences how an option can practically be applied: i) match
options that control a simple query rewrite step (without regard to what is
actually contained in an index) and ii) match options that affect the lexical
lookup of tokens in the index. Given this distinction, certain orders for match
option application are more natural, because a kind i) match option is less
complex to implement when it is applied in a query rewrite step prior to
lexical lookup.
The thesaurus option is the typical kind i) match option. Stop words (when
considered as query expansions, as our spec does) are kind i) as well.
Stemming is in-between the two kinds, as it typically involves a query-rewrite
step, but also affects lexical lookup.
On the other hand, wildcard, case, diacritics are typically of kind ii).
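To illustrate the classification above, here is a minimal Python sketch. It is not part of the spec; the names Kind, KIND and natural_order are hypothetical, and the mapping only records the "typical" kinds described in this comment.

```python
from enum import Enum


class Kind(Enum):
    REWRITE = 1  # kind i): pure query rewrite, no index access needed
    BOTH = 2     # in between: rewrites the query and affects lexical lookup
    LOOKUP = 3   # kind ii): affects lexical lookup of tokens in the index


# Typical classification as described in this comment (an assumption,
# individual implementations may classify differently).
KIND = {
    "ftthesaurus": Kind.REWRITE,
    "ftstopword": Kind.REWRITE,
    "ftstem": Kind.BOTH,
    "ftwildcard": Kind.LOOKUP,
    "ftcase": Kind.LOOKUP,
    "ftdiacritics": Kind.LOOKUP,
}


def natural_order(options):
    """Order options so that rewrite steps precede lexical lookup.

    sorted() is stable, so options of the same kind keep their
    relative input order.
    """
    return sorted(options, key=lambda opt: KIND[opt].value)
```

With this classification, any option list is reordered so that the kind i) options come first, stemming next, and the kind ii) options last.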
We defined the match option application order as:
1. ftlanguage
2. ftwildcard
3. ftthesaurus
4. ftstem
5. ftcase
6. ftdiacritics
7. ftstopword
This order is in conflict with the semantics of FTStopword and FTThesaurus as
we have defined them in 4.6.2, where stop word filtering and thesaurus
expansion are done as query rewrite steps and hence precede all other options
except language.
The current semantics assumes an order:
1. ftlanguage
2. ftthesaurus
3. ftstopword
4. ftstem, ftcase, ftdiacritics, ftwildcard
The order between the last four would be implementation-defined. (Actually, I
would assume that ftcase, ftdiacritics and ftwildcard are commutative, hence
there's no need to define an order between them).
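The partial order assumed by the current semantics can be sketched as a list of tiers, where options within one tier may be applied in any order. This is only an illustration under that assumption; TIERS, TIER_OF and respects_partial_order are hypothetical names, not anything defined by the spec.

```python
# Tiers of the partial order assumed by the current semantics.
# Options inside the same tier may be applied in any
# (implementation-defined) order.
TIERS = [
    {"ftlanguage"},
    {"ftthesaurus"},
    {"ftstopword"},
    {"ftstem", "ftcase", "ftdiacritics", "ftwildcard"},
]

# Map each option name to the index of its tier.
TIER_OF = {opt: i for i, tier in enumerate(TIERS) for opt in tier}


def respects_partial_order(sequence):
    """Return True if the application order never steps back to an earlier tier."""
    ranks = [TIER_OF[opt] for opt in sequence]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

Under this sketch, the seven-step order listed earlier in this comment is rejected, because ftwildcard is applied before ftthesaurus, while any permutation of the last four options after ftlanguage, ftthesaurus, ftstopword is accepted.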
I can accept a partial order like the one above, but would opt for even more
flexibility: implementations should also be able to choose the order they
implement for ftthesaurus vs. ftstopword and for ftstem vs. ftwildcard.
Best,
/Jochen
Received on Monday, 9 April 2007 12:02:32 UTC