[Bug 4454] [FT] match options order should be implementation-defined

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4454

           Summary: [FT] match options order should be implementation-
                    defined
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text
        AssignedTo: jim.melton@acm.org
        ReportedBy: doerre@de.ibm.com
         QAContact: public-qt-comments@w3.org


From an implementation point of view there are two types of match options,
and the distinction influences how they can practically be applied: i) match
options that control a simple query rewrite step (without regard to what is
actually contained in an index) and ii) match options that affect the lexical
lookup of tokens in the index. Given this distinction, certain orders of match
option application are more natural, because a kind i) match option is less
complex to implement when it is applied in a query rewrite step prior to
lexical lookup.
The thesaurus option is the typical kind i) match option. Stop words (when
considered as query expansions, as our spec does) are of kind i) as well.
Stemming is in-between the two kinds, as it typically involves a query-rewrite
step, but also affects lexical lookup.
On the other hand, wildcard, case and diacritics are typically of kind ii).
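
A minimal sketch of the two kinds, in Python rather than in any real engine's
API. Everything here is a hypothetical illustration: the names THESAURUS,
STOP_WORDS, rewrite_query, fold, build_index and lookup are made up, the
index is a plain dict, and stop words are simply dropped, which is a
simplification of what the spec does. The point is only that the kind i)
step never touches the index, while the kind ii) step decides how a token is
matched against it.

```python
import unicodedata

# Hypothetical data, for illustration only.
THESAURUS = {"car": ["automobile"]}
STOP_WORDS = {"the", "of"}


def rewrite_query(tokens):
    """Kind i): pure query rewrite; never consults the index."""
    rewritten = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue  # simplification: stop words are dropped from the query
        # Thesaurus expansion: the token plus its listed alternatives.
        rewritten.append([tok] + THESAURUS.get(tok, []))
    return rewritten


def fold(token):
    """Case- and diacritics-insensitive lookup key (kind ii) behaviour)."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_index(doc_tokens):
    """Map each folded token to the positions where it occurs."""
    index = {}
    for pos, tok in enumerate(doc_tokens):
        index.setdefault(fold(tok), []).append(pos)
    return index


def lookup(rewritten, index):
    """Kind ii): lexical lookup of every alternative of every query token."""
    results = []
    for alternatives in rewritten:
        positions = set()
        for alt in alternatives:
            positions.update(index.get(fold(alt), []))
        results.append(sorted(positions))
    return results


index = build_index(["The", "Automobile", "of", "René"])
print(lookup(rewrite_query(["the", "car"]), index))
```

Because rewrite_query runs to completion before lookup sees anything, the
rewrite step stays independent of how the index folds case or diacritics.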

We defined the match option application order as:
1. ftlanguage
2. ftwildcard
3. ftthesaurus
4. ftstem
5. ftcase
6. ftdiacritics
7. ftstopword

This order is in conflict with the semantics of FTStopword and FTThesaurus, as
we have defined them in 4.6.2, where stop word filtering and thesaurus
expansion are done as query rewrite steps and hence precede all other options
except language.
The current semantics assumes an order:
1. ftlanguage
2. ftthesaurus
3. ftstopword
4. ftstem, ftcase, ftdiacritics, ftwildcard

The order between the last four would be implementation-defined. (Actually, I
would assume that ftcase, ftdiacritics and ftwildcard are commutative, hence
there's no need to define an order between them).
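
The commutativity assumption can at least be spot-checked. The sketch below
covers only case and diacritics folding, on a handful of sample tokens I
picked myself; fold_case, fold_diacritics and commute are hypothetical helper
names, wildcards are not modelled, and passing on samples is of course not a
proof.

```python
import unicodedata


def fold_case(token):
    """Case-insensitive form of a token."""
    return token.lower()


def fold_diacritics(token):
    """Strip combining marks, then recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)


def commute(f, g, samples):
    """True if applying f and g in either order agrees on every sample."""
    return all(f(g(s)) == g(f(s)) for s in samples)


# Sample tokens chosen for illustration only.
SAMPLES = ["École", "Über", "naïve", "RÉSUMÉ", "plain"]
print(commute(fold_case, fold_diacritics, SAMPLES))
```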

I can accept a partial order like the one above, but would opt for even more
flexibility: implementations should also be able to choose the order of
ftthesaurus vs. ftstopword and of ftstem vs. ftwildcard.
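
To make the two proposals concrete, here is a small sketch that encodes a
partial order as a list of tiers and checks whether a given total order
respects it. TIERS, RELAXED_TIERS and respects_partial_order are names I
made up for this illustration; the option names and the two orders are the
ones listed in this report.

```python
# Each tier must be applied before the next one; the order of options
# inside a tier is left to the implementation.
TIERS = [
    {"ftlanguage"},
    {"ftthesaurus"},
    {"ftstopword"},
    {"ftstem", "ftcase", "ftdiacritics", "ftwildcard"},
]

# The more flexible variant: thesaurus and stop words share a tier.
RELAXED_TIERS = [
    {"ftlanguage"},
    {"ftthesaurus", "ftstopword"},
    {"ftstem", "ftcase", "ftdiacritics", "ftwildcard"},
]


def respects_partial_order(order, tiers):
    """True if no option is applied before an option from an earlier tier."""
    rank = {opt: i for i, tier in enumerate(tiers) for opt in tier}
    ranks = [rank[opt] for opt in order]
    return ranks == sorted(ranks)


# The application order currently defined in the draft.
SPEC_ORDER = ["ftlanguage", "ftwildcard", "ftthesaurus", "ftstem",
              "ftcase", "ftdiacritics", "ftstopword"]

# One order an implementation might choose under the relaxed proposal.
STOPWORD_FIRST = ["ftlanguage", "ftstopword", "ftthesaurus", "ftwildcard",
                  "ftstem", "ftdiacritics", "ftcase"]

print(respects_partial_order(SPEC_ORDER, TIERS))
print(respects_partial_order(STOPWORD_FIRST, TIERS))
print(respects_partial_order(STOPWORD_FIRST, RELAXED_TIERS))
```

The currently defined order fails the check against the tiers, since
ftwildcard comes before ftthesaurus and ftstopword comes last. The
stop-word-first order is only admissible under the relaxed tiers.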
Best,
/Jochen

Received on Monday, 9 April 2007 12:02:32 UTC