[Bug 9858] [FT] FTStopWordOption and FTCaseOption interaction clarification

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9858





--- Comment #4 from Michael Dyck <jmdyck@ibiblio.org>  2010-07-14 17:10:15 ---
(In reply to comment #2)
> Even though this bug has been "resolved" by making the answer "implementation
> dependent,"

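Implementation-defined, actually; "implementation-dependent" means something
else: an implementation-defined behavior must be documented by each
implementation, whereas an implementation-dependent one need not be.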

> the issue, despite Mr. Dyck's statement to the contrary, really
> does have to do with the query tokens.

Indeed it does. If you think I said it didn't, then it seems you misunderstood
me.

> [...]
> If my query were instead:
> 
>     let $x := <p>BEST OF TIMES</p>
>     return $x contains text "best any times"
>       using stop words ("any")
> 
> then the query term would effectively become:
> 
>     "best .* times" using wildcards
> 
> which matches "BEST OF TIMES" [...]

Agreed.
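The substitution in the quoted example can be sketched as a small Python model. This is not XQuery and not any implementation's algorithm. It assumes whitespace tokenization, treats each query stop word as standing for exactly one arbitrary text token, and compares the remaining tokens case-insensitively (the default case option). The names `apply_stop_words`, `phrase_matches`, and `WILDCARD` are hypothetical.

```python
# Hypothetical model of stop-word substitution (not XQuery, not a real engine).
# Assumptions: whitespace tokenization; each query stop word stands for exactly
# one arbitrary text token; other tokens compare case-insensitively (the
# default case option).

WILDCARD = None  # marker meaning "any single token may appear here"


def apply_stop_words(query_tokens, stop_words):
    """Replace every query token found in stop_words with WILDCARD."""
    return [WILDCARD if q in stop_words else q for q in query_tokens]


def phrase_matches(text_tokens, pattern):
    """True if pattern occurs as a contiguous run in text_tokens,
    comparing non-wildcard tokens case-insensitively."""
    n = len(pattern)
    for start in range(len(text_tokens) - n + 1):
        window = text_tokens[start:start + n]
        if all(p is WILDCARD or p.lower() == t.lower()
               for p, t in zip(pattern, window)):
            return True
    return False


text = "BEST OF TIMES".split()
pattern = apply_stop_words("best any times".split(), {"any"})
print(pattern)                        # ['best', None, 'times']
print(phrase_matches(text, pattern))  # True
```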

> Now, if we return to my original query: if "using case sensitive" were to
> apply to stop-word determination, then "ANY" would not be found in the list
> of stop-words of "any"; hence, "ANY" would not be considered a stop-word and
> therefore it would not be "removed from the search and [allow] any token [to]
> be substituted for it."  So "BEST ANY TIMES" would not match "BEST OF TIMES"
> and the query would return false.

Agreed.

> If "using case sensitive" were not to be considered during stop-word
> determination, then "ANY" would be found in the list of stop-words of "any";
> hence "ANY" would be considered a stop-word and therefore would be "removed
> from the search and [allow] any token [to] be substituted for it."  So
> "BEST .* TIMES" would match "BEST OF TIMES" and the query would return true.

Agreed, more or less.

In the second paragraph of my comment #1, I summarized what I saw to be the
point of your example, and I believe it's consistent with what you've said
above.

My subsequent point was that, although the matter certainly hinges on whether a
particular comparison is case-[in]sensitive, it's incorrect to bring the case
option into the discussion, because the case option is not defined to govern
the comparison at issue here. Specifically, the case option governs the
matching of
   a query token vs. a token in the text being searched,
not the comparison of
   a query token vs. a stop word in the collection of stop words
                     defined by a stop word option
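The distinction can be made concrete with a rough Python model (again not XQuery, and not any implementation's code) that keeps the two comparisons in separate functions. The case option (`case_sensitive`) is consulted only when a query token is compared with a text token. How a query token is compared with the stop-word collection is a separate, hypothetical `stop_word_fold` parameter, since the bug's resolution leaves that choice implementation-defined. Tokenization and the one-token-per-stop-word rule are simplifying assumptions, and all function names are illustrative.

```python
# Hypothetical model separating the two comparisons (not a real engine).
# case_sensitive: stands in for the case option; governs only
#                 query token vs. text token.
# stop_word_fold: how a query token is compared with the stop-word
#                 collection; modeled as its own parameter because that
#                 choice is implementation-defined.


def tokens_equal(a, b, case_sensitive):
    """Query token vs. text token: the comparison the case option governs."""
    return a == b if case_sensitive else a.lower() == b.lower()


def is_stop_word(token, stop_words, stop_word_fold):
    """Query token vs. stop word: NOT governed by the case option."""
    if stop_word_fold:
        return token.lower() in {w.lower() for w in stop_words}
    return token in stop_words


def contains_text(text, query, stop_words, case_sensitive, stop_word_fold):
    """Phrase match where a stop word in the query matches any one token."""
    text_tokens, query_tokens = text.split(), query.split()
    n = len(query_tokens)
    for start in range(len(text_tokens) - n + 1):
        window = text_tokens[start:start + n]
        if all(is_stop_word(q, stop_words, stop_word_fold)
               or tokens_equal(q, t, case_sensitive)
               for q, t in zip(query_tokens, window)):
            return True
    return False


# Stop-word lookup is case-sensitive, so "ANY" is not a stop word: no match.
print(contains_text("BEST OF TIMES", "BEST ANY TIMES", {"any"},
                    case_sensitive=True, stop_word_fold=False))  # False
# Stop-word lookup folds case, so "ANY" is a stop word: match.
print(contains_text("BEST OF TIMES", "BEST ANY TIMES", {"any"},
                    case_sensitive=True, stop_word_fold=True))   # True
```

With `case_sensitive=True` in both calls, the result still flips, because only `stop_word_fold` decides whether "ANY" counts as a stop word; the two settings correspond to the two outcomes argued in comment #2.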

> Also, and very importantly, it's intentional and entirely the point that "any"
> is *not* in the text being searched.

Agreed. I think I see the problem. When I said:
    and the stop word "any" is not in the text being searched.
I did *not* mean:
    and the token "any" does not occur in the text being searched.
Rather, I meant something more like:
    and, in the stop word option
        using stop words ("any")
    that "any" is a StringLiteral in an FTStopWordOption,
    not a token in the text being searched (and so, is not
    something that the case option is defined to deal with).


Received on Wednesday, 14 July 2010 17:10:18 UTC