ApplyFTWordWindow semantics wrong?

From: Paul J. Lucas <paul@lucasmail.org>
Date: Sun, 13 Feb 2011 10:00:01 -0800
Message-Id: <F3201272-8279-4B77-A843-47F076305949@lucasmail.org>
To: public-qt-comments@w3.org
Unless I'm missing something, I think the semantics of ApplyFTWordWindow may be wrong.  The test FTNot-q6.xq in the XQuery Full-Text Test Suite is:

> declare variable $input-context external;
> 
> $input-context/books/book[
>  para contains text "software" ftand ("coder" ftand ftnot "ninja" window 5 words)
> ]/title

The expected result of that test is "nothing", yet our engine returns:

> <title>Ninja Coder</title>


To simplify matters, I've whittled that test down to an equivalent one that exhibits the same failure:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand ftnot "ninja" window 5 words

Running this test incorrectly returns true; if you remove the "window 5 words" from the test, it correctly returns false.

What constitutes correctness of course depends on my interpretation of what a window filter used with an "ftand ftnot" means.  If the query were instead:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand "ninja" window 5 words

i.e., with the "ftnot" removed, then in order for the query to return true, the words "coder" and "ninja" must not only both occur at least once in the same document, but must also occur at least once within 5 words of each other.

If the "ftnot" is put back, then I assume that means that, in order for the query to return true, the word "coder" must occur at least once in the document and the word "ninja", if it occurs at all, must never occur within 5 words of any "coder".  
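As a minimal sketch of that reading (in Python rather than XQuery, purely as a model and not the spec's algorithm), the check can be written over token positions. The function name, the whitespace tokenization, and the rule that two tokens share a window when they span at most N positions are all my assumptions:

```python
def coder_without_nearby_ninja(text, window=5):
    """Model of my reading of '"coder" ftand ftnot "ninja" window N words'.

    Assumptions (not from the spec): tokens are split on whitespace,
    positions start at 1, and two tokens share a window when they span
    at most `window` positions.
    """
    tokens = text.lower().split()
    coder = [i for i, t in enumerate(tokens, start=1) if t == "coder"]
    ninja = [i for i, t in enumerate(tokens, start=1) if t == "ninja"]
    if not coder:
        # "coder" must occur at least once.
        return False
    # No "ninja" may fall inside a window that also holds a "coder".
    return all(abs(c - n) + 1 > window for c in coder for n in ninja)


print(coder_without_nearby_ninja("ninja coder"))  # False
print(coder_without_nearby_ninja("coder"))        # True
```

Under this reading, the whittled-down test should return false, which is what the test suite expects.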

To check the results from our engine, I copied the XQuery algorithms and the schema from the spec, ran them using our engine for comparison, and "hand-coded" the allMatches data, i.e.:

> let $am :=
>   <fts:allMatches stokenNum="1">
>     <fts:match>
>       <fts:stringInclude queryPos="1" isContiguous="T">
>         <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>           startPara="1" endPara="1"/>
>       </fts:stringInclude>
>       <fts:stringExclude queryPos="2" isContiguous="T">
>         <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>           startPara="1" endPara="1"/>
>       </fts:stringExclude>
>     </fts:match>
>   </fts:allMatches>
> return
>   fts:ApplyFTWordWindow( $am, 5 )

The results of that are:

> <fts:allMatches xmlns:fts="http://www.w3.org/2007/xpath-full-text" stokenNum="1">
>   <fts:match>
>     <fts:stringInclude queryPos="1" isContiguous="false">
>       <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringInclude>
>     <fts:stringExclude queryPos="2" isContiguous="T">
>       <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringExclude>
>   </fts:match>
>   <fts:match>
>     <fts:stringInclude queryPos="1" isContiguous="false">
>       <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringInclude>
>     <fts:stringExclude queryPos="2" isContiguous="T">
>       <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringExclude>
>   </fts:match>
>   <fts:match>
>     <fts:stringInclude queryPos="1" isContiguous="false">
>       <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringInclude>
>     <fts:stringExclude queryPos="2" isContiguous="T">
>       <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringExclude>
>   </fts:match>
>   <fts:match>
>     <fts:stringInclude queryPos="1" isContiguous="false">
>       <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringInclude>
>     <fts:stringExclude queryPos="2" isContiguous="T">
>       <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringExclude>
>   </fts:match>
>   <fts:match>
>     <fts:stringInclude queryPos="1" isContiguous="false">
>       <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>         startPara="1" endPara="1"/>
>     </fts:stringInclude>
>   </fts:match>
> </fts:allMatches>

The last match has a stringInclude but no stringExclude.  According to the semantics for the FTContainsExpr in section 4.3:

>             return 
>                some $match in $allMatches/fts:match
>                satisfies 
>                   fn:count($match/fts:stringExclude) eq 0

the query returns "true" if there is at least one match that has no stringExclude.  As I've pointed out above, the last match has no stringExclude.  Therefore, the query (according to the spec's own semantics) returns true, whereas the expected result is false.
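To make that step concrete, here is a small Python model of the section 4.3 check applied to the allMatches data before and after ApplyFTWordWindow. The dictionary layout and the names are hypothetical; only the counts of stringInclude and stringExclude per match are taken from the XML above:

```python
def ft_contains(all_matches):
    """True if some match carries no stringExclude (the section 4.3 check)."""
    return any(len(match["excludes"]) == 0 for match in all_matches)


# Token positions from the hand-coded data: "ninja" at 1, "coder" at 2.
coder = {"queryPos": 1, "startPos": 2, "endPos": 2}
ninja = {"queryPos": 2, "startPos": 1, "endPos": 1}

# Input to ApplyFTWordWindow: one match with an include and an exclude.
before_window = [{"includes": [coder], "excludes": [ninja]}]

# Output shown above: four matches that keep the exclude, plus a fifth
# match that has only the include.
after_window = [{"includes": [coder], "excludes": [ninja]} for _ in range(4)]
after_window.append({"includes": [coder], "excludes": []})

print(ft_contains(before_window))  # False
print(ft_contains(after_window))   # True
```

The fifth match alone is enough to flip the result from false to true.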

So, is there a bug in the semantics of ApplyFTWordWindow?

- Paul
Received on Sunday, 13 February 2011 18:00:41 UTC