[Bug 12144] New: [FT] ApplyFT*Window semantics wrong

http://www.w3.org/Bugs/Public/show_bug.cgi?id=12144

           Summary: [FT] ApplyFT*Window semantics wrong
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 1.0
        AssignedTo: jim.melton@acm.org
        ReportedBy: paul@lucasmail.org
         QAContact: public-qt-comments@w3.org


After sending e-mail about this to the public-qt-comments mailing list and
receiving no response, I'm now filing it as a bug so that it gets dealt with
eventually.

Unless I'm missing something, I think the semantics for ApplyFTWordWindow
may be wrong. The test FTNot-q6.xq in the XQuery Full-Text Test Suite is:

> declare variable $input-context external;
> 
> $input-context/books/book[
>   para contains text "software" ftand ("coder" ftand ftnot "ninja" window 5 words)
> ]/title

That test has an expected result of "nothing", yet our engine returns:

<title>Ninja Coder</title>

To simplify matters, I've whittled down that test into an equivalent test that
exhibits the same failure:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand ftnot "ninja" window 5 words

Running this test incorrectly returns true; if you remove the "window 5 words"
from the test, it correctly returns false.

What constitutes correctness of course depends on my interpretation of what a
window filter used with "ftand ftnot" means.  If the query were instead:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand "ninja" window 5 words

i.e., with the "ftnot" removed, then, in order for the query to return true,
the words "coder" and "ninja" must not only both occur at least once in the
same document, but must also occur at least once within 5 words of each
other.

If the "ftnot" is put back, then I assume that means that, in order for the
query to return true, the word "coder" must occur at least once in the document
and the word "ninja", if it occurs at all, must never occur within 5 words of
any "coder".  
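
One way to pin down that assumed reading is the following minimal sketch in
plain XQuery 1.0 (no Full Text extensions).  The function local:never-within
and its whitespace tokenization are hypothetical and only for illustration;
they are not from the spec.  "Within N words" is taken here to mean that the
span covering both tokens is at most N tokens long:

```xquery
(: Hypothetical helper: true if $want occurs at least once and no
   occurrence of $avoid shares a window of $n tokens with any $want. :)
declare function local:never-within(
  $text  as xs:string,
  $want  as xs:string,
  $avoid as xs:string,
  $n     as xs:integer
) as xs:boolean
{
  (: Naive tokenization on whitespace; a real engine does far more. :)
  let $tokens   := fn:tokenize(fn:lower-case($text), "\s+")[. ne ""]
  let $wantPos  := fn:index-of($tokens, $want)
  let $avoidPos := fn:index-of($tokens, $avoid)
  return
    fn:exists($wantPos) and
    (every $w in $wantPos, $a in $avoidPos
     satisfies fn:abs($w - $a) + 1 gt $n)
};

local:never-within("ninja coder", "coder", "ninja", 5)
```

Under these assumptions the expression evaluates to false for "ninja coder",
which is the result I'd expect from the query with both the "ftnot" and the
window.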

For comparison with the results from our engine, I've copied the XQuery
algorithms and the schema from the spec and run them using our engine,
hand-coding the allMatches data, i.e.:

> let $am :=
>  <fts:allMatches stokenNum="1">
>    <fts:match>
>      <fts:stringInclude queryPos="1" isContiguous="T">
>        <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>          startPara="1" endPara="1"/>
>      </fts:stringInclude>
>      <fts:stringExclude queryPos="2" isContiguous="T">
>        <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>          startPara="1" endPara="1"/>
>      </fts:stringExclude>
>    </fts:match>
>  </fts:allMatches>
> return
>  fts:ApplyFTWordWindow( $am, 5 )

The results of that are:

<fts:allMatches xmlns:fts="http://www.w3.org/2007/xpath-full-text"
stokenNum="1">
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
 </fts:match>
</fts:allMatches>

The last match has a stringInclude but no stringExclude.  According to the
semantics for the FTContainsExpr in section 4.3:

>           return 
>              some $match in $allMatches/fts:match
>              satisfies 
>                 fn:count($match/fts:stringExclude) eq 0

it says to return "true" if there is at least one match that has no
stringExclude.  As pointed out above, the last match has no stringExclude.
Therefore, the query (according to the spec's own semantics) returns true,
whereas the expected result is false.
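
That check can be reproduced in isolation with plain XQuery 1.0.  This is a
minimal sketch: the allMatches is trimmed to one match that has a
stringExclude plus the final match that has none, with the attribute values
copied from the output shown earlier:

```xquery
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Trimmed copy of the ApplyFTWordWindow result: one match with a
   stringExclude and the last match, which has only a stringInclude. :)
let $allMatches :=
  <fts:allMatches stokenNum="1">
    <fts:match>
      <fts:stringInclude queryPos="1" isContiguous="false">
        <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
          startPara="1" endPara="1"/>
      </fts:stringInclude>
      <fts:stringExclude queryPos="2" isContiguous="T">
        <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
          startPara="1" endPara="1"/>
      </fts:stringExclude>
    </fts:match>
    <fts:match>
      <fts:stringInclude queryPos="1" isContiguous="false">
        <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
          startPara="1" endPara="1"/>
      </fts:stringInclude>
    </fts:match>
  </fts:allMatches>
return
  (: The FTContainsExpr test quoted from section 4.3. :)
  some $match in $allMatches/fts:match
  satisfies fn:count($match/fts:stringExclude) eq 0
```

This evaluates to true because of the second match alone, regardless of how
many matches carrying a stringExclude precede it.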

So, as far as I can tell, there is a bug in the semantics of ApplyFTWordWindow,
ApplyFTSentenceWindow, and ApplyFTParagraphWindow.


Received on Sunday, 20 February 2011 16:01:59 UTC