- From: <bugzilla@jessica.w3.org>
- Date: Sun, 20 Feb 2011 16:01:54 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12144

           Summary: [FT] ApplyFT*Window semantics wrong
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 1.0
        AssignedTo: jim.melton@acm.org
        ReportedBy: paul@lucasmail.org
         QAContact: public-qt-comments@w3.org

After having sent e-mail on this to the public-qt-comments mailing list and receiving no response, I'm now filing it as a bug so that it will be dealt with eventually.

Unless I'm missing something, I think the semantics for ApplyFTWordWindow may be wrong. The test FTNot-q6.xq in the XQuery Full-Text Test Suite is:

> declare variable $input-context external;
>
> $input-context/books/book[
>   para contains text "software" ftand ("coder" ftand ftnot "ninja" window 5 words)
> ]/title

That test has the expected result of "nothing", yet our engine returns:

<title>Ninja Coder</title>

To simplify matters, I've whittled that test down into an equivalent test that exhibits the same failure:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand ftnot "ninja" window 5 words

Running this test incorrectly returns true; if you remove the "window 5 words" from the test, it correctly returns false.

What constitutes correctness of course depends on my interpretation of what a window filter used with an "ftand ftnot" means. If the query were instead:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand "ninja" window 5 words

i.e., the "ftnot" were removed, then, in order for the query to return true, the words "coder" and "ninja" must not only both occur at least once in the same document, but must also occur at least once within 5 words of each other.
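
As a rough illustration of that reading (this is not the spec's ApplyFTWordWindow), the check can be sketched with naive whitespace tokenization. The function name local:within-window and the tokenization are assumptions made for this sketch only, and a window of N words is taken to mean that both tokens fit inside N consecutive token positions.

```xquery
(: Hypothetical helper, not part of the spec: true if some occurrence of
   $a and some occurrence of $b fit inside $n consecutive token positions. :)
declare function local:within-window(
  $text as xs:string,
  $a as xs:string,
  $b as xs:string,
  $n as xs:integer
) as xs:boolean
{
  (: Naive tokenization on whitespace; a real engine would use its own tokenizer. :)
  let $tokens := fn:tokenize(fn:lower-case($text), "\s+")
  let $posA := fn:index-of($tokens, $a)
  let $posB := fn:index-of($tokens, $b)
  return
    some $i in $posA, $j in $posB
    satisfies fn:abs($i - $j) + 1 le $n
};

local:within-window("ninja coder", "coder", "ninja", 5)
```

For the sample text the two tokens sit at positions 1 and 2, so they span 2 positions and fit inside a 5-word window.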

If the "ftnot" is put back, then I assume that, in order for the query to return true, the word "coder" must occur at least once in the document and the word "ninja", if it occurs at all, must never occur within 5 words of any "coder".

To check our engine's results, I copied the XQuery algorithms and the schema from the spec, ran them using our engine, and hand-coded the allMatches data, i.e.:

> let $am :=
>   <fts:allMatches stokenNum="1">
>     <fts:match>
>       <fts:stringInclude queryPos="1" isContiguous="T">
>         <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>                        startPara="1" endPara="1"/>
>       </fts:stringInclude>
>       <fts:stringExclude queryPos="2" isContiguous="T">
>         <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>                        startPara="1" endPara="1"/>
>       </fts:stringExclude>
>     </fts:match>
>   </fts:allMatches>
> return
>   fts:ApplyFTWordWindow( $am, 5 )

The results of that are:

<fts:allMatches xmlns:fts="http://www.w3.org/2007/xpath-full-text" stokenNum="1">
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringInclude>
    <fts:stringExclude queryPos="2" isContiguous="T">
      <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringExclude>
  </fts:match>
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringInclude>
    <fts:stringExclude queryPos="2" isContiguous="T">
      <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringExclude>
  </fts:match>
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringInclude>
    <fts:stringExclude queryPos="2" isContiguous="T">
      <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringExclude>
  </fts:match>
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringInclude>
    <fts:stringExclude queryPos="2" isContiguous="T">
      <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringExclude>
  </fts:match>
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1" startPara="1" endPara="1"/>
    </fts:stringInclude>
  </fts:match>
</fts:allMatches>

The last match has a stringInclude but no stringExclude. According to the semantics for FTContainsExpr in section 4.3:

> return
>   some $match in $allMatches/fts:match
>   satisfies
>     fn:count($match/fts:stringExclude) eq 0

the expression returns true if there is at least one match that has no stringExclude. As pointed out above, the last match has no stringExclude. Therefore, the query (according to the spec's own semantics) returns true, whereas the expected result is false.

So, as far as I can tell, there is a bug in the semantics of ApplyFTWordWindow, ApplyFTSentenceWindow, and ApplyFTParagraphWindow.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
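
For readers who want to reproduce the final step of the report without the spec's full function library, the FTContainsExpr check from section 4.3 can be applied to a trimmed copy of the output quoted above. This is only a sketch: the data keeps one of the four matches that still carry a stringExclude plus the final match that has lost it, and it assumes no schema is imported, so the elements are untyped.

```xquery
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Trimmed copy of the reported ApplyFTWordWindow output: the first match
   still has its stringExclude, the second one does not. :)
let $allMatches :=
  <fts:allMatches stokenNum="1">
    <fts:match>
      <fts:stringInclude queryPos="1" isContiguous="false">
        <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
                       startPara="1" endPara="1"/>
      </fts:stringInclude>
      <fts:stringExclude queryPos="2" isContiguous="T">
        <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
                       startPara="1" endPara="1"/>
      </fts:stringExclude>
    </fts:match>
    <fts:match>
      <fts:stringInclude queryPos="1" isContiguous="false">
        <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
                       startPara="1" endPara="1"/>
      </fts:stringInclude>
    </fts:match>
  </fts:allMatches>
return
  (: The check quoted from section 4.3: some match has no stringExclude. :)
  some $match in $allMatches/fts:match
  satisfies fn:count($match/fts:stringExclude) eq 0
```

Because the second match has no stringExclude, the quantified expression is satisfied and the query evaluates to true, which is the behaviour the report describes.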
Received on Sunday, 20 February 2011 16:01:59 UTC