[Bug 6303] FT: TokenInfo and StringInclude definition

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6303


Petr Pleshachkov <peter.pleshachkov@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter.pleshachkov@gmail.com




--- Comment #3 from Petr Pleshachkov <peter.pleshachkov@gmail.com>  2008-12-11 21:07:49 ---
(In reply to comment #2)
But according to the spec: "the distance between the two is M2's starting
position minus M1's ending position, minus 1."
So, for the first match we should get a distance of 5 - 2 - 1 = 2. Is that
right?
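
As a quick check of that arithmetic, here is a minimal Python sketch of the
quoted rule. The function name word_distance and its parameter names are
hypothetical, chosen only for illustration; positions are assumed to be 1-based
token positions, with M1 ending at position 2 and M2 starting at position 5 as
in the first match.

```python
def word_distance(m1_end_pos: int, m2_start_pos: int) -> int:
    """Distance as worded in the quoted sentence: M2's starting position
    minus M1's ending position, minus 1 (tokens strictly in between)."""
    return m2_start_pos - m1_end_pos - 1


# First match discussed above: M1 ends at position 2, M2 starts at position 5.
first_match_distance = word_distance(2, 5)
print(first_match_distance)  # evaluates 5 - 2 - 1
```

Under this reading, two adjacent tokens have a distance of 0, which matches the
spec's own statement that "Web" and "site" are 0 words apart.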

By the way, section 3.6.3 contains the following example:

"/books/book ftcontains "web" ftand "site" ftand
"usability" distance at most 2 words"

with the following explanation:

"The following expression returns false:

The search context does contain the phrase "The usability of a Web site", in
which the tokens "usability" and "Web" have a distance of 2 words, and the
tokens "Web" and "site" have a distance of 0 words, both of which satisfy the
constraint distance at most 2 words. However, the problem is that "usability"
and "site" have a distance of 3 words, which does not satisfy the constraint,
and so the distance selection yields no matches, and the expression as a whole
yields false. (The phrase "Improving Web Site Usability" would satisfy the
given full-text selection, but it occurs in an attribute value, and so is not
subject to tokenization.)"

But the spec says that we have to check the distance between each "successive
pair of matches".

So we have to check the distance constraint for the pairs ("usability", "Web")
and ("Web", "site"), but not for the pair ("usability", "site").
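
To make the two readings concrete, here is a small Python sketch (not the
spec's formal semantics) that checks the constraint both ways. All names in it
are hypothetical. The token positions are an assumption: they are counted
within the phrase "The usability of a Web site" only, starting at 1, which
preserves the relative distances quoted from section 3.6.3.

```python
from itertools import combinations


def word_distance(end_pos: int, start_pos: int) -> int:
    # Starting position of the later token minus ending position of the
    # earlier one, minus 1. Single-word tokens start and end at one position.
    return start_pos - end_pos - 1


# Assumed positions inside "The usability of a Web site" (1-based).
positions = {"usability": 2, "Web": 5, "site": 6}


def successive_pairs_ok(pos: dict, n: int) -> bool:
    """Check only neighbouring tokens after sorting by position."""
    ordered = sorted(pos.values())
    return all(word_distance(a, b) <= n for a, b in zip(ordered, ordered[1:]))


def all_pairs_ok(pos: dict, n: int) -> bool:
    """Check every pair of tokens, as the prose explanation appears to do."""
    ordered = sorted(pos.values())
    return all(word_distance(a, b) <= n for a, b in combinations(ordered, 2))


print(successive_pairs_ok(positions, 2))
print(all_pairs_ok(positions, 2))
```

With "distance at most 2 words", the successive-pair check accepts the phrase
(distances 2 and 0), while the all-pairs check rejects it because "usability"
and "site" are 3 words apart. That difference is exactly the discrepancy in
question.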

This follows from the formal function as well:

declare function fts:ApplyFTWordDistanceAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending,
                              $si/fts:tokenInfo/@endPos ascending
                     return $si
      where
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
        <fts:match>
        {
           fts:joinIncludes($match/fts:stringInclude),
           for $stringExcl in $match/fts:stringExclude
           where some $stringIncl in $match/fts:stringInclude
                 satisfies fts:wordDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo
                           ) <= $n
           return $stringExcl
        }
        </fts:match>
   }
   </fts:allMatches>
};

So, is the example correct?

> [personal response:]
> 
> Re your point #2: Yes, I think that's a mistake in the specification.
> Where we say:
>     It is 1 for the first pair and 3 for the second in the first case,
>     and 2 and 1 in the second.
> We should instead say something like:
>     For the first Match, the word distance between 
>     the two TokenInfos is 3 (startPos 5 - endPos 2),
>     and for the fifth Match, it's 2 (startPos 27 - endPos 25).
> 



Received on Thursday, 11 December 2008 21:07:58 UTC