
Fw: [FT] FT word Distance exactly

From: Jochen Doerre <DOERRE@de.ibm.com>
Date: Mon, 2 May 2005 02:17:53 +0200
To: andrew.cao@cisra.canon.com.au
Cc: public-qt-comments@w3.org, member-query-fttf@w3.org
Message-ID: <OFBBC66AAA.6E2661D2-ONC1256FF4.0081590B-C1256FF5.00019B5B@de.ibm.com>


Thanks again for pointing out this error in the semantics of the distance
functions, and sorry for the late response.
Here is how the function fts:ApplyFTWordDistanceExactly should read. Please
note the change in the return clause. As a result, your query[2] will
evaluate to False, because SE-3 will no longer be eliminated.

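declare function fts:ApplyFTWordDistanceExactly(
      $matchOptions as element(matchOptions),
      $allMatches as element(allMatches, fts:AllMatches),
      $n as xs:integer)
   as element(allMatches, fts:AllMatches) {
   <allMatches>
   {
     for $match in $allMatches/match
     let $sorted := for $si in $match/stringInclude
                    order by $si/tokenInfo/@pos ascending
                    return $si
     (: keep the match only if consecutive StringIncludes are exactly $n words apart :)
     where every $idx in (1 to fn:count($sorted) - 1)
           satisfies fts:wordDistance($sorted[$idx]/tokenInfo,
                                      $sorted[$idx + 1]/tokenInfo,
                                      $matchOptions) = $n
     return
       <match>
       {
         $sorted,
         (: corrected: keep a StringExclude if SOME StringInclude is at distance $n :)
         for $stringExcl in $match/stringExclude
         where some $stringIncl in $match/stringInclude
               satisfies fts:wordDistance($stringIncl/tokenInfo,
                                          $stringExcl/tokenInfo,
                                          $matchOptions) = $n
         return $stringExcl
       }
       </match>
   }
   </allMatches>
};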

So, yes, as you pointed out, it is sufficient for a StringExclude to be at
the required distance from one of the remaining StringIncludes for it to be
kept. The same correction has to be applied to the other distance
functions (replacing "where every $stringIncl" with "where some
$stringIncl" in the return clause).
The corrections will be included in the next Working Draft.
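The Python sketch below makes the effect of the correction concrete. It is not the normative XQuery. It models a simplified AllMatches in which each StringInclude/StringExclude is reduced to its word position, and it assumes that fts:wordDistance returns the number of words strictly between two positions, ignoring match options. The names `Match`, `word_distance`, `apply_word_distance_exactly` and `ftcontains` are hypothetical. `ftcontains` assumes, as the reasoning above implies, that the query is true only if some surviving match retains no StringExclude. The `any_include` flag switches between the corrected "some" rule and the earlier "every" rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Match:
    """Hypothetical, simplified Match: word positions only."""
    includes: List[int]                                 # StringInclude positions
    excludes: List[int] = field(default_factory=list)   # StringExclude positions


def word_distance(p1: int, p2: int) -> int:
    """Assumed distance: number of words strictly between two positions."""
    return abs(p1 - p2) - 1


def apply_word_distance_exactly(all_matches: List[Match], n: int,
                                any_include: bool = True) -> List[Match]:
    """Sketch of fts:ApplyFTWordDistanceExactly.

    any_include=True  -> corrected rule ("where some $stringIncl")
    any_include=False -> earlier rule   ("where every $stringIncl")
    """
    quantifier = any if any_include else all
    result = []
    for match in all_matches:
        ordered = sorted(match.includes)
        # Keep the match only if consecutive includes are exactly n words apart.
        if all(word_distance(a, b) == n for a, b in zip(ordered, ordered[1:])):
            # Keep a StringExclude depending on its distance to the includes.
            kept = [e for e in match.excludes
                    if quantifier(word_distance(i, e) == n for i in match.includes)]
            result.append(Match(ordered, kept))
    return result


def ftcontains(all_matches: List[Match]) -> bool:
    """Assumed final step: true if some surviving match has no StringExclude."""
    return any(not m.excludes for m in all_matches)


# AllMatches[2] from the quoted message: SI-1, SI-2 and SE-3.
am2 = [Match(includes=[1, 2], excludes=[3])]
print(ftcontains(apply_word_distance_exactly(am2, 0, any_include=False)))  # True
print(ftcontains(apply_word_distance_exactly(am2, 0)))                     # False
```

Under the earlier "every" rule, SE-3 is dropped because it is 1 word away from SI-1, so the result is True. Under the corrected rule, SE-3 is kept because it is adjacent to SI-2, and query[2] evaluates to False.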

I'll add some more examples showing how distance and negation are intended to work:

query[2] = . ftcontains ("word1" && "word2" && ! "word3") with distance 
exactly 0 words

The query matches, for example:
<node> ... word0 word1 word2 word4 ... </node>
and also
<node> ... word0 word2 word1 word4 ... </node>
provided that none of the given words is matched by "word3". Loosely speaking,
that query returns true for a node if it contains word1 and word2
adjacently, in either order, and the pair is not immediately preceded or
followed by an occurrence of word3.

Hence, the following do not match:
<node> word1 word2 word3 </node>
<node> word2 word1 word3 </node>
<node> word3 word2 word1 </node>
<node> word3 word1 word2 </node>
<node> word1 word4 word2 </node> <!-- word1 and word2 need to be adjacent -->
<node> word13 word2 </node> <!-- where word13 is matched by both word1 and 
word3 -->
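A similar hypothetical sketch can check these examples by evaluating query[2] on a tokenised node. Each token is modelled as the set of query words it is matched by, so that word13 can match both word1 and word3. The sketch makes three assumptions. First, ! "word3" contributes one StringExclude per occurrence of word3 to every Match of word1 and word2. Second, the corrected "some" rule decides which StringExcludes survive. Third, the query is true if some Match survives without a StringExclude. `node`, `positions` and `query2` are illustrative names only.

```python
from itertools import product
from typing import List, Set

# Hypothetical model: a node is a list of tokens, each token being the set
# of query words it is matched by.
Node = List[Set[str]]


def word_distance(p1: int, p2: int) -> int:
    """Assumed distance: number of words strictly between two positions."""
    return abs(p1 - p2) - 1


def positions(node: Node, word: str) -> List[int]:
    """1-based positions of the tokens matched by `word`."""
    return [i for i, tok in enumerate(node, start=1) if word in tok]


def query2(node: Node, n: int = 0) -> bool:
    """Sketch of ("word1" && "word2" && ! "word3") with distance exactly n words."""
    excludes = positions(node, "word3")  # one StringExclude per occurrence
    for p1, p2 in product(positions(node, "word1"), positions(node, "word2")):
        if word_distance(p1, p2) != n:
            continue  # Match eliminated: includes not exactly n words apart
        # Corrected rule: a StringExclude survives if SOME include is n words away.
        kept = [e for e in excludes
                if any(word_distance(i, e) == n for i in (p1, p2))]
        if not kept:
            return True  # a Match with no StringExclude makes the query true
    return False


def node(text: str) -> Node:
    """Tokenise a space-separated string; 'word13' matches word1 and word3."""
    return [{"word1", "word3"} if t == "word13" else {t} for t in text.split()]


for text in ["word0 word1 word2 word4", "word0 word2 word1 word4",
             "word1 word2 word3", "word13 word2"]:
    print(text, "->", query2(node(text)))
```

The first two nodes match. In the last two, the StringExclude for word3 is adjacent to one of the StringIncludes, so it survives and the query is false.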

Yours sincerely / Mit freundlichen Grüßen,
      Jochen Dörre
IBM Germany Böblingen Laboratory
DB2 Information Management Software
Phone: +49-7031-16-2992,    Fax: -4891,   Email: doerre@de.ibm.com

> Dear editors,
> When I have a node: <Node>word1 word2 word3</Node>
> I apply the query[1]:
> /Node ftcontains ("word1" && "word2" && "word3") with distance exactly 0
> words
> I will get the AllMatches[1] as:
> --- AllMatches
>       --- Match
>             --- StringInclude (pos = 1)
>             --- StringInclude (pos = 2)
>             --- StringInclude (pos = 3)
> The final result is True.
> I apply the query[2]:
> /Node ftcontains ("word1" && "word2" && ! "word3") with distance exactly
> 0 words
> I seem to get the AllMatches[2] as:
> --- AllMatches
>       --- Match
>             --- StringInclude (pos = 1)
>             --- StringInclude (pos = 2)
> The final result is also True.
> The reason for AllMatches[2] is that the StringExclude (pos = 3), which
> is generated by ! "word3", has been dropped according to the semantics of
> ApplyFTWordDistanceExactly, because SE-3 does not have a word distance 0
> with both SI-1 and SI-2.
> Are my two results correct? If they are correct, would this be 
> inconsistent? Or what is the intuition when "word3" is a don't-care?
> Can I compare SE-3 to any one of SI-1 and SI-2, not to both of them?
> Thanks,
Received on Monday, 2 May 2005 00:18:01 UTC
