FT: TokenInfo and StringInclude definition

Dear authors of the XQuery Full-Text specification,

Please clarify the following issues:

1. I am a bit confused by the definitions of TokenInfo and StringInclude.

[Definition: A TokenInfo represents a contiguous collection of tokens
from an XML document. ]

[Definition: A StringInclude is a StringMatch that describes a
TokenInfo that must be contained in the document.]

The UML static class diagram of AllMatches shows a one-to-one
correspondence between StringMatch and TokenInfo.

But from the XML Schema definition:

  <xs:element name="stringInclude"
              type="fts:stringMatch" />


  <xs:complexType name="stringMatch">
    <xs:sequence>
      <xs:element ref="fts:tokenInfo"/>
    </xs:sequence>
    <xs:attribute name="queryPos"
                  type="xs:integer"
                  use="required"/>
    <xs:attribute name="isContiguous"
                  type="xs:boolean"
                  use="required"/>
  </xs:complexType>

  <xs:complexType name="tokenInfo">
    <xs:attribute name="startPos"
                  type="xs:integer"
                  use="required"/>
    <xs:attribute name="endPos"
                  type="xs:integer"
                  use="required"/>
    <xs:attribute name="startSent"
                  type="xs:integer"
                  use="required"/>
    <xs:attribute name="endSent"
                  type="xs:integer"
                  use="required"/>
    <xs:attribute name="startPara"
                  type="xs:integer"
                  use="required"/>
    <xs:attribute name="endPara"
                  type="xs:integer"
                  use="required"/>
  </xs:complexType>

  <xs:element name="tokenInfo" type="fts:tokenInfo"/>

it follows that a StringMatch can contain a SEQUENCE of tokenInfo
elements. So we have a one-to-many relationship.
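
One way to check the cardinality that the fragment expresses is to read
the occurrence bounds on the tokenInfo reference. The Python sketch below
is illustrative only: the function name occurrence_bounds and the fts
namespace URI are placeholders, and it relies on the XML Schema rule that
minOccurs and maxOccurs default to 1 when they are omitted.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The stringMatch type from the fragment above (attributes omitted),
# wrapped in a minimal xs:schema element so that it parses on its own.
# The fts namespace URI is a placeholder assumption for this sketch.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:fts="urn:example:fts">
  <xs:complexType name="stringMatch">
    <xs:sequence>
      <xs:element ref="fts:tokenInfo"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

def occurrence_bounds(schema_text, type_name):
    """Return {ref: (minOccurs, maxOccurs)} for child elements of a complexType."""
    root = ET.fromstring(schema_text)
    bounds = {}
    for ctype in root.iter(f"{{{XS}}}complexType"):
        if ctype.get("name") != type_name:
            continue
        for elem in ctype.iter(f"{{{XS}}}element"):
            # XML Schema defaults both attributes to 1 when they are omitted.
            bounds[elem.get("ref")] = (elem.get("minOccurs", "1"),
                                       elem.get("maxOccurs", "1"))
    return bounds

print(occurrence_bounds(SCHEMA, "stringMatch"))
```

For the fragment above this reports ('1', '1') for fts:tokenInfo, which
would mean exactly one tokenInfo per stringMatch. Whether that is the
intended relationship is for the specification to confirm.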

Please clarify the correct relationship between StringMatch and tokenInfo.


2. In section 4.2.7.9 FTDistance you have an example: ("Ford Mustang"
ftand "excellent") distance at most 3 words

At the end you say: "The result for the FTDistance selection
consists of only the first Match (with positions 1, 2, and 5) and the
fifth Match (with positions 25, 27, and 28), because only for these
Matches the word distance between consecutive TokenInfos is always
less than or equal to 3. It is 1 for the first pair and 3 for the
second in the first case, and 2 and 1 in the second."

Here, for the first match, you have 2 StringIncludes (shown in the diagram):
1) first StringInclude with startPos = 1 and endPos = 2
2) second StringInclude with startPos = 5 (endPos = 5)

But what are the consecutive pairs? It looks like we have 2
StringIncludes and therefore only ONE pair, with distance = 5 - 2 - 1 = 2,
but you say "It is 1 for the first pair and 3 for the second in the first
case", which describes something different.
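
The two readings can be compared with a short Python sketch. Both
function names and both formulas are illustrative assumptions, not
definitions taken from the specification: the first counts the words
strictly between consecutive (startPos, endPos) spans, the second takes
plain differences between consecutive token positions.

```python
# Sketch only: neither formula is taken from the specification text.

def gap_between_spans(spans):
    """Words strictly between consecutive (startPos, endPos) spans."""
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(spans, spans[1:])]

def position_differences(positions):
    """Plain differences between consecutive token positions."""
    return [b - a for a, b in zip(positions, positions[1:])]

# Reading A: two StringIncludes, 1..2 ("Ford Mustang") and 5..5 ("excellent").
print(gap_between_spans([(1, 2), (5, 5)]))    # [2]

# Reading B: three single-token positions, as listed in the quoted text.
print(position_differences([1, 2, 5]))        # [1, 3]
print(position_differences([25, 27, 28]))     # [2, 1]
```

The quoted values (1 and 3, then 2 and 1) match reading B, whereas two
StringIncludes under reading A give the single value 2.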

Please clarify how you define the consecutive pairs.


Thank you in advance,
Peter Pleshachkov

Received on Thursday, 11 December 2008 16:27:10 UTC