- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 28 Nov 2005 18:41:32 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=2299

------- Additional Comments From doerre@de.ibm.com 2005-11-28 18:41 -------

To fix this we decided to model phrases within matches as StringMatches that span multiple tokens (intervals). To do so, the TokenInfo model has to be extended to also model token intervals. At the same time, this change lets us support tokenizers that produce overlapping tokens.

Summary of the discussion/decision:
- We need to allow for overlapping tokens for multiple reasons.
- A phrase can be modeled as a "token" spanning multiple positions. This allows it to be treated as a unit in constraints like FTDistance.
- FTDistance constraints always disallow overlapping of tokens. A distance of 0 words (sentences/paragraphs) means adjacent words (sentences/paragraphs).

Summary of changes to the semantics:

In 4.3.1 AllMatches
Change the type TokenInfo to now include the attributes
+startPos: integer
+endPos: integer
+startSent: integer
+endSent: integer
+startPara: integer
+endPara: integer
(As an aside: we also drop the "queryString", because it is not needed in the semantics.)

4.3.1.3 XML representation (of AllMatches)
Adapted to the model above.

4.3.1.4 and 4.3.1.5 (Normalization)
To be adapted, but not yet done.

4.3.2.9 FTOrder
Throughout the function, instead of testing "tokenInfo/@pos", we should test "tokenInfo/@startPos"; i.e., the order constraint is sensitive only to the starting positions of matched tokens.

4.3.2.10 FTScope
Same sentence: for each match in the input AllMatches, all sentence positions covered by each of the StringIncludes must be the same. Retain only those StringExcludes that cover that same sentence (or, if there are no StringIncludes, at most one sentence).
Different sentence: for each match, the StringIncludes cover disjoint sentences. Keep StringExcludes that cover sentences not covered by any StringInclude (drop a StringExclude if some sentence is covered by both).
Same/different paragraph is analogous.
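As a rough illustration of the extended TokenInfo model and the FTScope sentence checks described above, here is a minimal Python sketch. It is not spec text: the class layout, the snake_case field names, and the helper names (covered_sentences, same_sentence, different_sentence) are assumptions made for this example, and only the StringIncludes part of the check is shown. The filtering of StringExcludes is left out.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class TokenInfo:
    """Hypothetical rendering of the extended TokenInfo: every attribute is
    now an inclusive interval (start/end) instead of a single position."""
    start_pos: int
    end_pos: int
    start_sent: int
    end_sent: int
    start_para: int
    end_para: int


def covered_sentences(token: TokenInfo) -> Set[int]:
    """All sentence positions covered by a (possibly multi-token) match."""
    return set(range(token.start_sent, token.end_sent + 1))


def same_sentence(includes: List[TokenInfo]) -> bool:
    """Same sentence: all sentence positions covered by all StringIncludes
    coincide. Assumption: an empty list is treated as trivially satisfied."""
    sentences: Set[int] = set()
    for token in includes:
        sentences |= covered_sentences(token)
    return len(sentences) <= 1


def different_sentence(includes: List[TokenInfo]) -> bool:
    """Different sentence: the StringIncludes cover pairwise disjoint sentences."""
    seen: Set[int] = set()
    for token in includes:
        covered = covered_sentences(token)
        if seen & covered:
            return False
        seen |= covered
    return True


# A three-word phrase and a single word in sentence 1, a word in sentence 2,
# and a phrase that straddles sentences 1 and 2.
phrase = TokenInfo(3, 5, 1, 1, 1, 1)
word = TokenInfo(7, 7, 1, 1, 1, 1)
other = TokenInfo(9, 9, 2, 2, 1, 1)
spanning = TokenInfo(4, 8, 1, 2, 1, 1)
```

Under these assumptions, a phrase that straddles a sentence boundary (like `spanning`) can never satisfy "same sentence" on its own, because it covers two sentence positions.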
4.3.2.12 FTDistance
Distance constraints are never satisfied for a match that contains two overlapping StringIncludes. For each match, check that in the list of StringIncludes sorted by startPos, for each pair of consecutive StringIncludes SI1, SI2, the end position (sentence/paragraph) of SI1 (the preceding) is within the required distance from the start position (sentence/paragraph) of SI2 (the succeeding). Keep only StringExcludes that are within the required distance from one of the StringIncludes. (Changed all 12 functions.)

4.3.2.13 FTWindow
For each match, the minimal startPos and the maximal endPos of the StringIncludes must fit into a window of N positions. Drop StringExcludes that cannot be completely covered by any window covering the StringIncludes.
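The FTDistance and FTWindow checks on StringIncludes can be sketched as follows. This is only an illustration under stated assumptions: intervals are plain (startPos, endPos) tuples with inclusive bounds, the function names are hypothetical, the required distance is given as an inclusive range [lo, hi], and the gap between two intervals is computed as start2 - end1 - 1 so that a distance of 0 means adjacent, as decided above. Handling of StringExcludes is not shown.

```python
from typing import List, Tuple

# (startPos, endPos), both inclusive; a single token has startPos == endPos.
Interval = Tuple[int, int]


def satisfies_distance(includes: List[Interval], lo: int, hi: int) -> bool:
    """Check consecutive StringIncludes (sorted by startPos) for overlap and
    for a gap within [lo, hi]. Any overlapping pair makes the match fail."""
    ordered = sorted(includes)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 <= end1:
            # Overlapping intervals never satisfy a distance constraint.
            return False
        gap = start2 - end1 - 1  # 0 means the two intervals are adjacent
        if not lo <= gap <= hi:
            return False
    return True


def satisfies_window(includes: List[Interval], n: int) -> bool:
    """Check that minimal startPos and maximal endPos fit into n positions.
    Assumption: an empty list is treated as trivially satisfied."""
    if not includes:
        return True
    first = min(start for start, _ in includes)
    last = max(end for _, end in includes)
    return last - first + 1 <= n
```

Checking only consecutive pairs is enough to detect overlap: once the list is sorted by startPos, if any two intervals overlap, then the earlier one also overlaps its immediate successor.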
Received on Monday, 28 November 2005 18:41:38 UTC