- From: andrewc <andrew.cao@cisra.canon.com.au>
- Date: Tue, 31 May 2005 09:46:58 +1000
- To: public-qt-comments@w3.org
Dear editors, If I have XML data: <book>A B A B A C B</book> and query: /book ftcontains "A" && "B" occurs from 2 to 3 times FTWords ("A") will return 3 matches at position (1) (3) (5). FTWords ("B") will return 3 matches at position (2) (4) (7). FTAnd will return 9 matches at position (1,2) (1,4) (1,7) (3,2) (3,4) (3,7) (5,2) (5,4) (5,7) According to the semantics, "occurs from 2 to 3 times" is evaluated by "occurs at least 2 times && ! occurs at least 4 times". Now if we evaluate "! occurs at least 4 times": FTTimes "occurs at least 4 times" has 9 input matches, and will produce 126 output matches. After eliminating duplication, we get (numbers in parenthesis are word positions) 13 output matches: AllMatches --- Match (1,2,3,4,7) --- Match (1,2,3,4,5,7) --- Match (1,2,4,5,7) --- Match (1,2,4,5) --- Match (1,2,3,4) --- Match (1,2,3,7) --- Match (1,2,3,4,5) --- Match (1,2,3,5,7) --- Match (1,2,5,7) --- Match (1,3,4,7) --- Match (1,3,4,5,7) --- Match (2,3,4,5,7) --- Match (2,3,5,7) If we apply FTUnaryNot on this AllMatches, because the first Match has 5 StringMatches, the second Match has 6 StringMatches, and so on, we will have: 5 x 6 x 5 x 4 x 4 x 4 x 5 x 5 x 4 x 4 x 5 x 5 x 4 = 384 million possible combinations of StringMatches. Even if we do not materialize these combinations at one time, the potential computation is huge. And the query evaluation is only half way. This is simple query and simple data. But evaluating this query may incur such a a huge computation. What if "A" and "B" happen 10 times each in /book? My question is: Is my evaluation correct? If it is correct, would it be nice if we could simplify the semantics of FTTimes, FTUnaryNot, etc.? Or is there a lot of places to optimize? Thanks, Andrew
Received on Monday, 30 May 2005 23:48:34 UTC