- From: andrewc <andrew.cao@cisra.canon.com.au>
- Date: Tue, 31 May 2005 09:46:58 +1000
- To: public-qt-comments@w3.org
Dear editors,
If I have XML data:
<book>A B A B A C B</book>
and query:
/book ftcontains "A" && "B" occurs from 2 to 3 times
FTWords ("A") will return 3 matches at position (1) (3) (5).
FTWords ("B") will return 3 matches at position (2) (4) (7).
FTAnd will return 9 matches at position (1,2) (1,4) (1,7) (3,2) (3,4)
(3,7) (5,2) (5,4) (5,7)
According to the semantics, "occurs from 2 to 3 times" is evaluated by
"occurs at least 2 times && ! occurs at least 4 times".
Now if we evaluate "! occurs at least 4 times":
FTTimes "occurs at least 4 times" has 9 input matches, and will produce
126 output matches.
After eliminating duplication, we get (numbers in parenthesis are word
positions) 13 output matches:
AllMatches
--- Match (1,2,3,4,7)
--- Match (1,2,3,4,5,7)
--- Match (1,2,4,5,7)
--- Match (1,2,4,5)
--- Match (1,2,3,4)
--- Match (1,2,3,7)
--- Match (1,2,3,4,5)
--- Match (1,2,3,5,7)
--- Match (1,2,5,7)
--- Match (1,3,4,7)
--- Match (1,3,4,5,7)
--- Match (2,3,4,5,7)
--- Match (2,3,5,7)
If we apply FTUnaryNot on this AllMatches, because the first Match has 5
StringMatches, the second Match has 6 StringMatches, and so on, we will
have:
5 x 6 x 5 x 4 x 4 x 4 x 5 x 5 x 4 x 4 x 5 x 5 x 4 = 384 million
possible combinations of StringMatches. Even if we do not materialize
these combinations at one time, the potential computation is huge. And
the query evaluation is only half way.
This is simple query and simple data. But evaluating this query may
incur such a a huge computation. What if "A" and "B" happen 10 times
each in /book?
My question is: Is my evaluation correct? If it is correct, would it be
nice if we could simplify the semantics of FTTimes, FTUnaryNot, etc.? Or
is there a lot of places to optimize?
Thanks,
Andrew
Received on Monday, 30 May 2005 23:48:34 UTC