- From: Arjen P. de Vries <Arjen.de.Vries@cwi.nl>
- Date: Tue, 18 Feb 2003 20:46:17 +0100
- To: public-qt-comments@w3.org
Hi, I have some question about the MUST NOT in clause 3.4.4: XQuery/XPath Full-Text MUST NOT require an explicit definition of the global corpus statistics (statistics, such as word frequency, used in calculating SCORE). Most retrieval models (-> ranking function for computing the SCORE) have indeed some corpus statistic that influences the ranking. But, in case of XQuery+Full-text, it cannot be predicted in general what is the collection for which the background statistics should be drawn. An example should clarify my statement, using selection on attribute in the "standard" XQuery part of the query as well as a sort-by-score clause given a full-text predicate. Take a subselection of all computer science articles; if then is ranked on keyword "network" it may be very useful if the corpus statistic for "network" is taken from the subselection result, and not the original collection (of articles on any topic). Similar examples can be given for multilingual collections, or educational material where level (beginner/expert) could be part of the "normal" XQuery query components. Yet, if the subselection selects all articles by say "John Woo", it is not likely that the corpus statistic should be taken from the selection result. Similar examples can again be thought of, like "newspaper articles of today", or "price = 0". I wonder how the two different cases of using the background statistics can be assumed without explicit statement in the query. Best regards, Arjen PS: the problem can be circumvented by adding an aggregate that derives background statistics from a collection - then, the aggregate can be applied to a selection result if desired (the computer science articles example) or using the original statistics (the John Woo example) [This idea was applied in http://www.cwi.nl/~arjen/pub/ds8_short.pdf]. ==================================================================== CWI, room C0.11 Centre for Mathematics and Computer Science Kruislaan 413 Email: Arjen.de.Vries@cwi.nl 1098 SJ Amsterdam tel: +31-(0)20-5924306 The Netherlands fax: +31-(0)20-5924312 ===================== http://www.cwi.nl/~arjen/ ====================
Received on Tuesday, 18 February 2003 15:06:58 UTC