Full-text Requirements 3.4.4.

Hi,

I have a question about the MUST NOT in clause 3.4.4:
  XQuery/XPath Full-Text MUST NOT require an explicit definition of the global 
  corpus statistics (statistics, such as word frequency, used in calculating SCORE). 

Most retrieval models (i.e., the ranking functions used to compute SCORE) do indeed use some
corpus statistic that influences the ranking.
However, in the case of XQuery+Full-Text, it cannot in general be predicted from which collection
the background statistics should be drawn.

An example should clarify my point. It combines a selection on an attribute in the "standard"
XQuery part of the query with a sort-by-score clause on a full-text predicate.

Take a subselection of all computer science articles; if this subselection is then ranked on the
keyword "network", it may be very useful to take the corpus statistic for "network" from the
subselection result rather than from the original collection (of articles on any topic).
Similar examples can be given for multilingual collections, or for educational material
where the level (beginner/expert) could be part of the "normal" XQuery query components.
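To make the computer science example concrete, here is a minimal sketch that assumes a plain IDF weight, idf(t) = log(N / df(t)), as the corpus statistic. The toy articles, topics and the helper name `idf` are hypothetical and do not come from any particular retrieval model. The sketch only shows how the weight of "network" changes depending on whether N and df are counted over the whole collection or over the computer science subselection.

```python
import math

# Hypothetical toy collection: (topic, set of terms) per article.
ARTICLES = [
    ("cs", {"network", "protocol"}),
    ("cs", {"network", "routing"}),
    ("cs", {"network", "database"}),
    ("cs", {"compiler", "parser"}),
    ("biology", {"cell", "protein"}),
    ("history", {"war", "treaty"}),
    ("art", {"painting", "museum"}),
    ("music", {"opera", "tenor"}),
]


def idf(term, docs):
    """Inverse document frequency log(N / df) over the given documents.

    Returns 0.0 when the term does not occur, to avoid division by zero.
    """
    n = len(docs)
    df = sum(1 for _, terms in docs if term in terms)
    return math.log(n / df) if df else 0.0


# The "normal" XQuery part of the query: select the computer science articles.
cs_only = [a for a in ARTICLES if a[0] == "cs"]

idf_full = idf("network", ARTICLES)  # statistics from the original collection
idf_cs = idf("network", cs_only)     # statistics from the subselection

print(f"idf over whole collection: {idf_full:.3f}")
print(f"idf over CS subselection:  {idf_cs:.3f}")
```

Within the subselection "network" is common, so its weight drops (log(8/3) vs. log(4/3) with natural logarithms); a ranking based on the whole-collection value would treat it as more discriminating among computer science articles than it really is.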

Yet, if the subselection selects all articles by, say, "John Woo", it is unlikely that
the corpus statistic should be taken from the selection result. Again, similar examples can
be thought of, such as "newspaper articles of today" or "price = 0".

I wonder how the system could tell which of these two ways of using the background statistics
is intended, without an explicit statement in the query.

Best regards,

Arjen

PS: the problem can be circumvented by adding an aggregate that derives background statistics
from a collection; the aggregate can then be applied to a selection result when desired
(the computer science articles example), or the original statistics can be used instead (the
John Woo example) [This idea was applied in http://www.cwi.nl/~arjen/pub/ds8_short.pdf].
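The idea in the PS can be sketched in Python under simplifying assumptions: the hypothetical function `background_stats` plays the role of the aggregate, `score` is a plain sum of IDF weights standing in for a real retrieval model, and the toy articles, authors and helper names are all invented for illustration. This is not the syntax or the implementation from the cited paper; it only shows what it looks like when the statistics are an explicit argument of the ranking.

```python
import math
from collections import Counter

# Hypothetical toy collection: each article has an author, a topic and its terms.
ARTICLES = [
    {"author": "John Woo", "topic": "cs", "terms": {"network", "protocol"}},
    {"author": "John Woo", "topic": "film", "terms": {"action", "network"}},
    {"author": "Ann Lee", "topic": "cs", "terms": {"network", "routing"}},
    {"author": "Ann Lee", "topic": "cs", "terms": {"compiler"}},
    {"author": "Bo Chan", "topic": "biology", "terms": {"cell"}},
]


def background_stats(docs):
    """Aggregate: derive background statistics (N and document frequencies)."""
    df = Counter()
    for doc in docs:
        # Terms are stored as sets, so each document counts a term at most once.
        df.update(doc["terms"])
    return len(docs), df


def score(doc, query_terms, stats):
    """Sum of log(N / df) over query terms that occur in the document."""
    n, df = stats
    return sum(
        math.log(n / df[t]) for t in query_terms if t in doc["terms"] and df[t]
    )


def rank(docs, query_terms, stats):
    """Sort documents by descending score under the given statistics."""
    return sorted(docs, key=lambda d: score(d, query_terms, stats), reverse=True)


cs = [d for d in ARTICLES if d["topic"] == "cs"]
woo = [d for d in ARTICLES if d["author"] == "John Woo"]

# Computer science example: statistics drawn from the selection itself.
cs_ranked = rank(cs, {"network"}, background_stats(cs))

# John Woo example: statistics drawn from the original collection.
woo_ranked = rank(woo, {"network", "action"}, background_stats(ARTICLES))
```

With statistics drawn from John Woo's own articles, "network" occurs in every one of them and gets weight log(2/2) = 0, whereas the original collection gives it log(5/3); passing the statistics explicitly lets the query choose between the two.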

====================================================================
CWI, room C0.11          Centre for Mathematics and Computer Science
Kruislaan 413                           Email: Arjen.de.Vries@cwi.nl
1098 SJ Amsterdam                       tel:       +31-(0)20-5924306
The Netherlands                         fax:       +31-(0)20-5924312  
===================== http://www.cwi.nl/~arjen/ ====================

Received on Tuesday, 18 February 2003 15:06:58 UTC