Re: FTS use-case: automatically determine granularity of result from Sihem Amer-Yahia on 2003-04-10 (public-qt-comments@w3.org from April 2003)

From: Sihem Amer-Yahia <sihem@research.att.com>
Date: Thu, 10 Apr 2003 10:42:43 -0400 (EDT)
To: kai.grossjohann@uni-duisburg.de
Cc: public-qt-comments@w3.org
Message-Id: <200304101442.h3AEghW13030@bual.research.att.com>

Hi Kai,

I find your email very interesting. Thank you. 

You are right, we have been considering CAS (content-and-structure)
queries, where a user specifies the granularity of the search (which
could be the whole document) and also what he expects as an answer.
What you are suggesting with CO (content-only) queries is the case
where users may or may not specify the granularity of the search but
do not have to specify the granularity of what is returned. Instead
the system, using a notion of "retrievable unit" (to determine the
finest granularity to be returned), decides what is most appropriate
to return to the user using some ranking function.

My understanding of CO queries is that they generalize search engine
queries where the granularity of what is returned is always a
document. Probably, the reason why we did not consider these queries
is that in XPath/XQuery, the user has the ability to specify that a
search will occur in the whole document but does not have the ability
to specify that the system should decide which elements to return in
the answer. He must specify which answer he is expecting. We have been
thinking about FTS queries in the same spirit.
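
To make the contrast concrete, here is a minimal sketch of a CAS-style
query in Python, using the standard library's ElementTree and its
limited XPath support rather than XQuery itself. The toy document, the
element names, and the helper cas_query are all hypothetical. The
point is only that the caller must supply the path of the elements to
return; the system never chooses the granularity.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names are illustrative only.
DOC = """
<book>
  <chapter><title>Storage</title>
    <section><title>Indexes</title><p>B-trees and hashing.</p></section>
  </chapter>
  <chapter><title>Multimedia databases</title>
    <section><title>Optimization</title><p>Query optimization for multimedia databases.</p></section>
  </chapter>
</book>
"""

def cas_query(root, return_path, keyword):
    """CAS-style search: the caller names the elements to return via a path.

    Naive substring matching stands in for real full-text search.
    """
    keyword = keyword.lower()
    return [el for el in root.findall(return_path)
            if keyword in "".join(el.itertext()).lower()]

root = ET.fromstring(DOC)
# Same keyword, two different answers, depending only on the path given.
sections = cas_query(root, ".//section", "optimization")
chapters = cas_query(root, ".//chapter", "optimization")
```

Both calls find a match, but whether the answer is a section or a
chapter is decided entirely by the path the user wrote.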

We will definitely discuss your point and get back to you. 

Thanks

Sihem


>For this use-case, I'll assume a collection of books encoded in XML.
>I'll assume that a book is structured into parts, each part has
>chapters, each chapter has sections, and so on for subsections and
>perhaps paragraphs.
>
>I'll also assume that part, chapter, section, subsection, paragraph,
>are what I'll call "retrievable units".
>
>Now consider a user asking "Give me information about optimization in
>multimedia databases."  Maybe one of the books ("specific-book") is
>about this very topic.  Then, clearly, specific-book as a whole should
>be returned as a result of this query.  Let's say another book
>("general-book") is about database systems in general, and talks about
>multimedia database systems in one chapter.  And one section in that
>chapter is about optimization in multimedia database systems.  Then,
>the right item to retrieve might be section 5.2.
>
>Note that chapter 5 in general-book is, of course, also an answer to
>the query, since it contains all the information desired by the
>user.  However, the user will have to read some sections which are
>not about the topic of interest, and therefore the answer chapter 5
>in book general-book is worse than the answer section 5.2 in the same
>book.  This fact should be reflected in the ranking returned by the
>system.
>
>I used the term "retrievable unit" because it does not make sense to
>return just any XML element.  For example, suppose that the introduction of
>general-book happens to list the topics of that book in a bulleted
>list.  Then the XML element "<li>query optimization in multimedia
>databases</li>" in that list (using HTML-like tags for illustration
>purposes), while matching the query as such, is clearly not a good
>answer, since it does not give the user any information.
>
>About half of the INEX initiative is devoted to this kind of query
>(they call these queries "content-only" queries, and there is a second
>kind of query, called "content-and-structure"), so a sizable number of
>people believe that this use-case is important.  You can get more
>information about INEX on its home page:
>http://qmir.dcs.qmw.ac.uk/INEX/
>
>Regards,
>Kai
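
The use-case quoted above can be sketched in Python as follows. This
is a minimal illustration under stated assumptions, not a proposal for
how an FTS implementation should rank: the toy document, the
RETRIEVABLE set, the function names, and the scoring rule (fraction of
words in a unit that are query terms, with no stop-word handling or
stemming) are all hypothetical. It only shows the two ideas in the
use-case: non-retrievable elements such as <li> are never returned,
and a unit padded with off-topic text ranks below a more specific one.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the set of tags that count as "retrievable units".
RETRIEVABLE = {"book", "part", "chapter", "section", "subsection", "paragraph"}

# Hypothetical stand-in for "general-book"; text sits directly in sections.
DOC = """
<book id="general-book">
  <intro><li>query optimization in multimedia databases</li></intro>
  <chapter id="4">
    <section id="4.1">Transactions, logging, and recovery in relational systems, with notes on locking and isolation levels.</section>
  </chapter>
  <chapter id="5">
    <section id="5.1">Storage formats for images, audio, and video streams.</section>
    <section id="5.2">Optimization of queries in multimedia databases.</section>
  </chapter>
</book>
"""

def tokens(text):
    """Lowercase alphabetic words; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())

def rank_units(root, query):
    """Score each retrievable unit by query-term density, best first."""
    terms = set(tokens(query))
    ranked = []
    for el in root.iter():
        if el.tag not in RETRIEVABLE:
            continue  # e.g. <li> matches the query but is not a retrievable unit
        words = tokens("".join(el.itertext()))
        hits = sum(1 for w in words if w in terms)
        if hits:
            ranked.append((hits / len(words), el.tag, el.get("id")))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked

ranked = rank_units(ET.fromstring(DOC), "optimization multimedia databases")
for score, tag, unit_id in ranked:
    print(f"{score:.3f}  {tag} {unit_id}")
```

With this toy data, section 5.2 ranks above chapter 5, which in turn
ranks above the book as a whole, and the <li> element never appears
even though it contains every query term.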

Received on Thursday, 10 April 2003 10:42:46 UTC