FTS use-case: automatically determine granularity of result

For this use-case, I'll assume a collection of books encoded in XML.
I'll assume that a book is structured into parts, each part has
chapters, each chapter has sections, and so on for subsections and
perhaps paragraphs.

I'll also assume that parts, chapters, sections, subsections, and
paragraphs are what I'll call "retrievable units".

Now consider a user asking "Give me information about optimization in
multimedia databases."  Maybe one of the books ("specific-book") is
about this very topic.  Then, clearly, specific-book as a whole should
be returned as a result of this query.  Let's say another book
("general-book") is about database systems in general, and talks about
multimedia database systems in one chapter, say chapter 5.  And one
section in that chapter, say section 5.2, is about optimization in
multimedia database systems.  Then the right item to retrieve might
be section 5.2.

Note that chapter 5 in general-book is, of course, also an answer to
the query, since it contains all the information desired by the
user.  However, the user would have to read some sections which are
not about the topic of interest, so chapter 5 of general-book is a
worse answer than section 5.2 of the same book.  This fact should be
reflected in the ranking returned by the system.
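
To make the ranking requirement concrete, here is a minimal sketch in
Python.  Everything in it is an assumption made for illustration: the
markup of general-book, the tag names, and above all the scoring rule
(the fraction of paragraphs in a unit that contain all query terms).
A real system would use a proper relevance model; the sketch only
shows that a more focused unit can outrank the larger unit that
contains it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for general-book; tag names and text are illustrative only.
GENERAL_BOOK = """
<book>
  <chapter id="5">
    <section id="5.1"><paragraph>storage of multimedia databases</paragraph></section>
    <section id="5.2">
      <paragraph>optimization in multimedia databases</paragraph>
      <paragraph>cost models for multimedia databases optimization</paragraph>
    </section>
    <section id="5.3"><paragraph>indexing images</paragraph></section>
  </chapter>
</book>
"""

def specificity(element, terms):
    """Toy score: fraction of paragraphs under `element` containing every term."""
    paragraphs = list(element.iter("paragraph"))
    if not paragraphs:
        return 0.0
    hits = sum(
        1 for p in paragraphs
        if all(term in (p.text or "").lower() for term in terms)
    )
    return hits / len(paragraphs)

def rank(xml_text, terms, tags=("chapter", "section")):
    """Return (score, tag, id) for each matching unit, best score first."""
    root = ET.fromstring(xml_text)
    scored = [
        (specificity(e, terms), e.tag, e.get("id"))
        for e in root.iter()
        if e.tag in tags
    ]
    # Drop units in which no paragraph matches, then sort by score.
    matching = [s for s in scored if s[0] > 0]
    return sorted(matching, key=lambda s: s[0], reverse=True)

ranking = rank(GENERAL_BOOK, ["optimization", "multimedia"])
for score, tag, unit_id in ranking:
    print(f"{tag} {unit_id}: {score:.2f}")
```

With this toy data, section 5.2 comes out ahead of chapter 5, because
every paragraph in the section is on topic while only half of the
paragraphs in the chapter are.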

I used the term "retrievable unit" because it does not make sense to
return arbitrary XML elements.  For example, suppose that the introduction of
general-book happens to list the topics of that book in a bulleted
list.  Then the XML element "<li>query optimization in multimedia
databases</li>" in that list (using HTML-like tags for illustration
purposes), while matching the query as such, is clearly not a good
answer, since it does not give the user any information.
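
The restriction to retrievable units can be sketched as a simple
filter, again in Python.  The set of tag names, the sample
introduction, and the naive substring matching are all assumptions of
mine, chosen only to illustrate the idea that the <li> element matches
the query but is never offered as an answer.

```python
import xml.etree.ElementTree as ET

# Assumed tag names for the retrievable units named in this use-case.
RETRIEVABLE = {"part", "chapter", "section", "subsection", "paragraph"}

# Hypothetical introduction of general-book with a bulleted topic list.
INTRO = """
<chapter id="1">
  <paragraph>This book covers the following topics.</paragraph>
  <ul>
    <li>query optimization in multimedia databases</li>
    <li>transaction management</li>
  </ul>
</chapter>
"""

def matches(element, terms):
    """Naive match: every term occurs somewhere in the element's text."""
    text = "".join(element.itertext()).lower()
    return all(term in text for term in terms)

def matching_elements(xml_text, terms):
    """Tags of all elements that match, retrievable or not."""
    root = ET.fromstring(xml_text)
    return [e.tag for e in root.iter() if matches(e, terms)]

def candidates(xml_text, terms):
    """Tags of matching elements that are also retrievable units."""
    root = ET.fromstring(xml_text)
    return [
        e.tag for e in root.iter()
        if e.tag in RETRIEVABLE and matches(e, terms)
    ]

terms = ["optimization", "multimedia"]
print(matching_elements(INTRO, terms))  # includes "ul" and "li"
print(candidates(INTRO, terms))         # only retrievable units remain
```

Note that the enclosing chapter still matches in this sketch, since it
contains the list; it is the job of the ranking to score such a unit
low.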

About half of the INEX initiative is devoted to this kind of query
(they call these queries "content-only" queries, and there is a second
kind of query, called "content-and-structure"), so a sizable number of
people believe that this use-case is important.  You can get more
information about INEX on its home page:
http://qmir.dcs.qmw.ac.uk/INEX/

Regards,
Kai

Received on Thursday, 10 April 2003 09:22:47 UTC