Re: FTS comments from Jonathan Robie on 2003-03-24 (public-qt-comments@w3.org from March 2003)

From: Jonathan Robie <jonathan.robie@datadirect-technologies.com>
Date: Mon, 24 Mar 2003 12:24:56 -0500
To: kai.grossjohann@uni-duisburg.de (Kai Großjohann ), public-qt-comments@w3.org
Message-Id: <5.2.0.9.0.20030324121707.0118fd70@ncmail.datadirect-technologies.com>

At 10:18 PM 3/22/2003 +0100, Kai Großjohann wrote:
>* Application of SCORE to non-text conditions.
>
>   I believe that vagueness and uncertainty, the central issues of
>   Information Retrieval, are vital features for systems even outside
>   the domain of full text.  Consider the infamous used-car database
>   example: say the user searches for a white Lincoln Continental from
>   2001 with a given mileage (is that the right word? number of miles
>   run by that car is what I mean) and price.  There are no full text
>   conditions in this example.  Yet, what happens if there are no cars
>   fulfilling the exact condition but only cars that are "close
>   matches"?  One approach would be to interpret the conditions
>   vaguely.  Another approach requires the user to specify another
>   query to find those "close matches".  However, the latter approach
>   requires the user to know which of the query conditions to relax to
>   find that close match, and thus requires knowledge of the contents
>   of the database.  Surely this is not desirable: if the user knew
>   what's in the database, why search?
>
>   Therefore, the "best match" approach is important also for non-text
>   conditions.

I'm currently not very active in full-text, but I wanted to offer 
my  thoughts on this.

This has come up a number of times. Here's my opinion: the semantics of 
"close match" for a structure can be highly application-dependent, and it's 
not clear to me whether there is an approach to approximate structural 
match that will work at all well across a variety of structures from 
different application areas.  For instance, in this case, is a car a close 
match if the price is too high, or is that an absolute criterion for the 
user? To get useful results, it may be the case that the user really does 
need to let us know, in some way, which criteria to relax. There may be 
literature or products that tell us that we actually know how to do this 
well in the general case - do you know of any that we might want to look at?

Jonathan

Received on Monday, 24 March 2003 12:25:04 UTC