Re: FTS comments

Jonathan Robie <jonathan.robie@datadirect-technologies.com> writes:

> This has come up a number of times. Here's my opinion: the semantics
> of "close match" for a structure can be highly application-dependent,
> and it's not clear to me whether there is an approach to approximate
> structural match that will work at all well across a variety of
> structures from different application areas.  For instance, in this
> case, is a car a close match if the price is too high, or is that an
> absolute criterion for the user? To get useful results, it may be the
> case that the user really does need to let us know, in some way, which
> criteria to relax. There may be literature or products that tell us
> that we actually know how to do this well in the general case - do you
> know of any that we might want to look at?

The situation is quite the same as a full-text query where the user
has entered, say, ten words.  Which of the words is most important?

It's the same problem.

One way to approach it is to provide a weighted-sum operator in the
query language (that's what we did for XIRQL).  Then users can say
0.3 * color=red + 0.7 * price<10,000; or they can swap the
coefficients.  (This example was not intended to conform to any
particular syntax, I hope you can understand what I meant.)

Another way to approach it is to use what IR people would call idf:
suppose that there is only one red car in the whole collection.  Then
maybe it should come out first even if it violates the price
condition.  (One red car in the whole collection ==> low "document
frequency" ==> high "inverse document frequency".)

Still another way to approach it is to use relevance feedback: using
the weighted sum operation, we guess some coefficients and present
results to the user.  The user tells us which cards he likes.  We use
that information to tune the coefficients.

(Relevance feedback for fact conditions is still an open research
issue, AFAIK.  Maybe if the user consistently marks convertibles as
being highly interesting, then we should introduce a new condition
into the query, asking for convertibles?)


Maybe I should make one thing explicit: I am not thinking about a
Boolean point of view.  Maybe you consider a car to be red or not
red, and that's it.  For me, it's more or less red.  (This is a
somewhat silly example, but you could include color similarities,
then it would make more sense.  And for other conditions such as
price<10,000 it's clear that the condition can be fulfilled to
different degrees.)

So for each of the basic query conditions (let's consider color=red
and price<10,000 as the basic conditions) there is a degree of it
being fulfilled, and these degrees are then combined.

Fuzzy logic aficionados say that any car belongs to the result set in
a lesser or higher degree, friends of probabilities (I'm one of them)
say that there is a lower or higher probability of the car being
relevant to the information need of the user.  But as long as there
is a number of some kind that can be interpreted as the score or
degree or probability, then I'll be happy :-)

-- 
A preposition is not a good thing to end a sentence with.

Received on Tuesday, 1 April 2003 07:31:58 UTC