RE: FTS comments

I think this is all excellent stuff, and I would love to do this, but please
remember we are in the standards business and not the research business!
It's a bad mistake to standardize technology while it is still in the
research phase.

Michael Kay
Software AG

> -----Original Message-----
> From: kai.grossjohann@uni-duisburg.de 
> [mailto:kai.grossjohann@uni-duisburg.de] 
> Sent: 01 April 2003 12:31
> To: Jonathan Robie
> Cc: public-qt-comments@w3.org
> Subject: Re: FTS comments
> 
> 
> 
> Jonathan Robie <jonathan.robie@datadirect-technologies.com> writes:
> 
> > This has come up a number of times. Here's my opinion: the 
> semantics 
> > of "close match" for a structure can be highly 
> application-dependent, 
> > and it's not clear to me whether there is an approach to 
> approximate 
> > structural match that will work at all well across a variety of 
> > structures from different application areas.  For instance, in this 
> > case, is a car a close match if the price is too high, or 
> is that an 
> > absolute criterion for the user? To get useful results, it 
> may be the 
> > case that the user really does need to let us know, in some 
> way, which 
> > criteria to relax. There may be literature or products that tell us 
> > that we actually know how to do this well in the general 
> case - do you 
> > know of any that we might want to look at?
> 
> The situation is quite the same as a full-text query where 
> the user has entered, say, ten words.  Which of the words is 
> most important?
> 
> It's the same problem.
> 
> One way to approach it is to provide a weighted-sum operator 
> in the query language (that's what we did for XIRQL).  Then 
> users can say 0.3 * color=red + 0.7 * price<10,000; or they 
> can swap the coefficients.  (This example was not intended to 
> conform to any particular syntax, I hope you can understand 
> what I meant.)
> 
> Another way to approach it is to use what IR people would 
> call idf: suppose that there is only one red car in the whole 
> collection.  Then maybe it should come out first even if it 
> violates the price condition.  (One red car in the whole 
> collection ==> low "document frequency" ==> high "inverse 
> document frequency".)
> 
> Still another way to approach it is to use relevance 
> feedback: using the weighted sum operation, we guess some 
> coefficients and present results to the user.  The user tells 
> us which cards he likes.  We use that information to tune the 
> coefficients.
> 
> (Relevance feedback for fact conditions is still an open 
> research issue, AFAIK.  Maybe if the user consistently marks 
> convertibles as being highly interesting, then we should 
> introduce a new condition into the query, asking for convertibles?)
> 
> 
> Maybe I should make one thing explicit: I am not thinking 
> about a Boolean point of view.  Maybe you consider a car to 
> be red or not red, and that's it.  For me, it's more or less 
> red.  (This is a somewhat silly example, but you could 
> include color similarities, then it would make more sense.  
> And for other conditions such as price<10,000 it's clear that 
> the condition can be fulfilled to different degrees.)
> 
> So for each of the basic query conditions (let's consider 
> color=red and price<10,000 as the basic conditions) there is 
> a degree of it being fulfilled, and these degrees are then combined.
> 
> Fuzzy logic aficionados say that any car belongs to the 
> result set in a lesser or higher degree, friends of 
> probabilities (I'm one of them) say that there is a lower or 
> higher probability of the car being relevant to the 
> information need of the user.  But as long as there is a 
> number of some kind that can be interpreted as the score or 
> degree or probability, then I'll be happy :-)
> 
> -- 
> A preposition is not a good thing to end a sentence with.
> 

Received on Tuesday, 1 April 2003 11:37:32 UTC