- From: Kay, Michael <Michael.Kay@softwareag.com>
- Date: Tue, 1 Apr 2003 18:37:19 +0200
- To: kai.grossjohann@uni-duisburg.de, Jonathan Robie <jonathan.robie@datadirect-technologies.com>
- Cc: public-qt-comments@w3.org
I think this is all excellent stuff, and I would love to do this, but please remember we are in the standards business and not the research business! It's a bad mistake to standardize technology while it is still in the research phase. Michael Kay Software AG > -----Original Message----- > From: kai.grossjohann@uni-duisburg.de > [mailto:kai.grossjohann@uni-duisburg.de] > Sent: 01 April 2003 12:31 > To: Jonathan Robie > Cc: public-qt-comments@w3.org > Subject: Re: FTS comments > > > > Jonathan Robie <jonathan.robie@datadirect-technologies.com> writes: > > > This has come up a number of times. Here's my opinion: the > semantics > > of "close match" for a structure can be highly > application-dependent, > > and it's not clear to me whether there is an approach to > approximate > > structural match that will work at all well across a variety of > > structures from different application areas. For instance, in this > > case, is a car a close match if the price is too high, or > is that an > > absolute criterion for the user? To get useful results, it > may be the > > case that the user really does need to let us know, in some > way, which > > criteria to relax. There may be literature or products that tell us > > that we actually know how to do this well in the general > case - do you > > know of any that we might want to look at? > > The situation is quite the same as a full-text query where > the user has entered, say, ten words. Which of the words is > most important? > > It's the same problem. > > One way to approach it is to provide a weighted-sum operator > in the query language (that's what we did for XIRQL). Then > users can say 0.3 * color=red + 0.7 * price<10,000; or they > can swap the coefficients. (This example was not intended to > conform to any particular syntax, I hope you can understand > what I meant.) > > Another way to approach it is to use what IR people would > call idf: suppose that there is only one red car in the whole > collection. Then maybe it should come out first even if it > violates the price condition. (One red car in the whole > collection ==> low "document frequency" ==> high "inverse > document frequency".) > > Still another way to approach it is to use relevance > feedback: using the weighted sum operation, we guess some > coefficients and present results to the user. The user tells > us which cards he likes. We use that information to tune the > coefficients. > > (Relevance feedback for fact conditions is still an open > research issue, AFAIK. Maybe if the user consistently marks > convertibles as being highly interesting, then we should > introduce a new condition into the query, asking for convertibles?) > > > Maybe I should make one thing explicit: I am not thinking > about a Boolean point of view. Maybe you consider a car to > be red or not red, and that's it. For me, it's more or less > red. (This is a somewhat silly example, but you could > include color similarities, then it would make more sense. > And for other conditions such as price<10,000 it's clear that > the condition can be fulfilled to different degrees.) > > So for each of the basic query conditions (let's consider > color=red and price<10,000 as the basic conditions) there is > a degree of it being fulfilled, and these degrees are then combined. > > Fuzzy logic aficionados say that any car belongs to the > result set in a lesser or higher degree, friends of > probabilities (I'm one of them) say that there is a lower or > higher probability of the car being relevant to the > information need of the user. But as long as there is a > number of some kind that can be interpreted as the score or > degree or probability, then I'll be happy :-) > > -- > A preposition is not a good thing to end a sentence with. >
Received on Tuesday, 1 April 2003 11:37:32 UTC