Re: Terms and display terms in scan

> Date: Fri, 20 Feb 2004 10:24:44 +1100
> From: Alan Kent <ajk@mds.rmit.edu.au>
> 
> On Thu, Feb 19, 2004 at 11:35:53AM +0000, Robert Sanderson wrote:
> > We return the stemmed term, e.g. 'happi' for happy, happily, happiness.

This seems terribly wrong to me.

> So returned 'term' values may be munged, but are used for searching.

I agree that it should be a non-negotiable property of Scan that what
comes back in the "term" can be re-submitted as a search and get
sensible results.

> This implies you have to guarantee any output of your stemmer can
> be fed back into the stemmer and have the same value output again.
> Otherwise the term from the scan could not be used for searching.
> 
> In the case of soundex, this could be achieved by looking at the
> term and saying "ooh, that looks like the output of the soundex
> algorithm - I will just leave that alone".

Such ad-hoc hackery (ad-hackery?) surely can't be right.
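
To make the quoted requirement concrete: it amounts to demanding that
stem(stem(w)) == stem(w) for every word w. The sketch below uses a
made-up toy stemmer (toy_stem is hypothetical, not Porter or any real
algorithm) purely to show how easily a suffix-stripper can fail that
property.

```python
def toy_stem(word):
    """Hypothetical toy stemmer: strip one suffix, then map a final 'y' to 'i'."""
    for suffix in ("ness", "ly", "es", "s"):
        # Only strip when a reasonably long root would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            break
    if word.endswith("y"):
        word = word[:-1] + "i"
    return word


def is_idempotent_on(stem, words):
    """True if feeding the stemmer its own output changes nothing."""
    return all(stem(stem(w)) == stem(w) for w in words)


happy_forms = ["happy", "happily", "happiness"]
stable = is_idempotent_on(toy_stem, happy_forms)
unstable = is_idempotent_on(toy_stem, ["classes"])
```

With this toy, the "happy" family all stems to "happi" and stays there,
but "classes" goes to "class" and then to "clas" on a second pass, so a
stem returned by scan would not be safely re-searchable.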

I don't think a scan interface has any business exposing dirty laundry
such as the stemmed term "happi" to the poor, innocent user.
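
One possible shape for this, as a sketch only: group the indexed words
by stem internally, but hand back a real word as the searchable term and
keep the stem out of sight. Everything here (STEMS, lookup_stem,
build_scan_entries, the dictionary keys) is hypothetical and not taken
from any actual server.

```python
from collections import defaultdict

# Hypothetical stand-in for a real stemmer: a fixed lookup table.
STEMS = {"happy": "happi", "happily": "happi", "happiness": "happi"}


def lookup_stem(word):
    """Return the stem for a word, or the word itself if unknown."""
    return STEMS.get(word, word)


def build_scan_entries(words, stem):
    """Group words by stem; expose a real word, never the stem itself."""
    groups = defaultdict(list)
    for w in words:
        groups[stem(w)].append(w)
    entries = []
    for key in sorted(groups):
        forms = sorted(set(groups[key]))
        # Shortest surface form is the searchable term; ties break alphabetically.
        term = min(forms, key=lambda f: (len(f), f))
        entries.append({"term": term, "display": ", ".join(forms),
                        "count": len(groups[key])})
    return entries


entries = build_scan_entries(["happy", "happily", "happiness", "sad"], lookup_stem)
```

Because each returned term is an ordinary word, re-submitting it as a
search runs it through the stemmer in the normal way and lands on the
same index entry, with no need to guess whether the input was already
stemmed.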

> This is also consistent with what Ashley does - if it has spaces,
> munge it. If it does not have spaces, maybe it's a scan term so don't
> do anything to it.

Ugh.

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor  <mike@indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "Personally, I don't think its sexual dimorphism.  I'm all
	 for it, but not in this case" - Tracy L. Ford.

--
Listen to my wife's new CD of kids' music, _Child's Play_, at
	http://www.pipedreaming.org.uk/childsplay/

Received on Friday, 20 February 2004 07:36:49 UTC