- From: Ray Larson <ray@SIMS.Berkeley.EDU>
- Date: Thu, 19 Feb 2004 16:08:58 -0800 (PST)
- To: ajk@mds.rmit.edu.au
- Cc: www-zig@w3.org
>>>>> "Alan" == Alan Kent <ajk@mds.rmit.edu.au> writes:

    Alan> On Thu, Feb 19, 2004 at 11:35:53AM +0000, Robert Sanderson
    Alan> wrote:
    >> We return the stemmed term, eg 'happi' for happy, happily,
    >> happiness.

This isn't completely true (and Rob knows better): what is returned
from scan is what was put into the index after normalization. So if
stemming was requested during normalization, stemmed terms are
returned; if not, they are not stemmed.

    Alan> So returned 'term' values may be munged, but are used for
    Alan> searching.

    Alan> This implies you have to guarantee any output of your
    Alan> stemmer can be fed back into the stemmer and have the same
    Alan> value output again. Otherwise the term from the scan could
    Alan> not be used for searching.

That is the case for our stemmer: a stemmed word submitted to the
stemmer returns the same word.

    Alan> In the case of soundex, this could be achieved by looking at
    Alan> the term and saying "ooh, that looks like the output of the
    Alan> soundex algorithm - I will just leave that alone".

Right, and I have seen some where the term in TermInfo is an
incomprehensible mangle of stuff, and displayTerm may or may not be
filled in.

    Alan> This is also consistent with what Ashley does - if it has
    Alan> spaces, munge it. If it does not have spaces, maybe its a
    Alan> scan term so don't do anything to it.

    Alan> Thanks
    Alan> Alan

Cheers,
Ray Larson
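
The property discussed above (a term returned by scan can be fed back through the same normalization and still match) amounts to the stemmer being idempotent. A minimal sketch of how one might check that, assuming a toy suffix-stripping stemmer: `toy_stem`, `strip_one_s`, and `is_idempotent_on` are hypothetical names made up for illustration, not the stemmer or API of any real Z39.50 server.

```python
# Toy rules, checked in order; an assumption for illustration only,
# not the Porter algorithm or any server's actual stemmer.
SUFFIX_RULES = [("iness", "i"), ("ily", "i"), ("y", "i")]


def toy_stem(term):
    """Map happy / happily / happiness to the shared stem 'happi'."""
    term = term.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require a few leading characters so short words are left alone.
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)] + replacement
    return term


def strip_one_s(term):
    """A deliberately naive stemmer that is NOT idempotent."""
    return term[:-1] if term.endswith("s") else term


def is_idempotent_on(stemmer, words):
    """True if stemming an already-stemmed word never changes it."""
    return all(stemmer(stemmer(w)) == stemmer(w) for w in words)


words = ["happy", "happily", "happiness"]
index_terms = sorted({toy_stem(w) for w in words})  # what scan would return
print(index_terms)
print(is_idempotent_on(toy_stem, words))
print(is_idempotent_on(strip_one_s, ["class"]))
```

With `strip_one_s`, "class" is indexed as "clas", but a client that searches on the scanned term "clas" has it normalized again to "cla" and misses the index entry, which is the failure Alan describes.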
Received on Thursday, 19 February 2004 19:09:02 UTC