- From: Ray Larson <ray@SIMS.Berkeley.EDU>
- Date: Thu, 19 Feb 2004 16:08:58 -0800 (PST)
- To: ajk@mds.rmit.edu.au
- Cc: www-zig@w3.org
>>>>> "Alan" == Alan Kent <ajk@mds.rmit.edu.au> writes:
Alan> On Thu, Feb 19, 2004 at 11:35:53AM +0000, Robert Sanderson
Alan> wrote:
>> We return the stemmed term, eg 'happi' for happy, happily,
>> happiness.
This isn't completely true (and Rob knows better). What is returned
from scan is what was put into the index after normalization, so
if stemming was requested during normalization, stemmed terms are
returned; if not, they are not stemmed.
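Here is a minimal Python sketch of that point. It assumes a hypothetical in-memory index (`TinyIndex`) and a hypothetical lookup-table stemmer, and it is not any real Z39.50 server's code. Terms are normalized once at indexing time, and scan simply returns the stored keys, so what comes back depends on whether stemming was part of normalization.

```python
from bisect import bisect_left

# Hypothetical stemmer: a fixed lookup table, purely for illustration.
STEMS = {"happy": "happi", "happily": "happi", "happiness": "happi"}


class TinyIndex:
    """Hypothetical index that normalizes terms once, when they are indexed."""

    def __init__(self, stem: bool) -> None:
        self.stem = stem
        self.postings: dict[str, set[int]] = {}

    def normalize(self, term: str) -> str:
        # Lowercase, then optionally apply the (hypothetical) stemmer.
        t = term.lower()
        return STEMS.get(t, t) if self.stem else t

    def add(self, doc_id: int, text: str) -> None:
        for word in text.split():
            self.postings.setdefault(self.normalize(word), set()).add(doc_id)

    def scan(self, start: str, count: int) -> list[str]:
        # Scan returns the stored (already normalized) index keys, in order.
        terms = sorted(self.postings)
        i = bisect_left(terms, self.normalize(start))
        return terms[i : i + count]

    def search(self, term: str) -> set[int]:
        # Search normalizes the query term with the same function.
        return self.postings.get(self.normalize(term), set())
```

With `stem=True`, indexing "Happy happiness" and "happily" makes `scan("h", 5)` return `["happi"]`; with `stem=False` it returns `["happily", "happiness", "happy"]`. Feeding a scanned term back into `search` works here only because `normalize("happi")` is still `"happi"`.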
Alan> So returned 'term' values may be munged, but are used for
Alan> searching.
Alan> This implies you have to guarantee any output of your
Alan> stemmer can be fed back into the stemmer and have the same
Alan> value output again. Otherwise the term from the scan could
Alan> not be used for searching.
That is the case for our stemmer: a stemmed word submitted to the
stemmer returns the same word.
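The property Alan describes is idempotence: stem(stem(w)) must equal stem(w). Below is a minimal Python sketch of checking for it. It assumes a hypothetical toy suffix-stripping stemmer (not our stemmer, and not Porter's) and adds one generic way to force the property, which is to iterate a stemmer until its output stops changing.

```python
from typing import Callable, Iterable


def toy_stem(word: str) -> str:
    """Hypothetical stemmer: strip one of 'ness'/'ly', then map final 'y' to 'i'."""
    w = word.lower()
    for suffix in ("ness", "ly"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break  # strip at most one suffix per call
    if w.endswith("y") and len(w) > 3:
        w = w[:-1] + "i"
    return w


def is_idempotent(stem: Callable[[str], str], words: Iterable[str]) -> bool:
    # A scan term (the stemmer's output) must survive being stemmed again.
    return all(stem(stem(w)) == stem(w) for w in words)


def stem_to_fixed_point(word: str) -> str:
    # Re-apply toy_stem until the output stops changing. After the first call
    # the word is lowercase, and every later change either shortens it or
    # turns a final 'y' into 'i', so the loop terminates.
    prev = word
    while True:
        nxt = toy_stem(prev)
        if nxt == prev:
            return nxt
        prev = nxt
```

`toy_stem` maps happy, happily and happiness to "happi", but it is not idempotent: "happinessness" stems to "happiness", which stems again to "happi". The fixed-point wrapper is idempotent by construction, at the cost of extra stemmer calls.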
Alan> In the case of soundex, this could be achieved by looking at
Alan> the term and saying "ooh, that looks like the output of the
Alan> soundex algorithm - I will just leave that alone".
Right, and I have seen some implementations where the term in TermInfo
is an incomprehensible mangle of stuff, and displayTerm may or may not
be filled in.
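Alan's soundex suggestion amounts to a guard in front of the encoder. Below is a minimal Python sketch. It assumes American Soundex codes of the form one letter plus three digits (e.g. R163) and a hypothetical `normalize_soundex` entry point.

```python
import re

# American Soundex digit groups; vowels, Y, H and W have no digit.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}
_SOUNDEX_SHAPE = re.compile(r"[A-Z][0-6]{3}")


def soundex(word: str) -> str:
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so codes on either side are both kept
    return (result + "000")[:4]


def looks_like_soundex(term: str) -> bool:
    return _SOUNDEX_SHAPE.fullmatch(term) is not None


def normalize_soundex(term: str) -> str:
    # "That looks like soundex output, leave it alone"; otherwise encode it.
    return term if looks_like_soundex(term) else soundex(term)
```

Without the guard, re-encoding a scanned code would give `soundex("R163") == "R000"`, so a term taken from scan could not be used for searching.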
Alan> This is also consistent with what Ashley does - if it has
Alan> spaces, munge it. If it does not have spaces, maybe it's a
Alan> scan term so don't do anything to it.
Alan> Thanks Alan
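The spaces heuristic attributed to Ashley above might look like the following minimal Python sketch. The function names and the lowercase-and-collapse-whitespace `example_munge` are hypothetical stand-ins for whatever normalization a server really applies.

```python
from typing import Callable


def spaces_heuristic_normalize(term: str, munge: Callable[[str], str]) -> str:
    # Multi-word input is assumed to be raw user text, so normalize it.
    if " " in term:
        return munge(term)
    # A single token may already be a normalized scan term: leave it alone.
    return term


def example_munge(text: str) -> str:
    # Hypothetical normalizer: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())
```

The trade-off shows up with a single raw word such as "Happy": it contains no spaces, so it is passed through unnormalized as well.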
Cheers,
Ray Larson
Received on Thursday, 19 February 2004 19:09:02 UTC