Re: Terms and display terms in scan from Mike Taylor on 2004-02-25 (www-zig@w3.org from February 2004)

From: Mike Taylor <mike@indexdata.com>
Date: Wed, 25 Feb 2004 13:59:24 GMT
To: azaroth@liverpool.ac.uk
Cc: ajk@mds.rmit.edu.au, www-zig@w3.org
Message-Id: <200402251359.i1PDxOSx022071@localhost.localdomain>

> Date: Fri, 20 Feb 2004 12:51:56 +0000 (GMT)
> From: Robert Sanderson <azaroth@liverpool.ac.uk>
> 
> > I don't think a scan interface has any business exposing dirty
> > laundry such as the stemmed term "happi" to the poor, innocent
> > user.
> 
> There's very little choice, as one stem might be made up of several
> different words (unhappiness, happy, happily)

Where's the problem?  Pick one, and use that: a stemmed search will
find them all anyway, so any one representative of the equivalence
class is as good as any other.  If you like, you can provide:

	term = "happy" (arbitrarily chosen)
	displayTerm = "unhappiness, happy, happily"

> One of these could be selected at random for term, but then the
> termlist might not be sorted. (eg if the stem 'happi' came from
> 'unhappiness')

So sort it!
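
To make that concrete, here is a minimal sketch in Python of the
approach: group the words by stem, pick one member of each class as
the term, list the whole class in displayTerm, and sort the result by
the chosen term.  The names toy_stem and build_scan_entries are
invented for illustration, the lookup table merely stands in for a
real stemmer, and "shortest word wins" is just one arbitrary way of
picking the representative.

```python
from collections import defaultdict


def toy_stem(word):
    # Hypothetical stand-in for a real stemming algorithm.
    table = {
        "unhappiness": "happi",
        "happy": "happi",
        "happily": "happi",
        "sadly": "sad",
    }
    return table.get(word, word)


def build_scan_entries(words, stem=toy_stem):
    # Group the indexed words into equivalence classes keyed by stem.
    classes = defaultdict(list)
    for word in words:
        classes[stem(word)].append(word)

    entries = []
    for members in classes.values():
        # Any member would do; shortest (then alphabetical) is an
        # arbitrary but deterministic choice.
        term = min(members, key=lambda w: (len(w), w))
        entries.append({"term": term, "displayTerm": ", ".join(members)})

    # Sort by the representative term, never by the hidden stem.
    entries.sort(key=lambda entry: entry["term"])
    return entries


words = ["unhappiness", "sad", "happy", "apple", "happily", "sadly"]
for entry in build_scan_entries(words):
    print(entry["term"], "|", entry["displayTerm"])
```

The stem "happi" never leaves the function: the user sees only real
words, and the list is in order because it is sorted on the terms
actually returned.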

> If the user didn't want to scan using a stemming algorithm, then
> they shouldn't have asked for it! :)

Ah ...  The well-known and much admired Don't Do That Then defence.
Well, yes; but still, we should respect what the elements are
fundamentally _for_.

 _/|_	 _______________________________________________________________
/o ) \/  Mike Taylor  <mike@indexdata.com>  http://www.miketaylor.org.uk
)_v__/\  "In art criticism and literary criticism, it is normal to
	 come across long passages which are almost completely lacking
	 in meaning" -- George Orwell.

--
Listen to my wife's new CD of kids' music, _Child's Play_, at
	http://www.pipedreaming.org.uk/childsplay/

Received on Wednesday, 25 February 2004 09:00:31 UTC