- From: Richard Light <richard@light.demon.co.uk>
- Date: Thu, 22 Oct 2009 07:04:59 +0100
- To: Stella Dextre Clarke <stella@lukehouse.org>
- Cc: Thomas Bandholtz <thomas.bandholtz@innoq.com>, Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
In message <4ADF6185.1060309@lukehouse.org>, Stella Dextre Clarke <stella@lukehouse.org> writes
>Thomas Bandholtz wrote:
>
>> Secondly, we need this stuff to support automated indexing of full
>>text documents. Machine need to be enabled to detect the Concepts
>>behind this weird mess of character strings that makes a document
>>(more on this in the ecoterm presentation).
>Another interesting point. I sometimes hear people complain that
>ISO2788-compliant thesauri do not help enough with retrieval from full
>text of documents that have not been humanly indexed. This is hardly
>surprising, since they were designed to support retrieval of documents
>indexed with that same vocabulary. The same is true of BS 8723-2 and
>the forthcoming ISO 25964-1.
>
>When people want to use a thesaurus for full text retrieval, I
>sometimes suggest they could improve the results by stripping the
>qualifiers off the non-preferred terms. But more could be done to
>enhance the results of that process, by including inflectional forms,
>term weighting, Boolean expressions, additional less reliable
>clue-words, etc, and of course dropping the idea of admitting the
>clue-words as non-preferred synonyms with reciprocal relationships.
>
>I sometimes wonder if a future revised version of BS 8723 or ISO 25964
>should include some recommendations to this effect. What do you think?

I would say not. "Machines detecting concepts" strikes me as an
unachievable goal, certainly with our current capabilities. "Machines
detecting the presence of words which are also terms in a thesaurus" is
achievable, but it _isn't_ the same thing.

Richard
--
Richard Light
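
The difference between matching thesaurus terms and detecting concepts can be shown with a rough sketch. This is only an illustration under stated assumptions: the three-entry thesaurus, the helper names (strip_qualifier, build_index, match_terms) and the single-word matching are all hypothetical, and none of it comes from ISO 2788, BS 8723 or ISO 25964. It strips parenthetical qualifiers from the terms, as suggested in the quoted text, and then looks for the remaining words in a piece of full text.

```python
import re

# Hypothetical mini-thesaurus (an assumption for illustration only):
# entry term -> preferred term.
THESAURUS = {
    "mercury (metal)": "mercury (metal)",
    "quicksilver": "mercury (metal)",
    "mercury (planet)": "mercury (planet)",
}


def strip_qualifier(term):
    """Drop a trailing parenthetical qualifier and lowercase the term."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", term).strip().lower()


def build_index(thesaurus):
    """Map each qualifier-free term to the set of preferred terms it may denote."""
    index = {}
    for entry, preferred in thesaurus.items():
        index.setdefault(strip_qualifier(entry), set()).add(preferred)
    return index


def match_terms(text, index):
    """Return the words in the text that are also (stripped) thesaurus terms.

    Only single-word terms are handled; this is word matching, not
    concept detection.
    """
    hits = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in index:
            hits[word] = index[word]
    return hits


if __name__ == "__main__":
    index = build_index(THESAURUS)
    print(match_terms("Mercury rises in the thermometer", index))
```

In this sketch the word "mercury" is found, but once the qualifiers are stripped it points to both the metal and the planet. The program has detected a word that is also a thesaurus term; it has not decided which concept the document is about.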
Received on Thursday, 22 October 2009 06:05:39 UTC