From: Richard Light <richard@light.demon.co.uk>
Date: Thu, 22 Oct 2009 07:04:59 +0100
Message-ID: <zWXkF8fLY$3KFwdI@light.demon.co.uk>
To: Stella Dextre Clarke <stella@lukehouse.org>
Cc: Thomas Bandholtz <thomas.bandholtz@innoq.com>, Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
In message <4ADF6185.1060309@lukehouse.org>, Stella Dextre Clarke 
<stella@lukehouse.org> writes
>Thomas Bandholtz wrote:
>> Secondly, we need this stuff to support automated indexing of full
>>text documents. Machines need to be enabled to detect the Concepts
>>behind this weird mess of character strings that makes a document
>>(more on this in the ecoterm presentation).
>Another interesting point. I sometimes hear people complain that 
>ISO2788-compliant thesauri do not help enough with retrieval from full 
>text of documents that have not been humanly indexed. This is hardly 
>surprising, since they were designed to support retrieval of documents 
>indexed with that same vocabulary. The same is true of BS 8723-2 and 
>the forthcoming ISO 25964-1.
>When people want to use a thesaurus for full text retrieval, I 
>sometimes suggest they could improve the results by stripping the 
>qualifiers off the non-preferred terms. But more could be done to 
>enhance the results of that process, by including inflectional forms, 
>term weighting, Boolean expressions, additional less reliable 
>clue-words, etc, and of course dropping the idea of admitting the 
>clue-words as non-preferred synonyms with reciprocal relationships.
>I sometimes wonder if a future revised version of BS 8723 or ISO 25964 
>should include some recommendations to this effect. What do you think?

I would say not.  "Machines detecting concepts" strikes me as an 
unachievable goal, certainly with our current capabilities.  "Machines 
detecting the presence of words which are also terms in a thesaurus" is 
achievable, but it _isn't_ the same thing.
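
To make the distinction concrete, here is a rough sketch of the achievable half: stripping parenthetical qualifiers from thesaurus labels and then looking for the bare words in a text. Everything in it is an assumption made for illustration: the tiny THESAURUS dictionary, the function names, and the choice to strip qualifiers from every label rather than only the non-preferred ones. It is not taken from any standard or existing tool.

```python
import re

# Hypothetical thesaurus fragment: label -> preferred term.
# Some labels carry a parenthetical qualifier, e.g. "cranes (birds)".
THESAURUS = {
    "cranes (lifting equipment)": "cranes (lifting equipment)",
    "derricks": "cranes (lifting equipment)",
    "cranes (birds)": "cranes (birds)",
    "Gruidae": "cranes (birds)",
}

# Matches a trailing qualifier such as " (birds)".
QUALIFIER = re.compile(r"\s*\([^)]*\)\s*$")


def strip_qualifier(label):
    """Return the label without its trailing parenthetical qualifier."""
    return QUALIFIER.sub("", label).strip()


def build_lookup(thesaurus):
    """Map each bare, lower-cased label to the set of preferred terms it may denote."""
    lookup = {}
    for label, preferred in thesaurus.items():
        key = strip_qualifier(label).lower()
        lookup.setdefault(key, set()).add(preferred)
    return lookup


def find_terms(text, lookup):
    """Return the bare labels found in the text, each with its candidate preferred terms."""
    hits = {}
    for key, preferred in lookup.items():
        # Whole-word, case-insensitive match; no inflection handling or weighting.
        if re.search(r"\b" + re.escape(key) + r"\b", text, flags=re.IGNORECASE):
            hits[key] = sorted(preferred)
    return hits


if __name__ == "__main__":
    lookup = build_lookup(THESAURUS)
    print(find_terms("The cranes were seen near the derricks.", lookup))
```

Once the qualifiers are gone, the word "cranes" points at both the bird and the lifting equipment. The code has found a word that is also a term; it has not worked out which concept the document is about.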

Richard Light
Received on Thursday, 22 October 2009 06:05:39 UTC