

From: Stella Dextre Clarke <stella@lukehouse.org>
Date: Wed, 21 Oct 2009 20:31:17 +0100
Message-ID: <4ADF6185.1060309@lukehouse.org>
To: Thomas Bandholtz <thomas.bandholtz@innoq.com>
CC: Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>

Thomas Bandholtz wrote:

> Secondly, we need this stuff to support automated indexing of full text 
> documents. Machines need to be enabled to detect the Concepts behind the 
> weird mess of character strings that makes up a document (more on this in 
> the ecoterm presentation).

Another interesting point. I sometimes hear people complain that 
ISO 2788-compliant thesauri do not help enough with retrieval from the full 
text of documents that have not been humanly indexed. This is hardly 
surprising, since they were designed to support retrieval of documents 
indexed with that same vocabulary. The same is true of BS 8723-2 and the 
forthcoming ISO 25964-1.

When people want to use a thesaurus for full text retrieval, I sometimes 
suggest they could improve the results by stripping the qualifiers off 
the non-preferred terms. But more could be done to enhance the results 
of that process, by including inflectional forms, term weighting, 
Boolean expressions, additional less reliable clue-words, etc., and of 
course dropping the idea of admitting the clue-words as non-preferred 
synonyms with reciprocal relationships.
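A minimal sketch of that idea in Python, assuming a toy thesaurus held as a dict of USE relationships whose non-preferred terms carry parenthetical qualifiers. The terms, the weights, the naive plural rule and the single AND-context rule are all hypothetical illustrations, not anything taken from ISO 2788, BS 8723 or ISO 25964. It strips the qualifiers, adds crude inflectional forms, and treats extra clue-words as one-way, lower-weighted hints rather than as non-preferred synonyms.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: non-preferred term (with qualifier) -> preferred term.
USE = {
    "Cranes (birds)": "Gruidae",
    "Cranes (lifting equipment)": "Hoisting machinery",
    "Hoists": "Hoisting machinery",
}

# Hypothetical clue-words: one-way hints towards a concept, NOT non-preferred
# synonyms. Each is (phrase, preferred term, weight, words that must also occur),
# the last element acting as a simple Boolean AND.
CLUES = [
    ("derrick", "Hoisting machinery", 0.6, set()),
    ("wetland", "Gruidae", 0.3, {"cranes"}),
]

QUALIFIER = re.compile(r"\s*\([^)]*\)\s*$")


def strip_qualifier(term: str) -> str:
    """Drop a trailing parenthetical qualifier and lowercase the rest."""
    return QUALIFIER.sub("", term).strip().lower()


def inflections(phrase: str) -> set[str]:
    """Naive singular/plural variants of the last word (an assumption, not real stemming)."""
    *head, last = phrase.split()
    variants = {last, last[:-1] if last.endswith("s") else last + "s"}
    return {" ".join(head + [v]) for v in variants}


def build_clues() -> list[tuple[str, str, float, set[str]]]:
    """Turn stripped non-preferred terms, preferred terms and clue-words into matching rules."""
    rules = []
    for non_pref, pref in USE.items():
        for form in inflections(strip_qualifier(non_pref)):
            rules.append((form, pref, 1.0, set()))
    for pref in set(USE.values()):
        for form in inflections(pref.lower()):
            rules.append((form, pref, 1.0, set()))
    rules.extend(CLUES)
    return rules


def score(text: str, rules) -> dict[str, float]:
    """Sum the weights of every rule whose phrase occurs and whose AND-context is present."""
    words = re.findall(r"[a-z]+", text.lower())
    padded = " " + " ".join(words) + " "
    vocab = set(words)
    scores = defaultdict(float)
    for phrase, pref, weight, required in rules:
        if f" {phrase} " in padded and required <= vocab:
            scores[pref] += weight
    return dict(scores)


if __name__ == "__main__":
    rules = build_clues()
    print(score("Flocks of cranes returned to the wetland this spring.", rules))
```

Because both qualified forms of "Cranes" collapse to the same string once stripped, that word alone scores both concepts equally; it is the weighted, context-dependent clue-word that tips the balance, which is roughly why stripping qualifiers on its own is not enough.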

I sometimes wonder if a future revised version of BS 8723 or ISO 25964 
should include some recommendations to this effect. What do you think?


Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
Received on Wednesday, 21 October 2009 19:31:47 UTC
