- From: Johan De Smedt <Johan.De-smedt@tenforce.com>
- Date: Thu, 22 Oct 2009 09:16:23 +0200
- To: Stella Dextre Clarke <stella@lukehouse.org>, Thomas Bandholtz <thomas.bandholtz@innoq.com>
- CC: Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
Hi,

Suggestion: there are three levels of organization:
- Concepts (SKOS talk)
- Labels
- Text processing

A significant part of the issues discussed relates to what belongs on the label management level and what belongs on the text processing level, so the boundary between the two needs a proper definition.

Language-specific text processing and analysis (including inflection) seems to me a specialized area that global resources (language dictionaries) such as WordNet can address. Stemming is also in this area. It seems to me it would be costly to manage this in every thesaurus.

Label management can focus on standard terms and term decomposition as relevant within a thesaurus or taxonomy (equivalence relation, compound equivalence, acronym, short name, qualifiers, ...).

Indexing and search engines that combine a thesaurus with text processing can use the label management layer (of the thesaurus) to configure the relevant text processing.

Concept and label processing surely belong to thesaurus/taxonomy/... management. Text processing, I would suggest, belongs in the text processing engines.

PS:
- Thanks for the UMTHES presentation; it was very instructive.
- Would it be an idea to build further SKOS extensions that provide a common schema for artefacts like equivalence relation and compound equivalence, or for specializing some xl:labelRelation?

kr,
Johan De Smedt.

===================
-----Original Message-----
From: public-esw-thes-request@w3.org [mailto:public-esw-thes-request@w3.org] On Behalf Of Stella Dextre Clarke
Sent: Wednesday, 21 October, 2009 21:31
To: Thomas Bandholtz
Cc: Antoine Isaac; SKOS
Subject: Re: UMTHES and SKOS-XL

Thomas Bandholtz wrote:
> Secondly, we need this stuff to support automated indexing of full text
> documents. Machine need to be enabled to detect the Concepts behind this
> weird mess of character strings that makes a document (more on this in
> the ecoterm presentation).

Another interesting point.
I sometimes hear people complain that ISO 2788-compliant thesauri do not help enough with retrieval from full text of documents that have not been humanly indexed. This is hardly surprising, since they were designed to support retrieval of documents indexed with that same vocabulary. The same is true of BS 8723-2 and the forthcoming ISO 25964-1.

When people want to use a thesaurus for full text retrieval, I sometimes suggest they could improve the results by stripping the qualifiers off the non-preferred terms. But more could be done to enhance the results of that process: including inflectional forms, term weighting, Boolean expressions, additional less reliable clue-words, etc., and of course dropping the idea of admitting the clue-words as non-preferred synonyms with reciprocal relationships.

I sometimes wonder if a future revised version of BS 8723 or ISO 25964 should include some recommendations to this effect. What do you think?

Stella

*****************************************************
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
stella@lukehouse.org
*****************************************************
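The qualifier-stripping step Stella mentions for full text retrieval can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes qualifiers are written as a trailing parenthetical (e.g. "Mercury (planet)"), and the names `strip_qualifier` and `full_text_clues` are hypothetical. They are not defined by SKOS, BS 8723 or ISO 25964.

```python
import re

# Assumption: a qualifier is a trailing parenthetical, e.g. "Mercury (planet)".
QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def strip_qualifier(term: str) -> str:
    """Remove a trailing parenthetical qualifier from a thesaurus term."""
    return QUALIFIER.sub("", term).strip()


def full_text_clues(non_preferred: list[str]) -> set[str]:
    """Build lower-cased match strings for full text retrieval from
    non-preferred terms (hypothetical helper, for illustration only)."""
    stripped = (strip_qualifier(t) for t in non_preferred)
    # Drop anything that was nothing but a qualifier.
    return {s.lower() for s in stripped if s}


if __name__ == "__main__":
    print(strip_qualifier("Mercury (planet)"))
    print(sorted(full_text_clues(["Quicksilver", "Mercury (metal)"])))
```

Once the qualifier is gone, "Mercury (planet)" and "Mercury (metal)" both reduce to the same string "mercury". This is one reason the message suggests that further measures (term weighting, Boolean expressions, less reliable clue-words) may be needed on top of plain stripping.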
Received on Thursday, 22 October 2009 07:17:07 UTC