

From: Johan De Smedt <Johan.De-smedt@tenforce.com>
Date: Thu, 22 Oct 2009 09:16:23 +0200
To: Stella Dextre Clarke <stella@lukehouse.org>, Thomas Bandholtz <thomas.bandholtz@innoq.com>
CC: Antoine Isaac <aisaac@few.vu.nl>, SKOS <public-esw-thes@w3.org>
Message-ID: <B433CF38970EC14EA2B3A08075A5FA7B272D8F6075@tfvirt-prdexch.tenforce2.be>

Suggestion: There are three levels of organization.
- Concepts (SKOS talk)
- Labels
- Text processing

A significant part of the issues discussed relates to what belongs on the label management level
and what belongs on the text processing level (so both levels need a proper definition).

Language-specific text processing and analysis (including inflection)
seems to me a specialized area that global resources (language dictionaries)
like WordNet can cover.
Stemming is also in this area.
It seems to me costly if this had to be managed in every thesaurus.

Label management can focus on standard terms and term decomposition as relevant within a 
thesaurus or taxonomy.  (equivalence relation, compound equivalence, acronym, 
short-name, qualifiers ...)

Indexing and search engines combining thesaurus and text processing can use the label
management layer (of the thesaurus) to configure the relevant text processing.
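
As a rough illustration of that split (a minimal sketch, not a proposal for any
particular engine): the thesaurus only supplies labels, and the engine decides how
to normalize them. All names here (ConceptLabels, naive_normalize, build_index,
find_concepts, the concept ids) are hypothetical, and the plural-stripping
normalizer merely stands in for real language-specific processing.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConceptLabels:
    """Label-management layer: the labels a thesaurus records for one concept."""
    concept_id: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)  # synonyms, acronyms, short names


def naive_normalize(text: str) -> str:
    """Text-processing layer (placeholder): lowercase, tokenize, drop a final 's'."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def build_index(entries: List[ConceptLabels],
                normalize: Callable[[str], str]) -> Dict[str, str]:
    """Configure the engine from the label layer: normalized label -> concept id."""
    index: Dict[str, str] = {}
    for entry in entries:
        for label in [entry.pref_label, *entry.alt_labels]:
            index[normalize(label)] = entry.concept_id
    return index


def find_concepts(text: str, index: Dict[str, str],
                  normalize: Callable[[str], str], max_words: int = 3) -> List[str]:
    """Return concept ids whose labels occur in the text, in order of first match."""
    tokens = normalize(text).split()
    found: List[str] = []
    for start in range(len(tokens)):
        for size in range(1, max_words + 1):
            if start + size > len(tokens):
                break
            concept_id = index.get(" ".join(tokens[start:start + size]))
            if concept_id is not None and concept_id not in found:
                found.append(concept_id)
    return found


# Made-up thesaurus content, for illustration only.
THESAURUS = [
    ConceptLabels("c1", "water pollution", ["WP"]),
    ConceptLabels("c2", "rivers"),
]
INDEX = build_index(THESAURUS, naive_normalize)
HITS = find_concepts("Rivers suffer from water pollution.", INDEX, naive_normalize)
```

Swapping naive_normalize for a proper stemmer or dictionary lookup changes nothing
in the thesaurus data, which is the point of keeping the two layers apart.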

Concept and label processing surely belong to the thesaurus/taxonomy/... management.
Text processing, I would suggest, is in the text processing engines.

- thanks for the UMTHES presentation - very instructive.
- would it be an idea to build further SKOS extensions that provide a common schema for
  artefacts like equivalence relation and compound equivalence, or for specializing
  some xl:labelRelation?

kr, Johan De Smedt. 
-----Original Message-----
From: public-esw-thes-request@w3.org [mailto:public-esw-thes-request@w3.org] On Behalf Of Stella Dextre Clarke
Sent: Wednesday, 21 October, 2009 21:31
To: Thomas Bandholtz
Cc: Antoine Isaac; SKOS
Subject: Re: UMTHES and SKOS-XL

Thomas Bandholtz wrote:

> Secondly, we need this stuff to support automated indexing of full text 
> documents. Machines need to be enabled to detect the Concepts behind this 
> weird mess of character strings that makes a document (more on this in 
> the ecoterm presentation).
Another interesting point. I sometimes hear people complain that 
ISO2788-compliant thesauri do not help enough with retrieval from full 
text of documents that have not been humanly indexed. This is hardly 
surprising, since they were designed to support retrieval of documents 
indexed with that same vocabulary. The same is true of BS 8723-2 and the 
forthcoming ISO 25964-1.

When people want to use a thesaurus for full text retrieval, I sometimes 
suggest they could improve the results by stripping the qualifiers off 
the non-preferred terms. But more could be done to enhance the results 
of that process, by including inflectional forms, term weighting, 
Boolean expressions, additional less reliable clue-words, etc, and of 
course dropping the idea of admitting the clue-words as non-preferred 
synonyms with reciprocal relationships.
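
The qualifier-stripping step could be sketched as follows. This is only an
illustration and assumes qualifiers are written as a trailing parenthetical, as in
"cranes (birds)"; the function names, terms and concept ids are hypothetical.
Because stripping can make two terms identical, each search string maps to a list
of candidate concepts rather than to a single one.

```python
import re
from typing import Dict, List

# Assumption: a qualifier is a parenthetical at the very end of the term.
_TRAILING_QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def strip_qualifier(term: str) -> str:
    """Remove a trailing parenthetical qualifier, e.g. 'cranes (birds)' -> 'cranes'."""
    return _TRAILING_QUALIFIER.sub("", term).strip()


def fulltext_lookup(non_preferred: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each stripped, lowercased term to every concept it may point to."""
    lookup: Dict[str, List[str]] = {}
    for term, concept_id in non_preferred.items():
        candidates = lookup.setdefault(strip_qualifier(term).lower(), [])
        if concept_id not in candidates:
            candidates.append(concept_id)
    return lookup


# Made-up non-preferred terms, for illustration only.
LOOKUP = fulltext_lookup({
    "cranes (birds)": "c10",
    "cranes (lifting equipment)": "c11",
    "derricks": "c11",
})
```

Here "cranes" ends up pointing at both c10 and c11, so a full-text match on it is a
weaker clue than a match on "derricks"; term weighting is one way to reflect that.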

I sometimes wonder if a future revised version of BS 8723 or ISO 25964 
should include some recommendations to this effect. What do you think?


Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
Received on Thursday, 22 October 2009 07:17:07 UTC
