
FW: FW: vision for controlled vocabulary use and management

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Mon, 1 Nov 2004 15:05:26 -0000
Message-ID: <350DC7048372D31197F200902773DF4C05DE3362@exchange11.rl.ac.uk>
To: public-esw-thes@w3.org

Fwded from Dagobert ...

-----Original Message-----
From: Dagobert Soergel [mailto:dsoergel@umd.edu]
Sent: 23 October 2004 04:01
To: Miles, AJ (Alistair) 
Subject: Re: FW: vision for controlled vocabulary use and management

(1) I agree with you entirely.  Information retrieval (or resource 
discovery) is to a large extent about enabling the user to improve his or 
her understanding of subjects, topics, issues, and contexts that are defined 
in terms of concepts; it is to a large extent about helping users create 
meaning.  Consequently, thesauri and other knowledge organization systems 
(KOS) need to be based on concepts and, furthermore, present these concepts 
in a meaningful arrangement.  The semantic Web is by its very definition 
about concepts; it needs well-structured, well-defined concept systems 
called ontologies.  Ideally, concept-based KOS will serve both for human 
information processing and for machine information processing.  This 
thought is, of course, not new; what is significant is that it now seems 
to be adopted and acted on more broadly.

On the other hand, human thought and language are very complex, and a pure 
concept-based approach may not capture all the nuances of 
language.  Therefore, the model outlined in 
http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/ is primarily 
concept-based and links terms to concepts but also allows for treating 
terms as entities in their own right that can enter into relationships with 
other terms; these relationships complement the concept-concept and 
concept-term relationships.  Finally, at the level of expression, for 
human readers and for language-processing programs, we have strings: a 
term is expressed by one or more strings.  Since concepts and language 
are inextricably linked, we need a structure of sufficient complexity.
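To make the three levels concrete, here is a minimal Python sketch of a 
much-simplified version of such a model.  The class and field names, the 
"abbreviationOf" relationship and the example.org URIs are hypothetical, 
not taken from the article: concepts carry concept-concept relationships, 
each term is linked to one concept and expressed by one or more strings, 
and terms can also enter into term-term relationships of their own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    # Concept-concept relationships, e.g. {"broader": {"http://example.org/kos/price-index"}}
    relations: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Term:
    term_id: str
    language: str
    concept_uri: str  # concept-term link: the concept this term expresses
    # A term is expressed by one or more strings (spelling and case variants, ...).
    strings: list[str] = field(default_factory=list)
    # Term-term relationships, e.g. {"abbreviationOf": {"t-en-1"}} (hypothetical name)
    relations: dict[str, set[str]] = field(default_factory=dict)


def strings_for_concept(terms: list[Term], concept_uri: str, language: str) -> list[str]:
    """Collect every string that expresses a concept in the given language."""
    return [s for t in terms
            if t.concept_uri == concept_uri and t.language == language
            for s in t.strings]


cpi = Concept("http://example.org/kos/cpi",
              {"broader": {"http://example.org/kos/price-index"}})
terms = [
    Term("t-en-1", "en", cpi.uri, ["Consumer Price Index", "consumer price index"]),
    # A relationship between two terms of the same concept, not between concepts.
    Term("t-en-2", "en", cpi.uri, ["CPI"], {"abbreviationOf": {"t-en-1"}}),
    Term("t-fr-1", "fr", cpi.uri, ["indice des prix à la consommation"]),
]
print(strings_for_concept(terms, cpi.uri, "en"))
# ['Consumer Price Index', 'consumer price index', 'CPI']
```

Keeping "CPI" as a term in its own right, related to the full term, 
records a fact about language that a purely concept-based structure 
would have no place for.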

(2) Much search is based on free-text searching (or, in image retrieval, 
on automatically detectable image features); in that environment, limited 
concept-based search can be accomplished through query expansion supported 
by a KOS.  In high-payoff situations, sophisticated indexing pays off.  Such 
indexing, whether by people, by computer programs (yes, computer programs 
can do concept-oriented indexing, particularly when supported by a rich KOS 
as a knowledge base), or by human-computer collaboration, should always be 
concept-based.  There are many advantages to recording concepts in the 
indexing record by concept identifiers, preferably URIs: the terms displayed 
to the user can be adapted to the user's language and to current usage, 
and when the user clicks on the surface term for a concept, the system can 
use the underlying URI to link directly to the specific record in a KOS 
where more information on the concept can be found.  But again, users also 
need to be supported in understanding natural language terms, particularly 
if they read in a foreign language, and a good system for the retrieval and 
reading of documents would have a function that accesses multiple KOS on 
the Web to find such explanations.
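A minimal Python sketch of these two uses of a KOS, assuming a toy 
in-memory KOS (the KOS dictionary, the example.org URIs and the function 
names are hypothetical): the indexing record holds only concept URIs, the 
label shown to the user is chosen in the user's language at display time, 
and for free-text search a query concept is expanded into the labels of 
itself and its narrower concepts.

```python
# Hypothetical miniature KOS keyed by concept URI (example.org URIs are placeholders).
KOS = {
    "http://example.org/kos/fruit": {
        "labels": {"en": ["fruit"], "de": ["Obst"]},
        "narrower": ["http://example.org/kos/apple"],
    },
    "http://example.org/kos/apple": {
        "labels": {"en": ["apple", "apples"], "de": ["Apfel"]},
        "narrower": [],
    },
}


def display_labels(subject_uris, lang, fallback="en"):
    """Show the preferred (first) label of each indexed concept in the user's language."""
    out = []
    for uri in subject_uris:
        labels = KOS[uri]["labels"]
        out.append(labels.get(lang, labels[fallback])[0])
    return out


def expand_free_text(uri, lang):
    """Query expansion for free-text search: all labels of the concept and of its
    narrower concepts (followed transitively), joined into one OR query string."""
    seen, stack, words = set(), [uri], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        words.extend(KOS[current]["labels"].get(lang, []))
        stack.extend(KOS[current]["narrower"])
    return " OR ".join(words)


# Concept-oriented indexing record: subjects are concept URIs, not terms.
record = {"id": "doc-17", "subjects": ["http://example.org/kos/apple"]}
print(display_labels(record["subjects"], "de"))                # ['Apfel']
print(expand_free_text("http://example.org/kos/fruit", "en"))  # fruit OR apple OR apples
```

Because the record stores the URI, the same key (KOS[uri] here) is what a 
"click on the term" feature would use to fetch the full KOS entry.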

(3) is desirable in principle but a bit more problematic.  In a system 
where concepts are defined axiomatically (as in a system that supports 
automatic reasoning), a change in definition certainly creates a new concept 
that should get a new URI.  The new concept should be properly related to 
the old concept, and the old concept should be marked as no longer used in 
the system at hand (which would not stop some other system from using 
it).  But consider the case of concepts defined for the purpose of gathering 
and reporting statistics, such as the definition of a branch of industry, 
the Consumer Price Index, or what constitutes poverty.  A minor change in 
wording may be introduced to clarify, not change, the meaning.  Do we have 
a new concept?  Alternatively, the definition could be changed drastically 
enough to definitely create a new concept.  The old concept would 
still be needed to understand historical statistics.  This clearly requires 
fairly elaborate version control.
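One way to sketch this distinction in Python, under stated assumptions 
(ConceptRecord, clarify, redefine, definition_on, the placeholder 
definitions and the example.org URIs are all hypothetical): a 
clarification keeps the URI and adds a dated definition version, while a 
substantive redefinition mints a new URI, links old and new, and marks the 
old concept as no longer used in this system while keeping it resolvable 
for historical data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ConceptRecord:
    uri: str
    # (valid_from, definition) pairs, appended in chronological order.
    definitions: list = field(default_factory=list)
    deprecated: bool = False           # no longer used in *this* system
    replaced_by: Optional[str] = None  # link from the old concept to the new one
    replaces: Optional[str] = None     # link from the new concept back to the old one


registry: dict = {}  # hypothetical store: URI -> ConceptRecord


def clarify(uri: str, new_wording: str, when: date) -> None:
    """Rewording judged NOT to change the meaning: same URI, new definition version."""
    registry[uri].definitions.append((when, new_wording))


def redefine(old_uri: str, new_uri: str, definition: str, when: date) -> None:
    """Substantive change: mint a new concept, relate it to the old one, deprecate the old.
    The old record stays in the registry so historical data can still be interpreted."""
    registry[new_uri] = ConceptRecord(new_uri, [(when, definition)], replaces=old_uri)
    registry[old_uri].deprecated = True
    registry[old_uri].replaced_by = new_uri


def definition_on(uri: str, day: date) -> Optional[str]:
    """Definition of a concept in force on a given day (None if not yet defined)."""
    valid = [text for start, text in registry[uri].definitions if start <= day]
    return valid[-1] if valid else None


p1 = "http://example.org/stats/poverty-1"
p2 = "http://example.org/stats/poverty-2"
registry[p1] = ConceptRecord(p1, [(date(1990, 1, 1), "Income below threshold A")])
clarify(p1, "Household income below threshold A", date(1995, 1, 1))
redefine(p1, p2, "Income below threshold B", date(2000, 1, 1))

print(definition_on(p1, date(1993, 6, 1)))  # Income below threshold A
print(registry[p1].deprecated, registry[p1].replaced_by)
```

Whether a particular rewording counts as a clarification or a 
redefinition remains a human decision; the sketch only records the 
outcome.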

So, in sum, a concept-oriented approach will go a long way to improve 
information retrieval and processing by people and machines.


At 10/8/2004 11:14 AM, you wrote:
>Hi Dagobert,
>Would you be able to comment on this, for the public-esw-thes@w3.org
>list? ...
>-----Original Message-----
>From: public-esw-thes-request@w3.org
>[mailto:public-esw-thes-request@w3.org]On Behalf Of Miles, AJ (Alistair)
>Sent: 08 October 2004 12:46
>To: 'public-esw-thes@w3.org'
>Subject: vision for controlled vocabulary use and management
>Hi all,
>I thought I'd try to put down in words where I have assumed controlled
>vocabulary development and use is (or ought to be :) going.  Because SKOS
>Core is forward looking, and has been designed with a lot of
>in mind, I wanted to check that some simple elements of this vision make
>sense to everyone else.
>(1) Concept-Oriented Design and Construction
>The design and construction of controlled vocabularies, thesauri etc. will
>become *concept-oriented*.  This means that concepts are identified
>explicitly, and the meaning of a concept is understood to be taken from the
>combination of its labels, notes etc.
>(2) Concept-Oriented Indexing
>The use of controlled vocabularies, thesauri etc. for (subject-based)
>indexing will become concept-oriented.  This means that the index values in
>record metadata will be *concept-identifiers* and not terms.  In turn,
>indexing applications will hide these identifiers from the indexer ... so the
>indexer will interact with a set of concepts via their labels and notes, as
>part of selecting the appropriate concept, and the insertion of
>concept-identifiers into metadata is performed by the application.
>(3) Concept-Oriented Maintenance & Management
>Once a concept identifier has been published, it is in the interest of the
>publishing authority to avoid altering the meaning of the concept
>with that identifier.  If the meaning is significantly altered over time,
>the identifier will be applied inconsistently in indexing metadata, and its
>utility will be reduced.  This means that, if an authority wishes to
>significantly refactor/reorganise/redefine some of its concepts, this is
>best done by defining and publishing some new concepts and new concept
>identifiers.  Replacement relationships may then be defined between the old
>concepts and the new, which would support perfect interoperability between
>systems employing old and new concept sets, and would also support
>updating of indexing metadata.
>These are the kinds of assumption I've been working on ... so if these look
>wrong to you, please tell me.  Also, Stella's last email [1] highlighted the
>difference between this vision and the current paradigm and practice within
>the thesaurus user community, wrt change management.  Ideally, I would like
>SKOS Core to be where the thesaurus user community will arrive in perhaps a
>couple of years, so that it fits the requirements.  However, this may be at
>least in part unrealistic.  Bridging the space between the current users of
>controlled vocabularies and the framework of the semantic web is what I see
>as the central goal of the SKOS work ... which may require a little more
>meeting in the middle ... ?
>Anyway, food for thought.
>[1] http://lists.w3.org/Archives/Public/public-esw-thes/2004Oct/0048.html
>Alistair Miles
>Research Associate
>CCLRC - Rutherford Appleton Laboratory
>Building R1 Room 1.60
>Fermi Avenue
>Oxfordshire OX11 0QX
>United Kingdom
>Email:        a.j.miles@rl.ac.uk
>Tel: +44 (0)1235 445440

Dagobert Soergel
College of Information Studies
University of Maryland
4105 Hornbake Library
College Park, MD 20742-4345
Office: 301-405-2037     Home:  703-823-2840        Mobile: 703-585-2840
OFax:   301-314-9145        HFax: 703-823-6427
dsoergel@umd.edu     www.dsoergel.com 
Received on Monday, 1 November 2004 15:06:00 UTC
