
RE: vision for controlled vocabulary use and management

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Fri, 5 Nov 2004 16:08:21 -0000
Message-ID: <350DC7048372D31197F200902773DF4C05E50CF7@exchange11.rl.ac.uk>
To: 'Bernard Vatant' <bernard.vatant@mondeca.com>, Ron Davies <ron@rondavies.be>, public-esw-thes@w3.org

Hi guys,

Bernard's approach mirrors something we wrote up in the SWAD-E project on
dealing with multilingual thesauri - see:

http://www.w3.org/2001/sw/Europe/reports/thes/8.3/

... for the general idea of the 'multilingual labelling' vs. 'inter-lingual
mapping' approaches, although don't look too closely at the code examples, as
they use deprecated SKOS constructs and far too many blank nodes (they were
written a while ago).
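
To make the distinction concrete, here is a rough sketch of the two approaches
in plain Python rather than SKOS. All identifiers, labels and the "exactMatch"
relation name are invented for illustration and are not taken from the report:

```python
# Approach 1: multilingual labelling.
# One concept, one identifier, one preferred label per language.
multilingual = {
    "c1": {"en": "labour", "fr": "travail"},
}

# Approach 2: inter-lingual mapping.
# Each language has its own thesaurus with its own concept identifiers;
# a separate table of mappings links concepts across the two.
english = {"en:12": "labour"}
french = {"fr:7": "travail"}
mapping = [("en:12", "exactMatch", "fr:7")]  # hypothetical relation name


def label_via_labelling(concept_id, lang):
    """Look up the label of a single shared concept in the given language."""
    return multilingual[concept_id][lang]


def translate_via_mapping(concept_id, target_thesaurus):
    """Follow mappings from a source concept into the target thesaurus."""
    return [
        target_thesaurus[target]
        for source, _relation, target in mapping
        if source == concept_id and target in target_thesaurus
    ]
```

With labelling, the conceptual structure is shared and only the labels vary;
with mapping, each language keeps its own structure and the mappings can be
partial or inexact.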

Al.

---
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email:        a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440



> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org]On Behalf Of Bernard Vatant
> Sent: 05 November 2004 14:18
> To: Ron Davies; public-esw-thes@w3.org
> Subject: RE: vision for controlled vocabulary use and management
> 
> 
> 
> 
> Ron
> 
> Addressing more or less what you write below:
> 
> "If you have a look at any multilingual thesaurus, you soon run into
> situations where the underlying conceptual structures appear to be
> different in two different languages because the words that they use are
> not coherent. Is one conceptual structure right and the other wrong? How
> do we tell? Most often, you have to adopt one or other of the conceptual
> structures, and then try desperately to make the terms from the other
> language fit and hope that your poor users are not utterly confused. The
> thesaurus standards are full of examples of rather unsatisfactory ways
> you can try to get around this problem."
> 
> I made a suggestion a week ago on the SWBPD Vocabulary Management Task
> Force list:
> http://lists.w3.org/Archives/Public/public-swbp-wg/2004Oct/0185.html
> 
> In that post I try to figure out the various ways to tackle such concept
> identification issues in multilingual environments, a question sadly
> overlooked in most Semantic Web venues ... as a matter of fact, I have had
> no answer so far over there :((
> 
> Bernard
> 
> **************************************************************
> ********************
> 
> Bernard Vatant
> Senior Consultant
> Knowledge Engineering
> bernard.vatant@mondeca.com
> 
> "Making Sense of Content" :  http://www.mondeca.com
> "Everything is a Subject" :  http://universimmedia.blogspot.com
> 
> **************************************************************
> ********************
> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org] On Behalf Of Ron Davies
> Sent: Monday, 1 November 2004 20:26
> To: public-esw-thes@w3.org
> Subject: Re: vision for controlled vocabulary use and management
> 
> 
> Alistair,
> 
> I had started to prepare a response to your post a few weeks ago, but you
> had indirectly raised so many issues that I got discouraged with trying to
> deal with them all. Dagobert's response has given me new courage to try to
> address some of these issues.
> 
> 
> (1) Concept-Oriented Design and Construction
> 
> I'm not really sure what you mean by concept-oriented design and
> construction, except that you want applications to use concept identifiers
> rather than terms from natural languages to identify a concept.
> 
> If you mean something more -- perhaps some notion of Platonic concepts
> that exist independently of language? -- then there are certainly some
> thorny philosophical issues here. I am much less sanguine than Dagobert is
> about the ease with which we can separate concepts from language, at least
> in the "soft" social sciences. If you have a look at any multilingual
> thesaurus, you soon run into situations where the underlying conceptual
> structures appear to be different in two different languages because the
> words that they use are not coherent. Is one conceptual structure right
> and the other wrong? How do we tell? Most often, you have to adopt one or
> other of the conceptual structures, and then try desperately to make the
> terms from the other language fit and hope that your poor users are not
> utterly confused. The thesaurus standards are full of examples of rather
> unsatisfactory ways you can try to get around this problem.
> 
> If you don't mean this, but you simply want applications to 
> use concept identifiers, I'm
> not sure this is a major issue. Whether in a particular 
> system a concept identifier has
> been entered into an indexing record or an alphabetic string 
> representing a preferred term
> label really doesn't matter very much. It isn't more 
> "concept-oriented" to rely on an
> identifier code--  a code is simply an identifier in another 
> (indexing) language.
> Identifier codes have certainly been used in indexing 
> applications, particularly in
> environments where synonym rings are required. (I can't 
> remember the name of the system,
> but at an architectural information centre in Washington 
> twelve or fifteen years ago I saw
> such a system, developed I think with the participation of 
> the Getty Art History
> Information Program. All authority control was handled 
> through relational links).
> 
> Whether to use identifier codes in an indexing/retrieval 
> application or not depends on
> very practical considerations in the design of the 
> indexing/retrieval system (e.g. what
> kind of data you want to expose to others, and where further 
> processing of the data is
> done). For example, you can expose data with a code (an 
> artificial language), and expect
> the client to do a lookup to substitute a preferred term 
> label in some natural language,
> or you can provide the preferred term label itself and expect 
> the client to do the
> translation into other natural languages, or you can expose 
> data with all of the preferred
> terms in the various natural languages. It's true that using 
> a code makes a few internal
> changes easier to implement (spelling changes, or swaps 
> between preferred and
> non-preferred terms) but these are only a small percentage of 
> the changes that take place.
> And those changes are easy to implement even in a system that 
> uses natural language terms
> to identify the concept as long as the system supports a 
> global change operation.
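
A minimal sketch of the trade-off described above, in Python. The identifier
"C042", the labels and the record layout are all invented for illustration and
do not describe any real system:

```python
# A toy thesaurus keyed by an artificial concept code (hypothetical).
thesaurus = {"C042": {"en": "labour", "fr": "travail"}}

# The same indexing record stored two ways: by code and by preferred label.
records_by_code = [{"title": "Report A", "subject": "C042"}]
records_by_label = [{"title": "Report A", "subject": "labour"}]


def display_subject(record, lang):
    """Client-side lookup: substitute a preferred label for the code."""
    return thesaurus[record["subject"]][lang]


def global_change(records, old, new):
    """Rewrite every record that uses the old label; return how many changed."""
    changed = 0
    for record in records:
        if record["subject"] == old:
            record["subject"] = new
            changed += 1
    return changed


# A spelling change: with codes, only the thesaurus entry is edited ...
thesaurus["C042"]["en"] = "labor"
# ... with labels, the indexing records need a global change as well.
changed = global_change(records_by_label, "labour", "labor")
```

Either route ends in the same place for a simple relabelling, which is the
point made above: the code saves one pass over the records, not more.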
> 
> 
> (3) Concept-Oriented Maintenance & Management
> <snip>
> 
> > This means that, if an authority wishes to significantly
> > refactor/reorganise/redefine some of its concepts, this is best done by
> > defining and publishing some new concepts and new concept identifiers.
> 
> Again, as Dagobert points out, this is far from simple. The 
> devil here is in the details.
> What is a significant change? How do we wish to update the 
> indexing data?
> 
> For example, we might have
> - swapping a non-preferred term ("data sticks") with a preferred term
> ("USB keys"). The concept is the same; it is simply that the label has
> altered. In other words, the meaning of the concept hasn't changed.
> - swapping a spelling or dialectal variant ("labor") with a preferred term
> ("labour"). Again the meaning hasn't changed.
> - swapping an abbreviation ("AIDS") with a full form ("Acquired immune
> deficiency syndrome"). Ditto.
> 
> The above all seem to be semantically neutral, i.e. the 
> concept hasn't changed. However
> consider:
> 
> - adding a scope note ("Use this only for X. For other cases, 
> use the new term Y".) The
> meaning has changed, but the expression of the concept, i.e. 
> the preferred term, hasn't.
> - adding a history note ("Used up until 2004. After 2004, use 
> W or Z").
> - two concepts are merged into a single concept
> - a concept is split into two concepts
> - a non-preferred term is removed
> - a non-preferred term is removed because it's become its own 
> term, i.e. there really is a
> new _concept_. "Tropical products" loses the UF "Bananas", 
> because "Bananas" is now a new
> concept.
> - adding a new BT or a new NT or a new RT or deleting one or 
> more of these.
> 
> Which of these represents a new concept? Or do they all? How are we to
> update the indexing data, e.g. in the case of a split?
> 
> 
> > Replacement relationships may be then defined between the old concepts
> > and the new, which would support perfect interoperability between
> > systems employing old and new concept sets, and would also support
> > automated updating of indexing metadata.
> 
> Traditionally thesauri have usually used versioning to 
> control differences in conceptual
> structure. In other words, a thesaurus is published in one 
> version, changes are made
> "offline" and then a new version is produced and published. 
> This has the advantage of
> organizing work processes, allowing for users to get familiar 
> with a particular conceptual
> structure, and permitting developers to check each version 
> for conceptual coherence. Links
> between concepts in one version and another version (the 
> electronic version of the
> traditional Additions and Changes Lists) could be indicated 
> by mapping from one to another
> just as we map from one thesaurus to another. The mapping 
> could then be applied to update
> the indexing applications, where the update can be done 
> automatically, i.e. where it is
> simple. (It can't in all cases, as anyone who has done this 
> kind of work can confirm.)
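
The version-to-version mapping idea can be sketched roughly as follows, in
Python. The table layout, the record fields and the rule "one target means
automatic, several targets means manual review" are assumptions made for the
example, not an established procedure; the terms are borrowed from the
examples earlier in this message:

```python
# Mapping from terms in the old version to terms in the new version
# (an electronic "Additions and Changes List"; the layout is hypothetical).
version_map = {
    "data sticks": ["USB keys"],                            # simple relabelling
    "Tropical products": ["Tropical products", "Bananas"],  # concept was split
}


def update_records(records, version_map):
    """Apply one-to-one changes in place; collect the rest for manual review."""
    needs_review = []
    for record in records:
        targets = version_map.get(record["subject"])
        if targets is None:
            continue  # term is unchanged between versions
        if len(targets) == 1:
            record["subject"] = targets[0]  # the simple, automatic case
        else:
            # A split: an indexer has to decide which new concept applies.
            needs_review.append((record["id"], targets))
    return needs_review


records = [
    {"id": 1, "subject": "data sticks"},
    {"id": 2, "subject": "Tropical products"},
    {"id": 3, "subject": "labour"},
]
review = update_records(records, version_map)
```

Record 1 is updated automatically, record 3 is left alone, and record 2 ends
up on the review list because the mapping alone cannot say which side of the
split it belongs to.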
> 
> One of the reasons I mention this is that trying to carry in 
> a concept record _all_ the
> past history via replacement relationships seems to me to 
> complicate enormously the
> structure as well as leading to semantic difficulties. And a 
> complicated structure, which
> is difficult for people to understand (even if they aren't 
> often asked to do so) will turn
> people off using SKOS (which is meant to be Simple). Whereas 
> publishing this information
> as a mapping (for which there is already a structure defined) 
> is much simpler for mere
> humans to understand.
> 
> Anyway, I hope these few thoughts help. These are difficult 
> issues to try to tackle in an
> online discussion.
> 
> Ron
> 
> -----------------------------------------------
> Ron Davies
> Information and documentation systems consultant
> Av. Baden-Powell 1  Bte 2, 1200 Brussels, Belgium       
> Email:  ron@rondavies.be
> Tel:    +32 (0)2 770 33 51
> GSM:    +32 (0)484 502 393
> 
> 
> 
Received on Friday, 5 November 2004 16:09:04 GMT
