- From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
- Date: Fri, 5 Nov 2004 16:08:21 -0000
- To: 'Bernard Vatant' <bernard.vatant@mondeca.com>, Ron Davies <ron@rondavies.be>, public-esw-thes@w3.org
Hi guys,

Bernard's approach mirrors something we wrote up in the SWAD-E project on dealing with multilingual thesauri; see:

http://www.w3.org/2001/sw/Europe/reports/thes/8.3/

... for the general idea of 'multilingual labelling' vs. 'inter-lingual mapping' approaches. Don't look too closely at the code examples, though: they use deprecated SKOS constructs and far too many blank nodes (they were written a while ago).

Al.

---
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email: a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440

> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org]On Behalf Of Bernard Vatant
> Sent: 05 November 2004 14:18
> To: Ron Davies; public-esw-thes@w3.org
> Subject: RE: vision for controlled vocabulary use and management
>
> Ron
>
> Addressing more or less what you write below:
>
> "If you have a look at any multilingual thesaurus, you soon run into situations where the underlying conceptual structures appear to be different in two different languages because the words that they use are not coherent. Is one conceptual structure right and the other wrong? How do we tell? Most often, you have to adopt one or other of the conceptual structures, and then try desperately to make the terms from the other language fit and hope that your poor users are not utterly confused. The thesaurus standards are full of examples of rather unsatisfactory ways you can try to get around this problem."
>
> I made a suggestion a week ago on the SWBPD Vocabulary Management Task Force list:
> http://lists.w3.org/Archives/Public/public-swbp-wg/2004Oct/0185.html
>
> In this post I try to figure out the various ways to tackle such concept identification issues in multilingual environments, a question sadly overlooked in most Semantic Web venues...
> As a matter of fact, I have had no answer over there so far :((
>
> Bernard
>
> **********************************************************************************
>
> Bernard Vatant
> Senior Consultant
> Knowledge Engineering
> bernard.vatant@mondeca.com
>
> "Making Sense of Content" : http://www.mondeca.com
> "Everything is a Subject" : http://universimmedia.blogspot.com
>
> **********************************************************************************
> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org] On Behalf Of Ron Davies
> Sent: Monday, 1 November 2004 20:26
> To: public-esw-thes@w3.org
> Subject: Re: vision for controlled vocabulary use and management
>
> Alistair,
>
> I had started to prepare a response to your post a few weeks ago, but you had indirectly raised so many issues that I got discouraged trying to deal with them all. Dagobert's response has given me new courage to try to address some of these issues.
>
> (1) Concept-Oriented Design and Construction
>
> I'm not really sure what you mean by concept-oriented design and construction, except that you want applications to use concept identifiers rather than terms from natural languages to identify a concept.
>
> If you mean something more (perhaps some notion of Platonic concepts that exist independently of language?), then there are certainly some thorny philosophical issues here. I am much less sanguine than Dagobert is about the ease with which we can separate concepts from language, at least in the "soft" social sciences. If you have a look at any multilingual thesaurus, you soon run into situations where the underlying conceptual structures appear to be different in two different languages because the words that they use are not coherent. Is one conceptual structure right and the other wrong? How do we tell?
> Most often, you have to adopt one or other of the conceptual structures, and then try desperately to make the terms from the other language fit and hope that your poor users are not utterly confused. The thesaurus standards are full of examples of rather unsatisfactory ways you can try to get around this problem.
>
> If you don't mean this, but simply want applications to use concept identifiers, I'm not sure this is a major issue. Whether, in a particular system, a concept identifier or an alphabetic string representing a preferred term label has been entered into an indexing record really doesn't matter very much. It isn't more "concept-oriented" to rely on an identifier code: a code is simply an identifier in another (indexing) language. Identifier codes have certainly been used in indexing applications, particularly in environments where synonym rings are required. (I can't remember the name of the system, but at an architectural information centre in Washington twelve or fifteen years ago I saw such a system, developed I think with the participation of the Getty Art History Information Program. All authority control was handled through relational links.)
>
> Whether to use identifier codes in an indexing/retrieval application or not depends on very practical considerations in the design of the indexing/retrieval system (e.g. what kind of data you want to expose to others, and where further processing of the data is done). For example, you can expose data with a code (an artificial language) and expect the client to do a lookup to substitute a preferred term label in some natural language, or you can provide the preferred term label itself and expect the client to do the translation into other natural languages, or you can expose data with all of the preferred terms in the various natural languages.
> It's true that using a code makes a few internal changes easier to implement (spelling changes, or swaps between preferred and non-preferred terms), but these are only a small percentage of the changes that take place. And those changes are easy to implement even in a system that uses natural language terms to identify the concept, as long as the system supports a global change operation.
>
> (3) Concept-Oriented Maintenance & Management
> <snip>
> > This means that, if an authority wishes to significantly refactor/reorganise/redefine some of its concepts, this is best done by defining and publishing some new concepts and new concept identifiers.
>
> Again, as Dagobert points out, this is far from simple. The devil here is in the details. What is a significant change? How do we wish to update the indexing data?
>
> For example, we might have:
> - swapping a non-preferred term ("data sticks") with a preferred term ("USB keys"). The concept is the same; only the label has changed. In other words, the meaning of the concept hasn't changed.
> - swapping a spelling or dialectal variant ("labor") with a preferred term ("labour"). Again, the meaning hasn't changed.
> - swapping an abbreviation ("AIDS") with a full form ("Acquired immune deficiency syndrome"). Ditto.
>
> The above all seem to be semantically neutral, i.e. the concept hasn't changed. However, consider:
>
> - adding a scope note ("Use this only for X. For other cases, use the new term Y."). The meaning has changed, but the expression of the concept, i.e. the preferred term, hasn't.
> - adding a history note ("Used up until 2004. After 2004, use W or Z").
> - two concepts are merged into a single concept
> - a concept is split into two concepts
> - a non-preferred term is removed
> - a non-preferred term is removed because it has become its own term, i.e. there really is a new _concept_. "Tropical products" loses the UF "Bananas", because "Bananas" is now a new concept.
> - adding a new BT or a new NT or a new RT, or deleting one or more of these.
>
> Which of these represents a new concept? Or do they all? How are we to update the indexing data, e.g. in the case of a split?
>
> > Replacement relationships may then be defined between the old concepts and the new, which would support perfect interoperability between systems employing old and new concept sets, and would also support automated updating of indexing metadata.
>
> Traditionally, thesauri have usually used versioning to control differences in conceptual structure. In other words, a thesaurus is published in one version, changes are made "offline", and then a new version is produced and published. This has the advantage of organizing work processes, allowing users to get familiar with a particular conceptual structure, and permitting developers to check each version for conceptual coherence. Links between concepts in one version and another (the electronic version of the traditional Additions and Changes Lists) could be indicated by mapping from one version to the other, just as we map from one thesaurus to another. The mapping could then be applied to update the indexing applications wherever the update can be done automatically, i.e. where it is simple. (It can't be done in all cases, as anyone who has done this kind of work can confirm.)
>
> One of the reasons I mention this is that trying to carry _all_ the past history in a concept record via replacement relationships seems to me to complicate the structure enormously, as well as leading to semantic difficulties. And a complicated structure, which is difficult for people to understand (even if they aren't often asked to do so), will turn people off using SKOS (which is meant to be Simple).
> By contrast, publishing this information as a mapping (for which there is already a structure defined) is much simpler for mere humans to understand.
>
> Anyway, I hope these few thoughts help. These are difficult issues to try to tackle in an online discussion.
>
> Ron
>
> -----------------------------------------------
> Ron Davies
> Information and documentation systems consultant
> Av. Baden-Powell 1 Bte 2, 1200 Brussels, Belgium
> Email: ron@rondavies.be
> Tel: +32 (0)2 770 33 51
> GSM: +32 (0)484 502 393
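Ron's closing suggestion (publish the differences between thesaurus versions as a mapping, and apply it to the indexing data automatically only where that is simple) can be illustrated with a small sketch. It is plain Python, not SKOS or RDF, and everything in it is hypothetical: the `Concept` record, the `label_for` and `apply_version_mapping` functions, and the identifiers `c1`, `c10` and so on. It assumes index records store concept identifiers rather than label strings, so a preferred-term swap such as "data sticks" to "USB keys" only touches the label table; merges reduce to one-to-one replacements, while splits and withdrawals are flagged for an indexer rather than guessed.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical concept record: an opaque identifier plus one preferred
# label per language (the "multilingual labelling" style of record).
@dataclass
class Concept:
    concept_id: str
    pref_labels: dict[str, str]


def label_for(concepts: dict[str, Concept], concept_id: str, lang: str,
              fallback: str = "en") -> str:
    """Client-side lookup: turn an exposed code into a label in one language."""
    labels = concepts[concept_id].pref_labels
    if lang in labels:
        return labels[lang]
    return labels[fallback]


def apply_version_mapping(
    index_records: dict[str, list[str]],
    mapping: dict[str, list[str]],
) -> tuple[dict[str, list[str]], list[tuple[str, str, list[str]]]]:
    """Update index records from old-version to new-version concept IDs.

    mapping[old_id] lists the new-version IDs that replace old_id:
      one ID      -> simple replacement (several old IDs may share it: a merge)
      several IDs -> split; an indexer must choose
      empty list  -> concept withdrawn; an indexer must choose
    IDs absent from the mapping are assumed unchanged.
    Returns the updated records plus (record, old_id, candidates) items that
    could not be resolved automatically; those old IDs are left in place.
    """
    updated: dict[str, list[str]] = {}
    needs_review: list[tuple[str, str, list[str]]] = []
    for record_id, concept_ids in index_records.items():
        new_ids: list[str] = []
        for old_id in concept_ids:
            targets = mapping.get(old_id, [old_id])
            if len(targets) == 1:
                chosen = targets[0]
            else:
                # Split or withdrawal: not "simple", so flag it and keep the old ID.
                needs_review.append((record_id, old_id, list(targets)))
                chosen = old_id
            if chosen not in new_ids:  # a merge can produce duplicates
                new_ids.append(chosen)
        updated[record_id] = new_ids
    return updated, needs_review


if __name__ == "__main__":
    concepts = {"c1": Concept("c1", {"en": "USB keys", "fr": "clés USB"})}
    print(label_for(concepts, "c1", "fr"))  # clés USB
    print(label_for(concepts, "c1", "de"))  # no German label, falls back: USB keys

    records = {"r1": ["c10"], "r2": ["c20", "c21"], "r3": ["c1"]}
    mapping = {
        "c10": ["c10", "c11"],  # split, e.g. "Bananas" carved out of "Tropical products"
        "c20": ["c22"],         # c20 and c21 merged into c22
        "c21": ["c22"],
    }
    updated, review = apply_version_mapping(records, mapping)
    print(updated)  # {'r1': ['c10'], 'r2': ['c22'], 'r3': ['c1']}
    print(review)   # [('r1', 'c10', ['c10', 'c11'])]
```

Leaving unresolved old identifiers in place, rather than dropping them, means a record is never silently emptied; the review list is where the cases that cannot be handled automatically end up.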
Received on Friday, 5 November 2004 16:09:04 UTC