- From: Ron Davies <ron@rondavies.be>
- Date: Mon, 01 Nov 2004 20:25:47 +0100
- To: public-esw-thes@w3.org
- Message-Id: <6.0.0.22.2.20041008135839.01ca61e0@pop.skynet.be>
Alistair,

I had started to prepare a response to your post a few weeks ago, but you had raised indirectly so many issues that I got discouraged with trying to deal with them all. Dagobert's response has given me new courage to try to address some of these issues.

>(1) Concept-Oriented Design and Construction

I'm not really sure what you mean by concept-oriented design and construction, except that you want applications to use concept identifiers rather than terms from natural languages to identify a concept.

If you mean something more -- perhaps some notion of Platonic concepts that exist independently of language? -- then there are certainly some thorny philosophical issues here. I am much less sanguine than Dagobert is about the ease with which we can separate concepts from language, at least in the "soft" social sciences. If you look at any multilingual thesaurus, you soon run into situations where the underlying conceptual structures appear to be different in two different languages because the words that they use are not coherent. Is one conceptual structure right and the other wrong? How do we tell? Most often, you have to adopt one or the other of the conceptual structures, and then try desperately to make the terms from the other language fit and hope that your poor users are not utterly confused. The thesaurus standards are full of examples of rather unsatisfactory ways you can try to get around this problem.

If you don't mean this, but you simply want applications to use concept identifiers, I'm not sure this is a major issue. Whether, in a particular system, an indexing record holds a concept identifier or an alphabetic string representing a preferred term label really doesn't matter very much. It isn't more "concept-oriented" to rely on an identifier code -- a code is simply an identifier in another (indexing) language.

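
The point that it matters little whether an indexing record stores a code or a preferred term label can be illustrated with a minimal sketch. Everything here is hypothetical: the codes (`C001`, `C002`), the French labels, and the function names are invented for illustration, and the sketch assumes that a preferred label is unique within a language.

```python
# Hypothetical two-language thesaurus fragment; codes and labels are invented.
PREFERRED_LABELS = {
    "C001": {"en": "labour", "fr": "travail"},
    "C002": {"en": "tropical products", "fr": "produits tropicaux"},
}


def build_label_index(labels, lang):
    """Map each preferred label in one language back to its concept code."""
    return {by_lang[lang]: code for code, by_lang in labels.items()}


def resolve(entry, labels, lang="en"):
    """Return the concept code for an indexing entry that holds a code or a label."""
    if entry in labels:
        # The entry is already a code (an identifier in the indexing language).
        return entry
    # Otherwise treat the entry as a preferred label in the given natural language.
    return build_label_index(labels, lang)[entry]


# The same document indexed two ways: once by code, once by preferred label.
record_with_code = {"doc": "D1", "subject": "C001"}
record_with_label = {"doc": "D1", "subject": "labour"}

same_concept = (
    resolve(record_with_code["subject"], PREFERRED_LABELS)
    == resolve(record_with_label["subject"], PREFERRED_LABELS)
)
print(same_concept)
```

Both records resolve to the same concept; the difference is only in which party (server or client) performs the lookup.
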
Identifier codes have certainly been used in indexing applications, particularly in environments where synonym rings are required. (I can't remember the name of the system, but at an architectural information centre in Washington twelve or fifteen years ago I saw such a system, developed I think with the participation of the Getty Art History Information Program. All authority control was handled through relational links.)

Whether or not to use identifier codes in an indexing/retrieval application depends on very practical considerations in the design of the indexing/retrieval system (e.g. what kind of data you want to expose to others, and where further processing of the data is done). For example, you can expose data with a code (an artificial language) and expect the client to do a lookup to substitute a preferred term label in some natural language; or you can provide the preferred term label itself and expect the client to do the translation into other natural languages; or you can expose data with all of the preferred terms in the various natural languages.

It's true that using a code makes a few internal changes easier to implement (spelling changes, or swaps between preferred and non-preferred terms), but these are only a small percentage of the changes that take place. And those changes are easy to implement even in a system that uses natural language terms to identify the concept, as long as the system supports a global change operation.

>(3) Concept-Oriented Maintenance & Management
<snip>
>This means that, if an authority wishes to
>significantly refactor/reorganise/redefine some of its concepts, this is
>best done by defining and publishing some new concepts and new concept
>identifiers.

Again, as Dagobert points out, this is far from simple. The devil here is in the details. What is a significant change? How do we wish to update the indexing data?

For example, we might have:

- swapping a non-preferred term ("data sticks") with a preferred term ("USB keys"). The concept is the same; it is simply that the label has altered. In other words, the meaning of the concept hasn't changed.
- swapping a spelling or dialectal variant ("labor") with a preferred term ("labour"). Again the meaning hasn't changed.
- swapping an abbreviation ("AIDS") with a full form ("Acquired immune deficiency syndrome"). Ditto.

The above all seem to be semantically neutral, i.e. the concept hasn't changed. However, consider:

- adding a scope note ("Use this only for X. For other cases, use the new term Y."). The meaning has changed, but the expression of the concept, i.e. the preferred term, hasn't.
- adding a history note ("Used up until 2004. After 2004, use W or Z.").
- merging two concepts into a single concept.
- splitting a concept into two concepts.
- removing a non-preferred term.
- removing a non-preferred term because it has become its own term, i.e. there really is a new _concept_. "Tropical products" loses the UF "Bananas", because "Bananas" is now a new concept.
- adding a new BT, NT or RT, or deleting one or more of these.

Which of these represents a new concept? Or do they all? How are we to update the indexing data, e.g. in the case of a split?

>Replacement relationships may be then defined between the old
>concepts and the new, which would support perfect interoperability between
>systems employing old and new concept sets, and would also support automated
>updating of indexing metadata.

Traditionally, thesauri have used versioning to control differences in conceptual structure. In other words, a thesaurus is published in one version, changes are made "offline", and then a new version is produced and published. This has the advantage of organizing work processes, allowing users to get familiar with a particular conceptual structure, and permitting developers to check each version for conceptual coherence.

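
The question of how to update indexing data after a split can be made concrete with a minimal sketch. It assumes, purely for illustration, that changes between two versions are recorded as a table from each old concept identifier to one or more new ones; the identifiers (`c10`, `c20`, ...), the table, and the function name are all hypothetical.

```python
# Hypothetical replacement table between two versions of a thesaurus.
# One target: the update is mechanical. Several targets: the concept was split.
REPLACEMENTS = {
    "c10": ["c12"],         # merge: c10 and c11 become the single concept c12
    "c11": ["c12"],
    "c20": ["c20", "c21"],  # split: e.g. "Tropical products" loses "Bananas" to a new concept
}


def update_records(records, replacements):
    """Apply one-to-one replacements; set aside records that touch a split."""
    updated, needs_review = [], []
    for doc_id, concept in records:
        # Concepts absent from the table are unchanged and map to themselves.
        targets = replacements.get(concept, [concept])
        if len(targets) == 1:
            updated.append((doc_id, targets[0]))
        else:
            # A machine cannot tell which half of a split a document is about.
            needs_review.append((doc_id, concept, targets))
    return updated, needs_review


records = [("D1", "c10"), ("D2", "c11"), ("D3", "c20"), ("D4", "c30")]
updated, needs_review = update_records(records, REPLACEMENTS)
print(updated)
print(needs_review)
```

In this sketch the merge and the untouched concept are handled automatically, while the document indexed under the split concept is queued for a human decision, which is the case where "automated updating" stops being automatic.
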
Links between concepts in one version and another version (the electronic version of the traditional Additions and Changes Lists) could be indicated by mapping from one to another, just as we map from one thesaurus to another. The mapping could then be applied to update the indexing applications, where the update can be done automatically, i.e. where it is simple. (It can't in all cases, as anyone who has done this kind of work can confirm.)

One of the reasons I mention this is that trying to carry in a concept record _all_ the past history via replacement relationships seems to me to complicate the structure enormously, as well as leading to semantic difficulties. And a complicated structure, which is difficult for people to understand (even if they aren't often asked to do so), will turn people off using SKOS (which is meant to be Simple). Publishing this information as a mapping (for which there is already a structure defined), by contrast, is much simpler for mere humans to understand.

Anyway, I hope these few thoughts help. These are difficult issues to try to tackle in an online discussion.

Ron

-----------------------------------------------
Ron Davies
Information and documentation systems consultant
Av. Baden-Powell 1 Bte 2, 1200 Brussels, Belgium
Email: ron@rondavies.be
Tel: +32 (0)2 770 33 51
GSM: +32 (0)484 502 393

Received on Monday, 1 November 2004 19:26:31 UTC