- From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
- Date: Mon, 1 Nov 2004 15:05:26 -0000
- To: public-esw-thes@w3.org
Fwded from Dagobert ...

-----Original Message-----
From: Dagobert Soergel [mailto:dsoergel@umd.edu]
Sent: 23 October 2004 04:01
To: Miles, AJ (Alistair)
Subject: Re: FW: vision for controlled vocabulary use and management

(1) I agree with you entirely. Information retrieval (or resource discovery) is to a large extent about enabling the user to improve his or her understanding of subjects, topics, issues, and contexts that are defined in terms of concepts; to a large extent it is about helping users create meaning. Consequently, thesauri and other knowledge organization systems (KOS) need to be based on concepts and furthermore present these concepts in a meaningful arrangement. The Semantic Web is by its very definition about concepts; it needs well-structured, well-defined concept systems called ontologies. Ideally, concept-based KOS will serve both human information processing and machine information processing. This thought is, of course, not new; what is significant is that it seems to be adopted and acted on more broadly.

On the other hand, human thought and language are very complex, and a purely concept-based approach may not capture all the nuances of language. Therefore, the model outlined in http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/ is primarily concept-based and links terms to concepts, but it also allows for treating terms as entities in their own right that can enter into relationships with other terms; these relationships complement the concept-concept and concept-term relationships. Finally, at the level of concept expression for human readers and for language-processing programs we have strings, a term being expressed by one or more strings. Since concepts and language are inextricably linked, we need a structure of sufficient complexity.
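As a rough illustration of the three layers described in (1), namely concepts, terms linked to concepts and to each other, and strings expressing terms, here is a minimal Python sketch. It is not taken from the cited article or from SKOS Core: the class names, field names, the `abbreviation_of` relation, the term identifiers, and the `example.org` URI are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """A term is an entity in its own right, expressed by one or more strings."""
    term_id: str                      # hypothetical local identifier
    language: str
    strings: List[str]                # surface forms; the first is used for display
    # term-term relationships: relation name -> ids of related terms
    related_terms: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Concept:
    """A concept is identified by a URI and linked to terms per language."""
    uri: str
    preferred_term: Dict[str, str] = field(default_factory=dict)  # language -> term id
    broader: List[str] = field(default_factory=list)              # concept-concept links (URIs)


def label_for(concept: Concept, terms: Dict[str, Term], language: str) -> str:
    """Return a display string for the concept in the given language.

    Falls back to the concept URI when no term exists for that language.
    """
    term_id = concept.preferred_term.get(language)
    if term_id is None:
        return concept.uri
    return terms[term_id].strings[0]


# Hypothetical example data.
terms = {
    "t-en-1": Term("t-en-1", "en", ["Consumer Price Index", "consumer price index"]),
    "t-en-2": Term("t-en-2", "en", ["CPI"], {"abbreviation_of": ["t-en-1"]}),
    "t-de-1": Term("t-de-1", "de", ["Verbraucherpreisindex"]),
}
cpi = Concept(
    uri="http://example.org/concepts/cpi",
    preferred_term={"en": "t-en-1", "de": "t-de-1"},
)

print(label_for(cpi, terms, "en"))
print(label_for(cpi, terms, "de"))
```

The concept carries only an identifier and links. Everything language-specific sits on the terms, so a term-term relationship such as an abbreviation can be recorded without touching the concept layer.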
(2) Much search is based on free-text searching (or on automatically detectable image features in image retrieval); in that environment, limited concept-based search can be accomplished through query expansion supported by a KOS. In high-payoff situations, sophisticated indexing pays off. Such indexing, whether by people, computer programs (yes, computer programs can do concept-oriented indexing, particularly when supported by a rich KOS as a knowledge base), or human-computer collaboration, should always be concept-based. There are many advantages to recording concepts by concept identifiers, preferably URIs, in the indexing record: the terms displayed to the user can be adapted to the user's language and to current usage. When the user clicks on the surface term for a concept, the system can use the underlying URI to quickly link to the specific record in a KOS where more information on the concept can be found. But again, users also need to be supported in understanding natural language terms, particularly if they read in a foreign language, and a good system for the retrieval and reading of documents would have a function that accesses multiple KOS on the Web to find such explanations.

(3) is desirable in principle but a bit more problematic. In a system where concepts are defined axiomatically (as in a system that supports automatic reasoning), a change in definition certainly creates a new concept that should get a new URI. The new concept should be properly related to the old concept, and the old concept should be marked as no longer used in the system at hand (which would not stop some other system from using it). But consider the case of concepts defined for the purpose of gathering and reporting statistics, such as the definition of a branch of industry, or the Consumer Price Index, or what constitutes poverty. A minor change in wording may be introduced to clarify, not change, the meaning. Do we have a new concept?
Or the definition could be changed more or less drastically, definitely creating a new concept. The old concept would still be needed to understand historical statistics. This clearly requires fairly elaborate version control.

So, in sum, a concept-oriented approach will go a long way to improve information retrieval and processing by people and machines.

Dagobert

At 10/8/2004 11:14 AM, you wrote:
>Hi Dagobert,
>
>Would you be able to comment on this, for the public-esw-thes@w3.org mailing
>list? ...
>
>
>-----Original Message-----
>From: public-esw-thes-request@w3.org
>[mailto:public-esw-thes-request@w3.org]On Behalf Of Miles, AJ (Alistair)
>
>Sent: 08 October 2004 12:46
>To: 'public-esw-thes@w3.org'
>Subject: vision for controlled vocabulary use and management
>
>
>
>Hi all,
>
>I thought I'd try to put down in words where I have assumed controlled
>vocabulary development and use is (or ought to be :) going. Because SKOS
>Core is forward looking, and has been designed with a lot of future-proofing
>in mind, I wanted to check that some simple elements of this vision make
>sense to everyone else.
>
>(1) Concept-Oriented Design and Construction
>
>The design and construction of controlled vocabularies, thesauri etc. will
>become *concept-oriented*. This means that concepts are identified
>explicitly, and the meaning of a concept is understood to be taken from the
>combination of its labels, notes etc.
>
>(2) Concept-Oriented Indexing
>
>The use of controlled vocabularies, thesauri etc. for (subject-based)
>indexing will become concept oriented. This means that the index values in
>record metadata will be *concept-identifiers* and not terms. In turn,
>indexing applications will hide these identifiers from the indexer ... so an
>indexer will interact with a set of concepts via their labels and notes, as
>part of selecting the appropriate concept, and the insertion of
>concept-identifiers into metadata is performed by the application.
>
>(3) Concept-Oriented Maintenance & Management
>
>Once a concept identifier has been published, it is in the interest of the
>publishing authority to avoid altering the meaning of the concept associated
>with that identifier. If the meaning is significantly altered over time,
>the identifier will be applied inconsistently in indexing metadata, and its
>utility will be reduced. This means that, if an authority wishes to
>significantly refactor/reorganise/redefine some of its concepts, this is
>best done by defining and publishing some new concepts and new concept
>identifiers. Replacement relationships may be then defined between the old
>concepts and the new, which would support perfect interoperability between
>systems employing old and new concept sets, and would also support automated
>updating of indexing metadata.
>
>These are the kinds of assumption I've been working on ... so if these look
>wrong to you, please tell me. Also, Stella's last email [1] highlighted the
>difference between this vision and current paradigm and practise within the
>thesaurus user community, wrt change management. Ideally, I would like SKOS
>Core to be where the thesaurus user community will arrive in perhaps a
>couple of years, so that it fits the requirements. However, this may be at
>least in part unrealistic. Bridging the space between the current users of
>controlled vocabularies and the framework of the semantic web is what I see
>as the central goal of the SKOS work ... which may require a little more
>meeting in the middle ... ?
>
>Anyway, food for thought.
>
>Al.
>
>[1] http://lists.w3.org/Archives/Public/public-esw-thes/2004Oct/0048.html
>
>---
>Alistair Miles
>Research Associate
>CCLRC - Rutherford Appleton Laboratory
>Building R1 Room 1.60
>Fermi Avenue
>Chilton
>Didcot
>Oxfordshire OX11 0QX
>United Kingdom
>Email: a.j.miles@rl.ac.uk
>Tel: +44 (0)1235 445440

Dagobert Soergel
College of Information Studies
University of Maryland
4105 Hornbake Library
College Park, MD 20742-4345
Office: 301-405-2037
Home: 703-823-2840
Mobile: 703-585-2840
OFax: 301-314-9145
HFax: 703-823-6427
dsoergel@umd.edu
www.dsoergel.com
Received on Monday, 1 November 2004 15:06:00 UTC