SKOS use cases - why focus on the application?

Hi all,

In recent discussions of how to go about gathering requirements for 
SKOS, and how to structure SKOS use cases, I have placed a lot of 
emphasis on the *application* of controlled vocabularies. In other 
words, what are the vocabularies being used for? What is their primary 
function?

Use cases for SKOS will naturally be concerned with one or more 
thesauri/classification schemes/taxonomies/[other], and so the 
question has been raised, why be concerned about the application of 
the vocabularies? Why not just consider the vocabularies themselves?

As I see it, the central goal of the requirements gathering process is 
to define clear, unambiguous and testable *criteria* that establish 
the *sufficiency* of SKOS. I.e. we need to be able to know when SKOS 
is "good enough" - when it fulfils its purpose. In project management 
speak, these are usually called "quality criteria".

The question of the primary application or function of a vocabulary 
raises a series of further questions. Assuming a distributed software 
architecture, which software components are typically involved in 
implementing these applications? What data do these components require 
in order to fulfil their function (i.e. work properly)? How do these 
components interact? Which components act as the producers of data, 
which act as the consumers and which act potentially as both?

My suggestion is that, if we identify one or more *applications* which 
SKOS should enable, and we know (1) which generic software components 
are required to implement those applications and (2) what data the 
consuming components require to implement their functionality, we can 
arrive at some clear, unambiguous and testable quality criteria. I.e. 
when SKOS is capable of representing the data required, in a way that 
satisfies the computational demands of the application, it is good 
enough. Or, to look at it from the other direction, when it is 
*possible* and *practical* to implement the desired functionality 
using SKOS for the communication of data, then SKOS is good enough.

My concern is that, without an awareness of the applications SKOS is 
intended to enable, we will not be able to establish testable quality 
criteria. And without that, we have no way of objectively choosing 
between a number of design alternatives. Design then becomes very much 
a question of personal taste, in which case it can be very hard 
(sometimes impossible) to arrive at consensus. SKOS also then becomes 
vulnerable to "feature creep", where features are continuously added 
in order to represent all aspects of all vocabularies.

I am also very keen to ensure that SKOS lives up to its name, in 
particular the "Simple" part. My experience of talking to many people 
over the last couple of years has been that the relative simplicity 
and approachability of SKOS are perhaps its main selling point. I 
would like SKOS to continue to be as simple as possible, and I believe 
that the way to achieve this is to know which *applications* are 
considered most important by the community. SKOS may then be designed 
to support those applications, in the simplest possible way.

Another key issue is "interoperability" between different types of 
vocabulary, especially between thesauri, classification schemes, 
taxonomies and subject heading systems. By "interoperability" I mean 
the ability of software components to handle more than a single, 
highly specific, vocabulary type, which is where current trends are 
heading. Software components might want to handle more than one 
vocabulary type because, fundamentally, these vocabulary types are 
intended to serve the same basic purpose, which is generally something 
to do with retrieval and the organisation and management of 
information. By 
focusing on the application of the vocabulary, we may be able to 
establish what these different vocabulary types have in common, and 
how the essential features may be represented within the same framework.

A final reason for focusing on the application is money. The economics 
of developing and applying controlled vocabularies is, ultimately, 
what is driving current trends. We are at a point in time where 
organisations are beginning to invest seriously in "Semantic 
Technologies". As they do, many are discovering that significant 
initial and ongoing costs are involved. Also, the learning curve is 
steep. It is my impression that an awareness of the sizable cost and 
intellectual challenge is becoming much more widespread (although many 
in isolated communities have been well aware of it for a long time). 
Potential benefit is demonstrable, but costs must be minimised before 
solutions based on semantic technologies become genuinely viable.

It is this drive to enable functionality with demonstrable benefit, at 
the lowest possible cost, that must underpin the design of SKOS. By 
being aware of the application context, we may understand how this can 
be done, and where trends are leading. I anticipate that this will 
become particularly relevant when it comes to a discussion of issues 
such as "mappings" between vocabularies, and of managing change within 
vocabularies (and of managing dependencies between metadata and 
changing vocabularies). We must work towards a clear understanding of 
what is needed to ensure that solutions based on SKOS can be 
economically viable, and practically feasible, propositions - 
especially for public organisations and for SMEs where money is tight.

Cheers,

Alistair.

-- 
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Web: http://purl.org/net/aliman
Email: a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440

Received on Monday, 13 November 2006 13:37:39 UTC