SKOS use cases - why focus on the application?

Hi all,

In recent discussions of how to go about gathering requirements for SKOS, and how to structure SKOS use cases, I
have placed a lot of emphasis on the *application* of controlled vocabularies. In other words, what are the vocabularies
being used for? What is their primary function?

Use cases for SKOS will naturally be concerned with one or more thesauri/classification schemes/taxonomies/[other], and
so the question has been raised, why be concerned about the application of the vocabularies? Why not just consider the
vocabularies themselves?

As I see it, the central goal of the requirements gathering process is to define clear, unambiguous and testable
*criteria* that establish the *sufficiency* of SKOS. I.e. we need to be able to know when SKOS is "good enough" - when
it fulfils its purpose. In project management speak, these are usually called "quality criteria".

The question of the primary application or function of a vocabulary raises a series of further questions. Assuming a
distributed software architecture, which software components are typically involved in implementing these applications?
What data do these components require in order to fulfil their function (i.e. work properly)? How do these components
interact? Which components act as the producers of data, which act as the consumers and which act potentially as both?

My suggestion is that, if we identify one or more *applications* which SKOS should enable, and we know (1) which generic
software components are required to implement those applications and (2) what data the consuming components require to
implement their functionality, we can arrive at some clear, unambiguous and testable quality criteria. I.e. when SKOS is
capable of representing the data required, in a way that satisfies the computational demands of the application, it is
good enough. Or, to look at it from the other direction, when it is *possible* and *practical* to implement the desired
functionality using SKOS for the communication of data, then SKOS is good enough.
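
To make the idea of a testable criterion a little more concrete, here is a rough Python sketch. Everything in it is a
hypothetical illustration rather than a proposal: the two application names, the table of properties each consuming
component is assumed to need, and the sample concepts are all invented for the example. Only the property names
(skos:prefLabel, skos:altLabel, skos:broader) are taken from SKOS itself.

```python
# Hypothetical sketch: a quality criterion expressed as an executable check.
# The application profiles and the sample concepts are illustrative assumptions.

# Data that a (hypothetical) consuming component needs for every concept.
REQUIRED_BY_APPLICATION = {
    "search-term-expansion": {"skos:prefLabel", "skos:altLabel"},
    "hierarchical-browsing": {"skos:prefLabel", "skos:broader"},
}


def missing_properties(concept, application):
    """Return the required properties that a concept does not supply."""
    required = REQUIRED_BY_APPLICATION[application]
    # A property counts as supplied only if it has at least one value.
    supplied = {prop for prop, values in concept.items() if values}
    return required - supplied


def is_sufficient(concepts, application):
    """The criterion passes only if every concept supplies all required data."""
    return all(not missing_properties(c, application) for c in concepts)


# Sample data as a producing component might publish it (invented for this example).
concepts = [
    {"skos:prefLabel": ["Mammals"], "skos:altLabel": ["Mammalia"], "skos:broader": ["Animals"]},
    {"skos:prefLabel": ["Cats"], "skos:altLabel": [], "skos:broader": ["Mammals"]},
]

print(is_sufficient(concepts, "hierarchical-browsing"))  # True
print(is_sufficient(concepts, "search-term-expansion"))  # False: "Cats" has no altLabel
```

The point is only that, once the application and the data its consuming component needs are written down, "good
enough" becomes something that can be checked rather than argued about.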

My concern is that, without an awareness of the applications SKOS is intended to enable, we will not be able to
establish testable quality criteria. And without that, we have no way of objectively choosing between a number of design
alternatives. Design then becomes very much a question of personal taste, in which case it can be very hard (sometimes
impossible) to arrive at consensus. SKOS also then becomes vulnerable to "feature creep", where features are
continuously added in order to represent all aspects of all vocabularies.

I am also very keen to ensure that SKOS lives up to its name, in particular the "Simple" part. My experience of talking
to many people over the last couple of years has been that the relative simplicity and approachability of SKOS have been
perhaps its main selling point. I would like SKOS to continue to be as simple as possible, and I believe that the way to
achieve this is to know which *applications* are considered most important by the community. SKOS may then be designed
to support those applications, in the simplest possible way.

Another key issue is "interoperability" between different types of vocabulary, especially between thesauri,
classification schemes, taxonomies and subject heading systems. By "interoperability" I mean the current trend towards
software components that can handle more than a single, highly specific, vocabulary type. Software components might
want to handle more than one vocabulary type because, fundamentally, these vocabulary types are
intended to serve the same basic purpose, which is generally something to do with retrieval and the organisation and
management of information. By focusing on the application of the vocabulary, we may be able to establish what these
different vocabulary types have in common, and how the essential features may be represented within the same framework.

A final reason for focusing on the application is money. The economics of developing and applying controlled
vocabularies is, ultimately, what is driving current trends. We are at a point in time where organisations are beginning
to invest seriously in "Semantic Technologies". As they do, many are discovering that significant initial and ongoing
costs are involved. Also, the learning curve is steep. It is my impression that an awareness of the sizable cost and
intellectual challenge is becoming much more widespread (although many in isolated communities have been well aware of
it for a long time). Potential benefit is demonstrable, but costs must be minimised before solutions based on semantic
technologies become genuinely viable.

It is this drive to enable functionality with demonstrable benefit, at the lowest possible cost, that must underpin the
design of SKOS. By being aware of the application context, we may understand how this can be done, and where trends are
leading. I anticipate that this will become particularly relevant when it comes to a discussion of issues such as
"mappings" between vocabularies, and of managing change within vocabularies (and of managing dependencies between
metadata and changing vocabularies). We must work towards a clear understanding of what is needed to ensure that 
solutions based on SKOS can be economically viable, and practically feasible, propositions - especially for public 
organisations and for SMEs where money is tight.

Cheers,

Alistair.


-- 
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Web: http://purl.org/net/aliman
Email: a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440

Received on Monday, 13 November 2006 13:20:20 UTC