[PORT] Concept identification and reference from Miles, AJ (Alistair) on 2004-11-04 (public-swbp-wg@w3.org from November 2004)

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Thu, 4 Nov 2004 13:06:49 -0000
To: "'public-esw-thes@w3.org'" <public-esw-thes@w3.org>, "'public-swbp-wg@w3.org'" <public-swbp-wg@w3.org>
Message-ID: <350DC7048372D31197F200902773DF4C05E50CE4@exchange11.rl.ac.uk>

Hi all,

I have a key issue to resolve ...

Using thesauri as part of the semantic web depends on being able to uniquely
reference a thesaurus concept within a global information space.

The simplest way to uniquely reference a thesaurus concept is via a URI.
However, very few (if any) thesauri have URIs assigned to their concepts.  

It is obviously a point of good practice to encourage thesaurus developers
to assign and publish URIs for the concepts in the thesauri they are
developing.  These concepts will then have 'official' URIs.  However, such a
practice will take time to be implemented.  

In the meantime, we would like to be able to publish RDF descriptions of
existing thesauri, for which there are no 'official' concept URIs.  

One practice in this case has been to make up unofficial URIs.  However,
this can obviously lead to the proliferation of multiple URIs for the same
concept.  Although mechanisms exist to cope with this (e.g. owl:sameAs),
from a pragmatic point of view it might make sense to discourage the
practice wherever an alternative exists.

So what alternatives are there to making up unofficial URIs for concepts? 

One option is to encourage RDF descriptions of current thesauri where all
concept nodes are blank nodes.  This can be facilitated within an RDF/XML
description of a thesaurus, for example, by the use of the rdf:nodeID
attribute.  
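
As a rough illustration of the blank-node option, here is a minimal Python
sketch that writes a tiny thesaurus as RDF/XML using rdf:nodeID instead of
rdf:about.  The function name thesaurus_to_rdfxml, the input layout, and
the example.org scheme URI are all made up for illustration; only the
standard library is used.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def thesaurus_to_rdfxml(scheme_uri, concepts):
    """Serialise concepts as blank nodes (hypothetical helper).

    concepts maps a document-local node id to a pair
    (preferred label, list of node ids of broader concepts).
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for node_id, (label, broader) in concepts.items():
        # rdf:nodeID names a blank node; the id has no meaning outside this file.
        concept = ET.SubElement(
            root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}nodeID": node_id}
        )
        ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = label
        # The thesaurus itself does have a URI, so it is referenced normally.
        ET.SubElement(
            concept, f"{{{SKOS}}}inScheme", {f"{{{RDF}}}resource": scheme_uri}
        )
        for broader_id in broader:
            # Links between concepts also use rdf:nodeID, never a made-up URI.
            ET.SubElement(
                concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}nodeID": broader_id}
            )
    return ET.tostring(root, encoding="unicode")


print(thesaurus_to_rdfxml(
    "http://example.org/thesaurus",  # assumed scheme URI
    {"c1": ("Animals", []), "c2": ("Cats", ["c1"])},
))
```

The node ids (c1, c2) only link nodes within the one document, so nothing
outside the file can refer to them.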

An RDF description of a thesaurus with all concept nodes as blank nodes at
least means that a machine readable description of the thesaurus exists, and
can be imported between applications.  And so a partial goal is satisfied
...

However, it does not solve the problem of how a person might, for example,
refer to one of these concepts as part of the RDF description of a web
document.

In this case, there is a possibility to use 'reference by description'.  The
mechanism for unique identification of concepts within a print environment
is traditionally via the preferred term (or 'descriptor') for that concept,
which is a unique term within a thesaurus.  The combination of the preferred
term for a concept, and a URI identifying the thesaurus, therefore provides
a globally unique description of a concept.
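
To make the idea concrete, the following Python sketch treats the pair
(thesaurus URI, preferred term) as a lookup key for a concept.  The names
ConceptDescription and build_index, and the example.org URI, are
hypothetical; this is one possible reading of 'reference by description',
not an agreed mechanism.

```python
from typing import NamedTuple


class ConceptDescription(NamedTuple):
    """A description intended to pick out exactly one concept."""
    scheme_uri: str   # URI identifying the thesaurus
    pref_label: str   # preferred term, assumed unique within the thesaurus


def build_index(concepts):
    """Index (node id, scheme URI, preferred term) rows by description.

    Raises ValueError if two nodes share a description, since the scheme
    only works when preferred terms are unique within a thesaurus.
    """
    index = {}
    for node_id, scheme_uri, pref_label in concepts:
        key = ConceptDescription(scheme_uri, pref_label)
        if key in index:
            raise ValueError(f"description is not unique: {key}")
        index[key] = node_id
    return index


SCHEME = "http://example.org/thesaurus"  # assumed scheme URI
index = build_index([
    ("_:c1", SCHEME, "Animals"),
    ("_:c2", SCHEME, "Cats"),
])
print(index[ConceptDescription(SCHEME, "Cats")])
```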

The problem here is that reference by description for people in FOAF can be
satisfied by a single property (e.g. foaf:mbox), for which the
inverse-functional property machinery in OWL provides an implementation.
Reference by description for concepts as described above depends on at least
two properties (e.g. the combination of skos:prefLabel and skos:inScheme).
Implementations of that would depend on the expression of identity rules.
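
An identity rule of this kind might look something like the Python sketch
below: two nodes that share both skos:prefLabel and skos:inScheme are
treated as one node.  Triples are plain tuples, prefixed names are plain
strings, and the function merge_by_description is made up for illustration.
It assumes each node has at most one preferred label and one scheme.

```python
PREF_LABEL = "skos:prefLabel"
IN_SCHEME = "skos:inScheme"


def merge_by_description(triples):
    """Merge nodes that agree on BOTH prefLabel and inScheme.

    Sketch only: assumes one prefLabel and one inScheme per node, and
    that no literal value happens to equal a node identifier.
    """
    labels, schemes = {}, {}
    for s, p, o in triples:
        if p == PREF_LABEL:
            labels[s] = o
        elif p == IN_SCHEME:
            schemes[s] = o

    canonical = {}  # (scheme, label) -> first node seen with that pair
    rename = {}     # duplicate node -> canonical node
    for node in sorted(labels):
        if node not in schemes:
            continue  # one property alone is not enough to identify
        rep = canonical.setdefault((schemes[node], labels[node]), node)
        if rep != node:
            rename[node] = rep

    # Rewrite subjects and objects so duplicates collapse into one node.
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}


SCHEME = "http://example.org/thesaurus"  # assumed scheme URI
DOC = "http://example.org/doc"           # assumed document URI
triples = {
    # From the published thesaurus description:
    ("_:t1", PREF_LABEL, "Cats"), ("_:t1", IN_SCHEME, SCHEME),
    ("_:t1", "skos:broader", "_:t0"),
    # From a separate description of a web document:
    (DOC, "dc:subject", "_:d1"),
    ("_:d1", PREF_LABEL, "Cats"), ("_:d1", IN_SCHEME, SCHEME),
}
merged = merge_by_description(triples)
```

After merging, the document's subject node and the thesaurus node are the
same node, so the skos:broader statement from the thesaurus applies to the
document's subject.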

So the choice I see boils down to:

When describing best practice for creating RDF descriptions of thesauri
without official URIs, do we ...

 (a) attempt to remain neutral about whether people make up unofficial URIs,
and rely on the owl:sameAs machinery to cope with multiple published URIs
for the same concept, or ...
 (b) actively encourage the publication of these thesauri with concept nodes
as blank nodes, and additionally publish guidelines on how reference by
description may be used to refer to such concepts from other RDF
descriptions (which may depend on rules technology without any current
standard implementations).
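
For comparison, option (a) would need consumers to group the unofficial
URIs that have been declared equivalent.  A minimal Python sketch of that
grouping, using a union-find over owl:sameAs pairs, is given below.  The
function name same_as_classes and the example URIs are assumptions, and a
real OWL reasoner does considerably more than this.

```python
def same_as_classes(pairs):
    """Group URIs into equivalence classes from (a, b) owl:sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two classes

    classes = {}
    for uri in parent:
        classes.setdefault(find(uri), set()).add(uri)
    return list(classes.values())


# Three publishers each made up a URI for the same concept (assumed URIs).
classes = same_as_classes([
    ("http://a.example/cats", "http://b.example/concept/17"),
    ("http://b.example/concept/17", "http://c.example/thes#Cats"),
])
```

This only works if someone publishes the owl:sameAs statements in the first
place, which is part of the cost of option (a).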

What do you think?

Al. ~:)


---
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email:        a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440

Received on Thursday, 4 November 2004 13:07:33 UTC