Re: Global concept identification and reference

In message <6.1.2.0.2.20041110153823.01ad4c28@pop.skynet.be>, Ron Davies 
<ron@rondavies.be> writes
>
>Organizations adopting a widely-used thesaurus, like the OECD
>Macrothesaurus, nearly always make some changes to it so that it meets
>local needs. They have bought the thesaurus or obtained it legally, and as
>long as the changes are not massive, copyright has never to my
>knowledge been an issue in this regard: they are not re-publishing the
>thesaurus, just using it in their application. They state publicly that they
>use the original thesaurus (say the Macrothesaurus), though they will
>likely provide a human-readable note to a user saying what has been
>changed for the local application.
>
>If now, however, in a semantic web environment, they need to expose their
>concepts with an identifier, what identifier do they use for the new or
>modified concepts they have introduced, and for the old ones that they
>have taken over?
>
>a)  Should they use an identifier that identifies the original thesaurus as
>the source for _all_ concepts, even though strictly speaking this isn't true,
>and an application relying on the fact that they do may be in for some
>surprises?

No.

>b) Do they create a new identifier for their "version" of the thesaurus and
>use this even for concepts which are the same in the local version as they
>are in the standard version?

No.

>c) Do they use an identifier to the original thesaurus for the terms that
>have not changed and use an identifier for their local version for the terms
>that have been modified or added? If so, how does an application discover
>what the local modifications are?

Yes.

(All my own, non-expert, views.)

Ron's question highlights the need to be able to reference concepts
from a related resource.  The logically clean way for the thesaurus
user to deal with the situation outlined (as you say, it is a common
real-world requirement) is surely for them to assert that they take
intellectual responsibility for their own extensions to the thesaurus,
but not for the concepts in the core thesaurus.

This means that you need a mechanism for saying, at the level of 
individual concepts, that "my concept X is a 'narrow term' of OECD 
concept Y".  Is that any harder than saying "OECD concept X is a 'narrow 
term' of OECD concept Y"?  I don't see why it should be.
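
To make that concrete, here is a rough sketch (my own illustration, not
taken from any published scheme).  If every concept is named by a full
identifier, a 'narrow term' statement has the same shape whether it stays
inside one thesaurus or crosses between two.  The namespaces, the relation
name and the concept names are all made up.

```python
# Hypothetical namespaces; neither is a real published identifier scheme.
OECD = "http://example.org/oecd-macrothesaurus#"
LOCAL = "http://example.org/local-thesaurus#"

# Each statement is (narrower concept, relation, broader concept).
statements = [
    (OECD + "X", "narrowTermOf", OECD + "Y"),   # inside the base thesaurus
    (LOCAL + "X", "narrowTermOf", OECD + "Y"),  # local extension into the base
]

def narrower_terms(broader, stmts):
    """Return every concept asserted to be a narrow term of `broader`."""
    return [s for (s, rel, o) in stmts if rel == "narrowTermOf" and o == broader]

result = narrower_terms(OECD + "Y", statements)
```

The lookup does not care which thesaurus the narrower concept came from;
only the identifier prefix tells the two apart.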

Doing things this way would give important practical benefits.  It means 
that updates to the base thesaurus can be installed with minimum fuss, 
since the only links from the local extensions which will need any 
attention are those to concepts which (for whatever reason) have been 
removed from the base thesaurus.  Otherwise, the fact that these 
extension concepts link to base concepts using invariant PSI-type 
identifiers means that they are immune to internal changes within the 
base thesaurus.  To take a programming analogy, you can think of the set 
of PSIs exported by an ontology as an interface.
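
A minimal sketch of the update check this implies, assuming the base
thesaurus publishes the set of identifiers it currently exports.  All the
identifiers and the function name are hypothetical.

```python
# Hypothetical identifiers, for illustration only.
BASE = "http://example.org/base-thesaurus#"
MINE = "http://example.org/my-extensions#"

# Links from local extension concepts to the base concepts they sit under.
extension_links = [
    (MINE + "micro-credit", BASE + "credit"),
    (MINE + "eco-tourism", BASE + "tourism"),
]

# Identifiers exported by the updated base thesaurus; "tourism" has been
# removed in this made-up release.
new_base_ids = {BASE + "credit", BASE + "travel"}

def links_needing_attention(links, exported_ids):
    """Return the links whose base target is no longer exported."""
    return [(local, base) for (local, base) in links if base not in exported_ids]

broken = links_needing_attention(extension_links, new_base_ids)
```

Only the link to the removed concept is reported; any reorganisation
inside the base thesaurus that keeps its identifiers stable leaves the
extension links untouched.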

Taking the question of how an application discovers what the local 
modifications are, you possibly need a mechanism to indicate that the 
whole local thesaurus consists of OECD plus local extensions.  Embrace 
and extend, if you will.  Or then again, maybe this isn't necessary. 
Maybe the individual relationships linking extension concepts into the 
base thesaurus, plus the ability to address either from your data, are 
sufficient.
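
For the second possibility, a sketch of how an application might work out
the local modifications from the identifiers alone, assuming local and
base concepts live under different prefixes.  The prefixes and concept
names are invented for the example.

```python
# Hypothetical prefixes; an application would need to know the base one.
BASE_NS = "http://example.org/base-thesaurus#"
LOCAL_NS = "http://example.org/local-thesaurus#"

# (narrower concept, broader concept) pairs as exposed by the local system.
links = [
    (LOCAL_NS + "micro-credit", BASE_NS + "credit"),
    (BASE_NS + "credit", BASE_NS + "finance"),
]

def local_additions(pairs, base_prefix):
    """Return, sorted, every concept that is not under the base prefix."""
    found = set()
    for narrower, broader in pairs:
        for concept in (narrower, broader):
            if not concept.startswith(base_prefix):
                found.add(concept)
    return sorted(found)

additions = local_additions(links, BASE_NS)
```

This only finds added concepts.  A concept whose meaning has been changed
locally would have to be given a local identifier before it showed up here.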

Richard Light

-- 
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

Received on Thursday, 11 November 2004 14:48:38 UTC