RE: SKOS & SIMILE, concepts, terms, URIs, mappings

Hi Mark,

This is a response to your mail, for both yourself and a wider audience.

> Hi Alistair 
> 
> Thanks for your explanation, it was very helpful, and allowed me to
> understand why I have been getting some strange results.
> 
> However I've been thinking about this some more, and at the moment I'm
> trying to think through whether it would be useful to be able to assign
> URIs to alternative terms as well as preferred terms. I would be very
> interested to hear your feedback here. I understand you have a
> distinction between concepts and terms, and I've just broken it, but
> please bear with me.

This is how I understand this problem (wizened thesaurus gurus please
correct me if I'm off base):

Generally, a thesaurus consists of two sets of terms, the 'preferred' terms
(sometimes called 'descriptors') and the 'non-preferred' terms (sometimes
called 'entry' terms).  Only the preferred terms should be used by
cataloguers in their indexing work - a non-preferred term should never be
used for indexing.  The non-preferred terms are there to help the cataloguer
find their way to the correct preferred term to use (hence the name 'entry'
term).

For example (from GCL) there is the preferred term:

Primary health care
	Use for	
			General Practise (NHS)
			GP services
			Health centres (NHS)
			Maternity services
			NHS Direct

This set of terms constitutes the set of labels for a single concept.  The
intended meaning of this concept should be inferred from the preferred
label, the alternative labels, the neighbouring concepts, and any scope
notes or definition.  

This is the interpretation I arrived at, and Stella Dextre-Clarke has
indicated [1] (see also my follow-up [6]) that she shares it.  Moreover, it
is entirely consistent with the original intention of ISO 2788.


So in fact when a cataloguer indexes a document with the GCL term 'Primary
health care' they are indexing the document against a concept whose complete
meaning should be inferred from all the above terms.

Now another thesaurus might have all these terms as preferred terms, in
which case each would be the preferred label for a unique concept with some
finer aspect of meaning (see also discussion on mapping below).

So the SKOS approach is always to consider a preferred term and the
associated set of alternative terms as the set of labels for a single
concept, and that concept is what should be given a URI.  
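
As a rough illustration of this idea, here is a minimal sketch in Python.
It is not part of SKOS; the class name, the helper function and the URI are
all made up for the example.  It shows a single concept carrying the GCL
labels above, and an entry index that leads from any label to the concept.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str          # the URI identifies the concept, not any one term
    pref_label: str   # the term cataloguers actually index with
    alt_labels: list[str] = field(default_factory=list)  # entry terms

    def labels(self) -> list[str]:
        """All labels of the concept, preferred label first."""
        return [self.pref_label, *self.alt_labels]


# Hypothetical URI: an assumption for this sketch, not a published identifier.
primary_health_care = Concept(
    uri="http://example.org/gcl#primary-health-care",
    pref_label="Primary health care",
    alt_labels=[
        "General Practise (NHS)",
        "GP services",
        "Health centres (NHS)",
        "Maternity services",
        "NHS Direct",
    ],
)


def build_entry_index(concepts):
    """Map every label (lower-cased) to the concepts that carry it."""
    index = {}
    for concept in concepts:
        for label in concept.labels():
            index.setdefault(label.lower(), []).append(concept)
    return index


index = build_entry_index([primary_health_care])
found = index["gp services"][0]
print(found.uri, "->", found.pref_label)
```

The cataloguer enters through any label, but what gets recorded against the
document is the concept's URI.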

On a slightly more philosophical note, I think it is absolutely incorrect
and misleading to assign URIs to terms.  In fact there is no point assigning
a URI to a term because a term is just a sequence of characters, and as such
is an identifier for itself.  The useful thing to do is to assign a URI to
some piece of MEANING, and then help other people to infer what you intend
for that piece of meaning by attaching labels, descriptions, definitions,
depictions etc. to it.  In some cases a single label may be sufficient.  In
other cases a long and precise definition may be required.  

This is the only effective way to cope with the reality that a single
sequence of characters can mean different things to different people.     

To compress both the string of characters and the meaning you associate with
it into the same node within a graph is, I believe, a fundamental error,
although you would be forgiven for doing so, because the literature coming
from the thesaurus world can be far from clear on this matter.



> 
> In the Library of Congress Thesaurus of Graphic Materials, there are many
> instances where an alternative term has two or more preferred terms. For
> example in the LOC TGM "cadavers" is an alternative term, and it is linked
> to two preferred terms, "dead bodies" and "dead animals". So I think what
> is happening is the LOC TGM is advocating that cataloguers are better to
> choose either "dead bodies" or "dead animals" rather than use the
> ambiguous term "cadavers". Therefore "cadavers" really represents the
> union of "dead bodies" and "dead animals". However, as SKOS does not allow
> "cadavers" to have a unique URI, it is not possible to reference this term.
> 
> Other examples of unions in LOC TGM include:
> 
> MT: Abnormalities
> USE: Birth defects
> USE: Human curiosities
> 
> MT: Agony
> USE: Distress
> USE: Pain
> 
> MT: Agreements
> USE: Contracts
> USE: Treaties
> 
> etc
> 

If two concepts in the same thesaurus share some alternative label, it
probably indicates that they share some element of meaning, or are closely
related.  
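
A minimal sketch of how such shared labels could be detected, in Python.
The dictionary holds only the shared entry terms from the TGM examples
quoted above (the real records have more labels), and the function name is
made up for the example.  The output is a list of candidate pairs for a
person to review, not an assertion of any particular relationship.

```python
from collections import defaultdict
from itertools import combinations

# prefLabel -> altLabels, limited to the TGM entries quoted above.
concepts = {
    "Dead bodies": ["Cadavers"],
    "Dead animals": ["Cadavers"],
    "Distress": ["Agony"],
    "Pain": ["Agony"],
    "Contracts": ["Agreements"],
    "Treaties": ["Agreements"],
}


def shared_label_pairs(concepts):
    """Return {(concept_a, concept_b): [shared labels]} for every pair of
    concepts that have at least one label in common."""
    by_label = defaultdict(set)
    for pref, alts in concepts.items():
        for label in [pref, *alts]:
            by_label[label.lower()].add(pref)

    pairs = {}
    for label, prefs in by_label.items():
        for a, b in combinations(sorted(prefs), 2):
            pairs.setdefault((a, b), []).append(label)
    return pairs


candidates = shared_label_pairs(concepts)
for (a, b), labels in sorted(candidates.items()):
    print(f"{a} / {b}: shared label(s) {labels}")
```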



> Also, in the LOC TGM, there are many cases where a preferred term has many
> alternative terms. Now if we want to map another thesaurus or dataset onto
> LOC TGM, ideally we want to map between identical terms (because our hope
> is that they are about the same concept) even if they are not preferred. I
> suspect - with obvious caveats that I'm still in the process of
> understanding thesauri - that alternative and preferred terms do not
> necessarily refer to the same concepts. Rather, they may refer to
> different but overlapping concepts, and one term is preferred because the
> concept it refers to is "crisper" i.e. more well defined and less
> ambiguous. If alternative labels had URIs, it would be possible to
> represent this.
> 


The SKOS approach to mapping is explicitly concept oriented.  That is, when
mapping between thesauri, always bear in mind that you are mapping between
the concepts from each thesaurus, and NOT the terms.  I refer you to the
SKOS-Mapping schema [2] and SWAD-E deliverable 8.4 [3].  

Why do this?  Because it is most useful to identify the relationship of
meaning between the entities that are the true indexing units.

I put that sentence on a separate line, because it probably needs some
explanation.  Consider the following example (in N3):

Thesaurus A has a concept ...

# The default namespace is a placeholder for this example.
@prefix : <http://example.org/thesauri#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

:conceptA
	a	skos:Concept;
	skos:inScheme		:thesaurusA;
	skos:prefLabel	"Primary health care";
	skos:altLabel		"General Practise (NHS)";
	skos:altLabel		"GP services";
	skos:altLabel		"Health centres (NHS)";
	skos:altLabel		"NHS Direct";
	skos:altLabel		"Maternity services".

Thesaurus B has a concept ...

:conceptB
	a	skos:Concept;
	skos:inScheme		:thesaurusB;
	skos:prefLabel	"Maternity services".

Now although there is a label shared between these two concepts, it is
obvious that concept A is broader in meaning than concept B.  So although a
common label suggests that some mapping can be defined, the exact nature of
that mapping cannot be defined without considering the complete intended
meaning of each concept.

In this case, the appropriate mapping would be ...

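# Placeholder default namespace; skos-map namespace assumed from [2].
@prefix : <http://example.org/thesauri#> .
@prefix skos-map: <http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping#> .

:conceptA
	skos-map:narrowMapping	:conceptB.

:conceptB
	skos-map:broadMapping		:conceptA.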

[3] has further examples.

Now that we have this mapping, we could substitute concept A for concept B
in a query, and know that we will get a result set that is broader in scope
than the original intention of the query.  I.e. this type of mapping is a
basis for managing the specificity and completeness of result sets under
query substitution/translation.
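
To sketch what that could look like in practice, here is a small Python
example.  The triples mirror the mapping above; the function name and the
"effect" descriptions are my own assumptions for illustration, not anything
defined by SKOS-Mapping.

```python
# Mapping statements as (subject, property, object) triples, mirroring the
# example above: conceptA is broader in meaning than conceptB.
MAPPINGS = {
    ("conceptA", "skos-map:narrowMapping", "conceptB"),
    ("conceptB", "skos-map:broadMapping", "conceptA"),
}

# Expected effect on the result set when the OBJECT of a mapping replaces
# the SUBJECT in a query (descriptions are assumptions for this sketch).
EFFECT = {
    "skos-map:exactMapping": "same scope",
    "skos-map:broadMapping": "broader scope",
    "skos-map:narrowMapping": "narrower scope",
}


def substitute(concept, target_concepts):
    """Return (replacement, effect) pairs for a query concept, limited to
    replacements that belong to the target thesaurus."""
    options = []
    for subject, prop, obj in sorted(MAPPINGS):
        if subject == concept and obj in target_concepts:
            options.append((obj, EFFECT[prop]))
    return options


# A query written against thesaurus B, translated for thesaurus A.
print(substitute("conceptB", {"conceptA"}))
```

Because the mapping property is kept alongside the replacement, the caller
can tell the user whether the translated query is likely to return more or
less than was asked for.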



> If URIs were assigned to both preferred and alternative terms, this would
> allow them to use rdfs:label as opposed to skos:prefLabel and
> skos:altLabel, and I think using rdfs:label whenever possible is very
> useful as it makes life much easier for browsers.
>
> An additional problem here is it seems natural to use rdf:type to
> distinguish between preferred and alternative terms. However a term is
> preferred or alternative only in the scope of a particular thesaurus. If
> we use rdf:type, then when we use owl:sameAs to map terms in different
> thesauri or a thesaurus and a dataset (this is the approach I'm using at
> the moment) then preferred status may migrate in undesirable ways e.g. if
> we map term A in thesaurus B to term C in thesaurus D, where term A is
> preferred and term C is alternative, then suddenly term C will become
> preferred in D which is not our intention.

I do not recommend the use of owl:sameAs to express a mapping between
concepts from different thesauri.  The reason for this is that it blurs the
boundary between the two thesauri.  Where you wish to maintain the integrity
(boundary) of each scheme, use skos-map:exactMapping.  
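
A minimal sketch in Python of what I mean by blurring the boundary.  The
concept names, labels and the naive inference function are all hypothetical,
and a real OWL reasoner does considerably more than this, but the effect on
the two schemes is the same in kind.

```python
def merge_same_as(triples):
    """Naive owl:sameAs handling: every statement about one node is copied
    to the node it is declared the same as, in both directions."""
    same = [(s, o) for s, p, o in triples if p == "owl:sameAs"]
    result = set(triples)
    changed = True
    while changed:  # repeat until no new statements appear
        changed = False
        for a, b in same:
            for s, p, o in list(result):
                if p == "owl:sameAs":
                    continue
                for src, dst in ((a, b), (b, a)):
                    if s == src and (dst, p, o) not in result:
                        result.add((dst, p, o))
                        changed = True
    return result


# Two hypothetical concepts from two different thesauri.
triples = {
    ("conceptX", "skos:inScheme", "thesaurusX"),
    ("conceptX", "skos:prefLabel", "Maternity care"),
    ("conceptY", "skos:inScheme", "thesaurusY"),
    ("conceptY", "skos:prefLabel", "Maternity services"),
    ("conceptX", "owl:sameAs", "conceptY"),
}

merged = merge_same_as(triples)
schemes_of_y = {o for s, p, o in merged if s == "conceptY" and p == "skos:inScheme"}
pref_labels_of_y = {o for s, p, o in merged if s == "conceptY" and p == "skos:prefLabel"}
print(sorted(schemes_of_y))
print(sorted(pref_labels_of_y))
```

After the merge, conceptY appears to belong to both schemes and to have two
preferred labels, which is the kind of migration described in the quoted
text above.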

In the alternative use case where you want to link two thesauri to create a
larger thesaurus, using owl:sameAs IS recommended, along with any of the
semantic relation properties from SKOS-Core.

In general, to express a relationship of meaning between two concepts within
the same thesaurus, use any of the sub-properties of skos:semanticRelation
(from SKOS-Core schema [4][5]).  To express a relationship of meaning
between two concepts from different thesauri, use any of the sub-properties
of skos-map:semanticMapping (from SKOS-Mapping schema [2][3]).


> One possible solution here would be to have properties such as
> skos:preferredTermIn and skos:alternativeTermIn that point back to the
> thesauri where the term is preferred or alternative?
> 
> What do you think? 

These suggested properties imply a term-oriented approach to modelling
thesauri in RDF.  I hope I have been able to make the beginnings of a case
here for why I believe a concept-oriented approach to modelling thesauri
promises to be far more fruitful.

I'm going to leave it there because this is possibly the longest email I've
ever written.

Yours,

Alistair.


[1] http://lists.w3.org/Archives/Public/public-esw-thes/2004Mar/0057.html
[2] http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
[3] http://www.w3c.rl.ac.uk/SWAD/deliverables/8.4.html
[4] http://www.w3.org/2004/02/skos/core
[5] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
[6] http://lists.w3.org/Archives/Public/public-esw-thes/2004Mar/0060.html


> 
> -----Original Message-----
> From: Miles, AJ (Alistair) [mailto:A.J.Miles@rl.ac.uk] 
> Sent: 19 April 2004 19:21
> To: 'Butler, Mark'
> Subject: RE: SKOS & SIMILE
> 
> 
> Hi Mark,
> 
> Re USE relationships, SKOS treats this in a different way.  The set of
> terms that constitute a preferred term and the synonyms (non-preferred
> terms) is modelled as the set of possible labels for a single concept.
> 
> So for example (from UK GCL):
> ---
> Animal rights and welfare
> 	UF	Animal welfare	
> 	UF	Welfare (animals)
> 	
> Animal welfare
> 	USE	Animal rights and welfare
> 
> Welfare (animals)
> 	USE	Animal rights and welfare
> ---
> 
> ... gets mapped into the following SKOS construct:
> 
> <skos:Concept>
> 	<skos:prefLabel>Animal rights and welfare</skos:prefLabel>
> 	<skos:altLabel>Animal welfare</skos:altLabel>
> 	<skos:altLabel>Welfare (animals)</skos:altLabel>
> </skos:Concept>
> 
> [Here the concept is a blank node to illustrate the principle, but should
> probably be given an explicit URI.]
>
> The node representing the concept then becomes the indexing unit, and not
> any of the labels.
> 
> Hope that helps,
> 
> Alistair.
> 
> ---
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Email:        a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440
> 
> 
> 

Received on Wednesday, 21 April 2004 06:54:58 UTC