- From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
- Date: Wed, 21 Apr 2004 11:53:55 +0100
- To: "'Butler, Mark'" <mark-h.butler@hp.com>
- Cc: "'www-rdf-dspace@w3.org'" <www-rdf-dspace@w3.org>, "'public-esw-thes@w3.org'" <public-esw-thes@w3.org>
Hi Mark,

This is a response to your mail, for both yourself and a wider audience.

> Hi Alistair
>
> Thanks for your explanation, it was very helpful, and allowed me to
> understand why I have been getting some strange results.
>
> However I've been thinking about this some more, and at the moment I'm
> trying to think through whether it would be useful to be able to assign
> URIs to alternative terms as well as preferred terms. I would be very
> interested to hear your feedback here. I understand you have a distinction
> between concepts and terms, and I've just broken it, but please bear with me.

This is how I understand this problem (wizened thesaurus gurus, please correct me if I'm off base):

Generally, a thesaurus consists of two sets of terms: the 'preferred' terms (sometimes called 'descriptors') and the 'non-preferred' terms (sometimes called 'entry' terms). Only the preferred terms should be used by cataloguers in their indexing work; a non-preferred term should never be used for indexing. The non-preferred terms are there to help the cataloguer find their way to the correct preferred term (hence the name 'entry' term).

For example (from GCL), there is the preferred term:

Primary health care
  Use for
    General Practise (NHS)
    GP services
    Health centres (NHS)
    Maternity services
    NHS Direct

This set of terms constitutes the set of labels for a single concept. The intended meaning of this concept should be inferred from the preferred label, the alternative labels, the neighbouring concepts, and any scope notes or definitions. I arrived at this interpretation, and Stella Dextre-Clarke has indicated [1] (see also my follow-up [6]) that she shares it; moreover, it is entirely consistent with the original intention of ISO 2788. So when a cataloguer indexes a document with the GCL term 'Primary health care', they are in fact indexing the document against a concept whose complete meaning should be inferred from all the above terms.
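To make this concept-oriented reading concrete, here is a minimal Python sketch (not part of SKOS or of any SKOS tool) in which the GCL entry above is one concept carrying a preferred label and several alternative labels. The concept URI is a hypothetical example, and build_label_index and resolve_label are illustrative helpers: together they play the role of the entry vocabulary, leading from any label, preferred or not, to the concept that is the actual indexing unit.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """One unit of meaning: a URI plus the labels attached to it."""
    uri: str
    pref_label: str
    alt_labels: frozenset = field(default_factory=frozenset)

    def labels(self):
        # Every label, preferred or alternative, helps convey the meaning.
        return {self.pref_label, *self.alt_labels}


def build_label_index(concepts):
    """Map each label string to the set of concept URIs it labels."""
    index = {}
    for concept in concepts:
        for label in concept.labels():
            index.setdefault(label, set()).add(concept.uri)
    return index


def resolve_label(index, label):
    """Hypothetical helper: the concept URIs a cataloguer should index against."""
    return index.get(label, set())


# Hypothetical URI, chosen for illustration only.
primary_health_care = Concept(
    uri="http://example.org/gcl#primary-health-care",
    pref_label="Primary health care",
    alt_labels=frozenset({
        "General Practise (NHS)",
        "GP services",
        "Health centres (NHS)",
        "Maternity services",
        "NHS Direct",
    }),
)

index = build_label_index([primary_health_care])
print(resolve_label(index, "GP services"))
# {'http://example.org/gcl#primary-health-care'}
```

The index records documents against the URI, never against a label string. Because each label maps to a set of URIs, a label shared by two concepts (as with "cadavers" in the LOC TGM example discussed below) simply resolves to both, without needing a URI of its own.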

Now another thesaurus might have all of the terms in this GCL example as preferred terms, in which case each would be the preferred label for a unique concept with some finer aspect of meaning (see also the discussion of mapping below).

So the SKOS approach is always to treat a preferred term and its associated set of alternative terms as the set of labels for a single concept, and it is that concept which should be given a URI.

On a slightly more philosophical note, I think it is absolutely incorrect and misleading to assign URIs to terms. In fact there is no point assigning a URI to a term, because a term is just a sequence of characters and as such is an identifier for itself. The useful thing to do is to assign a URI to some piece of MEANING, and then help other people infer what you intend by that piece of meaning by attaching labels, descriptions, definitions, depictions etc. to it. In some cases a single label may be sufficient; in other cases a long and precise definition may be required. This is the only effective way to cope with the reality that a single sequence of characters can mean different things to different people. To compress both the string of characters and the meaning you associate with it into the same node within a graph is, I believe, a fundamental error, although you would be forgiven for doing this, because the literature coming from the thesaurus world can be far from clear on this matter.

> In the Library of Congress Thesaurus of Graphic Materials, there are many
> instances where an alternative term has two or more preferred terms. For
> example in the LOC TGM "cadavers" is an alternative term, and it is linked
> to two preferred terms, "dead bodies" and "dead animals". So I think what
> is happening is the LOC TGM is advocating that cataloguers are better to
> choose either "dead bodies" or "dead animals" rather than use the
> ambiguous term "cadavers". Therefore "cadavers" really represents the
> union of "dead bodies" and "dead animals".
> However, as SKOS does not allow "cadavers" to have a unique URI, it is
> not possible to reference this term.
>
> Other examples of unions in LOC TGM include:
>
> MT: Abnormalities
> USE: Birth defects
> USE: Human curiosities
>
> MT: Agony
> USE: Distress
> USE: Pain
>
> MT: Agreements
> USE: Contracts
> USE: Treaties
>
> etc

If two concepts in the same thesaurus share an alternative label, it probably indicates that they share some element of meaning, or are closely related.

> Also, in the LOC TGM, there are many cases where a preferred term has many
> alternative terms. Now if we want to map another thesaurus or dataset onto
> LOC TGM, ideally we want to map between identical terms (because our hope
> is they are about the same concept) even if they are not preferred. I
> suspect - with obvious caveats that I'm still in the process of
> understanding thesauri - that alternative and preferred terms do not
> necessarily refer to the same concepts. Rather, they may refer to
> different but overlapping concepts, and one term is preferred because the
> concept it refers to is "crisper" i.e. more well defined and less
> ambiguous. If alternative labels had URIs, it would be possible to
> represent this.

The SKOS approach to mapping is explicitly concept-oriented. That is, when mapping between thesauri, always bear in mind that you are mapping between the concepts from each thesaurus, and NOT the terms. I refer you to the SKOS-Mapping schema [2] and SWAD-E deliverable 8.3 [3].

Why do this? Because it is most useful to identify the relationship of meaning between the entities that are the true indexing units.

I put that sentence on a separate line because it probably needs some explanation. Consider the following example (in N3).

Thesaurus A has a concept ...

    conceptA a skos:Concept;
        skos:inScheme thesaurusA;
        skos:prefLabel 'Primary health care';
        skos:altLabel 'General Practise (NHS)';
        skos:altLabel 'GP services';
        skos:altLabel 'Health centres (NHS)';
        skos:altLabel 'NHS Direct';
        skos:altLabel 'Maternity services'.

Thesaurus B has a concept ...

    conceptB a skos:Concept;
        skos:inScheme thesaurusB;
        skos:prefLabel 'Maternity services'.

Now although there is a label shared between these two concepts, it is obvious that concept A is broader in meaning than concept B. So although a common label suggests that some mapping can be defined, the exact nature of that mapping cannot be determined without considering the complete intended meaning of each concept. In this case, the appropriate mapping would be ...

    conceptA skos-map:narrowMapping conceptB.
    conceptB skos-map:broadMapping conceptA.

[3] has further examples.

Now that we have this mapping, we could substitute concept A for concept B in a query, and know that we will get a result set that is broader in scope than the original intention of the query. In other words, this type of mapping provides a basis for managing the specificity and completeness of result sets under query substitution/translation.

> If URIs were assigned to both preferred and alternative terms, this would
> allow them to use rdfs:label as opposed to skos:prefLabel and
> skos:altLabel, and I think using rdfs:label whenever possible is very
> useful as it makes life much easier for browsers.
>
> An additional problem here is it seems natural to use rdf:type to
> distinguish between preferred and alternative terms. However a term is
> preferred or alternative only in the scope of a particular thesaurus. If
> we use rdf:type, then when we use owl:sameAs to map terms in different
> thesauri or a thesaurus and a dataset (this is the approach I'm using at
> the moment) then preferred status may migrate in undesirable ways e.g.
> if we map term A in thesaurus B to term C in thesaurus D, where term A is
> preferred and term C is alternative, then suddenly term C will become
> preferred in D, which is not our intention.

I do not recommend the use of owl:sameAs to express a mapping between concepts from different thesauri, because it blurs the boundary between the two thesauri. Where you wish to maintain the integrity (boundary) of each scheme, use skos-map:exactMapping. In the alternative use case, where you want to link two thesauri to create a larger thesaurus, using owl:sameAs IS recommended, along with any of the semantic relation properties from SKOS-Core.

In general, to express a relationship of meaning between two concepts within the same thesaurus, use any of the sub-properties of skos:semanticRelation (from the SKOS-Core schema [4][5]). To express a relationship of meaning between two concepts from different thesauri, use any of the sub-properties of skos-map:semanticMapping (from the SKOS-Mapping schema [2][3]).

> One possible solution here would be to have properties such as
> skos:preferredTermIn and skos:alternativeTermIn that point back to the
> thesauri where the term is preferred or alternative?
>
> What do you think?

These suggested properties imply a term-oriented approach to modelling thesauri in RDF. I hope I have been able to make the beginnings of a case here for why I believe a concept-oriented approach to modelling thesauri promises to be far more fruitful.

I'm going to leave it there, because this is possibly the longest email I've ever written.

Yours,

Alistair.
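The concept-to-concept mapping and the query-substitution argument above can be sketched in Python as follows. This is an illustration under stated assumptions, not an implementation of SKOS-Mapping: mappings are held as plain (subject, property, object) tuples using the property names from the example, conceptA and conceptB are the example's placeholder identifiers, and substitute is a made-up helper that reports how replacing a concept in a query changes the scope of the result set.

```python
# Mapping statements between concepts from different schemes, as
# (subject, property, object) tuples mirroring the N3 example.
MAPPINGS = [
    ("conceptA", "skos-map:narrowMapping", "conceptB"),
    ("conceptB", "skos-map:broadMapping", "conceptA"),
]

# Effect on the result set of querying with the object instead of the subject.
SCOPE_CHANGE = {
    "skos-map:exactMapping": "same scope",
    "skos-map:broadMapping": "broader scope",
    "skos-map:narrowMapping": "narrower scope",
}


def substitute(concept, mappings):
    """Hypothetical helper: list (replacement concept, effect on result set)."""
    options = []
    for subject, prop, obj in mappings:
        if subject == concept and prop in SCOPE_CHANGE:
            options.append((obj, SCOPE_CHANGE[prop]))
    return options


print(substitute("conceptB", MAPPINGS))
# [('conceptA', 'broader scope')]
```

Keeping the direction of each mapping explicit is what lets a query on conceptB be rewritten to conceptA while knowing the result set will be broader. An owl:sameAs link would instead assert that the two concepts are identical, which is not what this example intends.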

[1] http://lists.w3.org/Archives/Public/public-esw-thes/2004Mar/0057.html
[2] http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
[3] http://www.w3c.rl.ac.uk/SWAD/deliverables/8.4.html
[4] http://www.w3.org/2004/02/skos/core
[5] http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
[6] http://lists.w3.org/Archives/Public/public-esw-thes/2004Mar/0060.html

> -----Original Message-----
> From: Miles, AJ (Alistair) [mailto:A.J.Miles@rl.ac.uk]
> Sent: 19 April 2004 19:21
> To: 'Butler, Mark'
> Subject: RE: SKOS & SIMILE
>
> Hi Mark,
>
> Re USE relationships, SKOS treats this in a different way. The set of
> terms that constitute a preferred term and the synonyms (non-preferred
> terms) is modelled as the set of possible labels for a single concept.
>
> So for example (from UK GCL):
> ---
> Animal rights and welfare
>   UF Animal welfare
>   UF Welfare (animals)
>
> Animal Welfare
>   USE Animal rights and welfare
>
> Welfare (animals)
>   USE Animal rights and welfare
> ---
>
> ... gets mapped into the following SKOS construct:
>
> <skos:Concept>
>   <skos:prefLabel>Animal rights and welfare</skos:prefLabel>
>   <skos:altLabel>Animal welfare</skos:altLabel>
>   <skos:altLabel>Welfare (animals)</skos:altLabel>
> </skos:Concept>
>
> [Here the concept is a blank node to illustrate the principle, but should
> probably be given an explicit URI.]
>
> The node representing the concept then becomes the indexing unit, and not
> any of the labels.
>
> Hope that helps,
>
> Alistair.
>
> ---
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Email: a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440
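The UF/USE-to-SKOS transformation in the quoted message can be sketched in Python. This is an illustration only: the two-space indented input layout, the parse_uf_use and to_skos_xml helpers, and the omission of the rdf:RDF wrapper and namespace declarations are all assumptions made for the sketch, not features of GCL or of any SKOS tool. The sample also writes "Animal Welfare" as "Animal welfare" so that it matches the UF entry. As in the quoted example, each concept is emitted as a blank node; in practice it should be given an explicit URI.

```python
from xml.sax.saxutils import escape

GCL_SAMPLE = """\
Animal rights and welfare
  UF Animal welfare
  UF Welfare (animals)

Animal welfare
  USE Animal rights and welfare

Welfare (animals)
  USE Animal rights and welfare
"""


def parse_uf_use(text):
    """Group each preferred term with its non-preferred (UF/USE) terms."""
    concepts = {}        # preferred term -> set of alternative terms
    headings = []        # every unindented term, in order
    entry_terms = set()  # headings that turned out to be non-preferred
    heading = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace():
            heading = line
            headings.append(heading)
        elif line.startswith("UF "):
            # The heading is preferred; this line names one of its entry terms.
            concepts.setdefault(heading, set()).add(line[3:])
        elif line.startswith("USE "):
            # The heading is an entry term pointing at a preferred term.
            entry_terms.add(heading)
            concepts.setdefault(line[4:], set()).add(heading)
    # A heading with no USE line is preferred, even if it has no UF lines.
    for term in headings:
        if term not in entry_terms:
            concepts.setdefault(term, set())
    return concepts


def to_skos_xml(pref, alts):
    """Render one concept (as a blank node) in the style of the example."""
    lines = ["<skos:Concept>",
             f"  <skos:prefLabel>{escape(pref)}</skos:prefLabel>"]
    lines += [f"  <skos:altLabel>{escape(a)}</skos:altLabel>" for a in sorted(alts)]
    lines.append("</skos:Concept>")
    return "\n".join(lines)


for pref, alts in parse_uf_use(GCL_SAMPLE).items():
    print(to_skos_xml(pref, alts))
```

Because an entry term with several USE lines is added as an alternative label to each target, an LOC TGM entry such as "cadavers" (USE dead bodies, USE dead animals) becomes a label shared by two concepts rather than a node of its own, which is the concept-oriented treatment argued for above.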
Received on Wednesday, 21 April 2004 06:54:58 UTC