Re: ISSUE-160: Allowing collections in semantic relationships

On Thu, 4 Dec 2008 at 11:56:00, Alistair Miles 
<alistair.miles@zoo.ox.ac.uk> wrote
>Take the AAT sample data, for example.
>
>You quite nicely describe the metamodel underlying the AAT data, where 
>the data is structured as Records of one of four types (Concept, Facet, 
>GuideTerm, HierarchyName), and where parent/child relationships can 
>exist between Records of any type.
>
>You are still left with an open choice about how to define a 
>transformation which will map this metamodel onto the SKOS data model.
>
>For example, your transformation could be as follows: for each AAT 
>Record, generate an instance of skos:Concept, regardless of the type of 
>the Record; for each parent/child relationship between AAT Records, 
>generate a triple X skos:broader Y.
>
>Alternatively, your transformation could be as follows: for each AAT 
>Record of type Concept, generate an instance of skos:Concept. For each 
>AAT Record of type Concept, walk up the parent/child relationships 
>until you find another AAT Record of type Concept, and generate a 
>triple X skos:broader Y.
>
>These are not complete descriptions of each transformation, but I hope 
>they illustrate the point that the AAT metamodel *does not constrain 
>you* with respect to how you represent the same data as SKOS. Just 
>because there is a "parent/child" relationship between "records" in the 
>AAT data, doesn't mean you must generate a triple X skos:broader Y in 
>the SKOS representation.

I agree with Alistair's second option for the transformation required. 
Some software or communication formats may, for simplicity of 
manipulation and display, imply broader/narrower relationships between 
elements which, according to the standards for thesaurus construction, 
should not have such a relationship. So long as the nature of the 
elements can be distinguished by some other means, a more accurate 
interpretation should be used when importing the thesaurus into a format 
or software which supports it, such as SKOS.
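
As a rough illustration of that second option, the following Python 
sketch assumes a hypothetical in-memory form of the AAT records. The 
Record class, its field names and the sample labels are all invented 
here and are not taken from the AAT data. It climbs the parent/child 
links from each record of type Concept, skips any Facet, GuideTerm or 
HierarchyName records on the way, and emits a skos:broader triple only 
for the nearest record of type Concept on each branch.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for an AAT record; not the real AAT schema."""
    rid: str
    rtype: str  # "Concept", "Facet", "GuideTerm" or "HierarchyName"
    parents: list = field(default_factory=list)  # ids of parent records


def nearest_concept_ancestors(rid, records):
    """Walk up the parent links and return the ids of the nearest
    ancestors of type Concept, passing through any other record types."""
    found, seen = set(), set()
    stack = list(records[rid].parents)
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        parent = records[pid]
        if parent.rtype == "Concept":
            found.add(pid)  # stop climbing on this branch
        else:
            stack.extend(parent.parents)  # keep climbing past non-concepts
    return found


def broader_triples(records):
    """Generate (X, 'skos:broader', Y) triples between Concept records only."""
    triples = []
    for rid in sorted(records):
        if records[rid].rtype != "Concept":
            continue
        for ancestor in sorted(nearest_concept_ancestors(rid, records)):
            triples.append((rid, "skos:broader", ancestor))
    return triples


# Invented sample data: a guide term sits between two concepts.
records = {r.rid: r for r in [
    Record("furnishings", "HierarchyName"),
    Record("furniture", "Concept", ["furnishings"]),
    Record("by_function", "GuideTerm", ["furniture"]),
    Record("seating", "Concept", ["by_function"]),
    Record("chairs", "Concept", ["seating"]),
]}

for triple in broader_triples(records):
    print(triple)
```

In this sample the guide term "by_function" produces no triple of its 
own; "seating" is linked directly to "furniture". How the skipped 
records should then be represented (for instance as node labels) is a 
separate question which the sketch does not attempt to answer.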

AAT is a particularly tricky example, because it uses the expression 
"Guide terms" to include both "node labels specifying a characteristic 
of division" and "labels for concepts which should not be used for 
indexing" but which nevertheless occupy a valid place in the BT/NT 
hierarchy. There is also a risk of confusion between "facets" (groups of 
concepts of the same inherent category) and "facet labels" (a.k.a. "node 
labels containing the name of a facet"), which specify which facet the 
subsequent concepts belong to in a classified display.

As far as I can see, the SKOS format does not properly represent the 
structure in the BS DD8723-5 draft standard, as "collections" in SKOS do 
not directly correspond to "arrays" in the BS model.

It may be of interest to SKOS people to know that we have continued to 
develop the UML model while working on the ISO version of the standard, 
ISO 25964; although based on the BS model, this has some additional 
features and changes. I attach a copy of the model incorporating the 
latest thinking of the ISO working party, and it would be good if any 
SKOS development could use this rather than the BS draft. Some of the 
labels have been changed to make it easier to transform into OWL - 
thanks to Bernard Vatant for this.

The revised model contains a new element - the "ConceptGroup", which we 
have explained as follows:

"Many thesauri group concepts using a classification structure which 
exists in parallel to the hierarchies of thesaurus concepts based on 
BT/NT relationships. Groups created by the classification are often 
based on disciplines, subject areas or areas of business activity. They 
are sometimes called "subject categories", "themes", "domains", "groups" 
or "microthesauri". The model provides for all of these by providing the 
classes ConceptGroup and ConceptGroupLabel and the specific type may be 
indicated by the attribute conceptGroupType. There is not, in general, a 
BT/NT relationship between a ConceptGroup and the concepts which it 
contains.

Concepts may be gathered into ConceptGroups from many different facets 
or hierarchies of the thesaurus, and the notation used for the 
classification into groups may be quite distinct from any notation that 
may be used for the concepts themselves. Groups may have subgroups, 
being nested to any level. Each group should be given one verbal label 
per language."

[I'm sorry that due to irritating ISO restrictions I cannot make the 
full draft available at this stage, so I hope I can get away with the 
above quotation from the notes on the model. I'll explain any other 
points that are not clear, if asked.]

This provision for a loose grouping of concepts relevant to a subject 
area in fact seems closer to SKOS "collections" than do the more 
strictly defined "arrays", which are groups of sibling concepts. We 
would really like SKOS to provide for this distinction.
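
The difference between the two kinds of grouping can be sketched in 
Python as follows. This is only an illustration under stated 
assumptions: the class below borrows the name ConceptGroup and the idea 
of the conceptGroupType attribute from the model, but the field names, 
the helper is_sibling_array, the treatment of subgroup members as 
members of the enclosing group, and the sample data are all invented 
here and are not part of the ISO or BS models.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptGroup:
    """Loose grouping of concepts; membership implies no BT/NT link."""
    group_type: str  # plays the role of conceptGroupType, e.g. "domain"
    labels: dict     # language code -> the one verbal label for it
    members: set = field(default_factory=set)
    subgroups: list = field(default_factory=list)  # nested to any level

    def all_members(self):
        """Members of this group plus those of all nested subgroups
        (an assumption made for this sketch)."""
        result = set(self.members)
        for sub in self.subgroups:
            result |= sub.all_members()
        return result


def is_sibling_array(members, broader):
    """True if every member shares at least one common broader concept,
    which is the condition an array of sibling concepts must meet."""
    parent_sets = [broader.get(m, set()) for m in members]
    return bool(parent_sets) and bool(set.intersection(*parent_sets))


# Invented BT/NT data: concept -> set of its broader concepts.
broader = {
    "chairs": {"seating"},
    "sofas": {"seating"},
    "upholstery": {"textile crafts"},
}

group = ConceptGroup(
    group_type="domain",
    labels={"en": "Home interiors"},
    members={"chairs", "upholstery"},
    subgroups=[ConceptGroup("theme", {"en": "Seating"}, members={"sofas"})],
)

print(is_sibling_array({"chairs", "sofas"}, broader))     # siblings
print(is_sibling_array(group.all_members(), broader))     # loose group
```

The array check depends on the BT/NT hierarchy, whereas the group 
carries no such constraint and can draw its members from anywhere in 
the thesaurus.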

"Concepts which should not be used for indexing" can be indicated by 
giving them an appropriate custom attribute or note, such as "Use a more 
specific concept if possible" (I prefer this less restrictive note, as 
there are cases where such a term can be useful, especially when 
searching for it and all its narrower terms).

Leonard Will

-- 
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk               Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------

Received on Thursday, 4 December 2008 15:53:46 UTC