Differences between SKOS and ISO standards from Leonard Will on 2009-02-13 (public-esw-thes@w3.org from February 2009)

From: Leonard Will <L.Will@willpowerinfo.co.uk>
Date: Fri, 13 Feb 2009 12:17:12 +0000
To: "public-esw-thes@w3.org" <public-esw-thes@w3.org>
Message-ID: <3ITNYDLITWlJFAAq@mail.willpowerinfo.co.uk>
On Thu, 12 Feb 2009 at 21:52:39, Christophe Dupriez 
<christophe.dupriez@destin.be> wrote
>Hi Alistair,
>I was thinking like you "skos:broader to correspond perfectly well to the ISO
>notion of BT" up to the moment I discovered that:
>
>Having all BroadMatch being Broaders (in practical terms, getBroader(s)
>must return all BroadMatch and all other kind of Broaders), for me,
>disqualifies Broader as being an equivalent of ISO BT.

>This was discovered when making a system deemed to be operational and
>publicly accessible.
>
>Please allow me to insist: the past investments to define a conceptual frame
>(ISO standards for thesauri) and to use this frame for building indexation
>and retrieval Information tools, those investments must be integrated in the
>new SKOS effort.
>
>SKOS is not OWL: we are not looking to model precisely everything (Java
>does that very well (!), OWL also); we are looking for a work frame for the
>(mainly manual) creation of vocabularies to link humans and information.
>This frame must be simple. Simplicity comes from design+experience.
>
>Has anyone looked at recent ISO efforts to improve thesauri standards?
>Is SKOS in line with those efforts (keeping the difference between
>concept centric versus term centric)?
>Good evening!
>Christophe

As ISO work and standards have been referred to a few times in this 
discussion, perhaps I can clarify where we stand on a few of these 
issues. I write as Chair of the "Data Modeling, Exchange Formats and 
Protocols" subgroup of the ISO working group SC9WG8/Project 25964, 
currently revising the ISO standard for thesauri for information 
retrieval, but as these standards are still in draft form anything I say 
here is my own interpretation of the way we are going, and is not 
authoritative.

We have developed a UML data model which incorporates the structures and 
principles that are recommended in the ISO standard for thesauri. I 
sent a copy of this model to this list on 4th December, and I attach 
another copy for ease of reference. The full text of the revised 
standard, with notes on this model, is at present in committee draft 
stage and regrettably cannot yet be made available for public comment.

Terms and concepts
------------------------------
The ISO model is firmly based on relationships between concepts, not 
terms. Terms are used as labels for concepts, as in SKOS. Each concept 
has a single unique preferred label in each language. Homographs within 
a language are distinguished by adding a qualifier in parentheses to the 
term; these qualifiers become an integral part of the term - they are 
not separable and do not exist as a separate entity or class in the UML 
model.
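
The labelling rule above can be sketched in a few lines of Python. This 
is only an illustration, not part of the ISO model: the class name, the 
method name and the example labels are all hypothetical. It shows one 
preferred label per concept per language, unique within that language, 
with the qualifier stored as part of the label string.

```python
class Thesaurus:
    """Hypothetical container: one unique preferred label per language."""

    def __init__(self):
        self.pref_labels = {}  # (language, label) -> concept id
        self.concepts = {}     # concept id -> {language: label}

    def set_pref_label(self, concept_id, language, label):
        # The qualifier, e.g. "(birds)", is simply part of the label text;
        # it is not stored as a separate field.
        key = (language, label)
        owner = self.pref_labels.get(key)
        if owner is not None and owner != concept_id:
            raise ValueError(f"{label!r} already labels concept {owner!r}")
        labels = self.concepts.setdefault(concept_id, {})
        old = labels.get(language)
        if old is not None:
            # Replacing the label keeps "one preferred label per language".
            del self.pref_labels[(language, old)]
        labels[language] = label
        self.pref_labels[key] = concept_id


thesaurus = Thesaurus()
thesaurus.set_pref_label("c1", "en", "cranes (birds)")
thesaurus.set_pref_label("c2", "en", "cranes (lifting equipment)")
```

Giving a third concept the label "cranes (birds)" in English would raise 
ValueError in this sketch, because that label is already in use.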

The ISO model represents a single, self-contained thesaurus. Each 
concept has a clear definition, either explicit or implicit, which 
determines its scope and relationships. This definition may be 
artificially restricted compared to the wider meaning that some of its 
terms may have in natural language. There is no provision for linking to 
external concepts by URLs or by any other means - if a good definition 
of a concept is found elsewhere it may be "imported" and become part of 
the thesaurus, but the link to its source is then just a record of its 
origin.

I find it difficult to understand the SKOS idea of building a thesaurus 
from a network of URLs linking to any number of disparate sites. Which 
sites do you choose to represent the concepts of "war", "peace", 
"ships", "shoes", "sealing-wax", "cabbages" or "kings", for example? 
What control do you have over these external links if they change or 
disappear?

Hierarchies and transitivity
-----------------------------------
The ISO model allows two concepts to be linked by a hierarchical 
relationship. In the great majority of cases this is a generic 
relationship, corresponding to the ontologists' "is-a" relationship. It 
is represented, for historical reasons, as "BT/NT" although this is 
understood as meaning "broader concept / narrower concept" rather than 
"broader term / narrower term" as the abbreviation would suggest.

It may be specified more precisely as a generic relationship, "BTG/NTG" 
or as an instantial relationship "BTI/NTI". BTI/NTI is a particular kind 
of BTG/NTG in which the narrower concept is a "class of one", very often 
labelled by a proper name. A "class of one" cannot, by its definition, 
have further generic subclasses, so in a generic hierarchy it can occur 
only at an extremity (as a "leaf" on the hierarchical tree). Generic 
relationships are inherently transitive.

The draft ISO standard says that hierarchies can be built on part / 
whole relationships (BTP/NTP) in only a very few special cases, 
specifically mentioning social structures (such as subdivisions of an 
army), disciplines, place names and parts of the human body. 
Relationships are transitive within hierarchies built purely from these 
partitive relationships, but transitivity fails across a chain which 
mixes generic and partitive relationships.

It is not therefore possible to match the SKOS expressions "broader" and 
"broaderTransitive" to the ISO relationships. Transitivity is not 
inherent in each relationship but depends on whether the chain uses the 
same type of relationship throughout.
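
To make the point concrete, here is a minimal Python sketch of how a 
system might compute the broader concepts that can safely be inferred. 
It assumes, following the description above, that BTG and BTI links 
belong to one "generic" family and BTP links to a "partitive" family, 
and that a chain is followed only while it stays within one family. The 
function names and the example concepts are hypothetical.

```python
from collections import defaultdict

# Assumption: instantial links are treated as a kind of generic link.
FAMILY = {"BTG": "generic", "BTI": "generic", "BTP": "partitive"}


def inferred_broader(links, start):
    """Return concepts reachable from `start` by a chain of one family.

    `links` is a list of (narrower, relationship, broader) triples.
    """
    graph = defaultdict(list)
    for narrower, rel, broader in links:
        graph[narrower].append((FAMILY[rel], broader))

    result = set()
    seen = set()
    # Each stack entry records the family of the chain that led to it.
    stack = [(broader, fam) for fam, broader in graph[start]]
    while stack:
        concept, fam = stack.pop()
        if (concept, fam) in seen:
            continue
        seen.add((concept, fam))
        result.add(concept)
        for next_fam, broader in graph[concept]:
            if next_fam == fam:  # stop where the chain changes type
                stack.append((broader, fam))
    return result


links = [
    ("fingers", "BTP", "hands"),
    ("hands", "BTP", "arms"),
    ("arms", "BTG", "limbs"),
]
partitive_chain = inferred_broader(links, "fingers")
```

With these example links, "fingers" yields "hands" and "arms" but not 
"limbs", because the last step switches from a partitive to a generic 
link.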

Mapping between thesauri
---------------------------------------
The present ISO model does not address the issue of mapping between two 
separate thesauri, or mapping concepts within a thesaurus to external 
concepts, whether within a thesaurus or not. We expect this to be 
considered when we work on part 2 of the ISO standard, which we have not 
yet started (though it is covered in part 4 of the related British 
Standard, BS8723-4:2007).

Relationships between concepts in source and target schemes fall into 
the following categories:

a. Exact equivalence, where the two concepts have the same scope

b. Partial equivalence, where the scope of a concept in one scheme falls 
completely within the scope of a concept in the other scheme. The 
concept in the source scheme may be broader or narrower than the one in 
the target scheme.

c. Inexact or "overlapping" equivalence, where the scopes overlap. The 
two concepts share some scope, but each concept covers some topics that 
are not included in the other.

d. Non-equivalence, where there is no concept in the target scheme that 
contains any of the scope of the concept in the source scheme.
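
One way to picture these four categories is to treat the scope of a 
concept as a set of topics, which is a simplifying assumption made only 
for this sketch; real scopes are defined by scope notes and context, not 
by enumerable sets. The function names and category strings below are 
hypothetical.

```python
def classify_mapping(source_scope, target_scope):
    """Classify a pair of concepts by comparing their scopes as sets."""
    source, target = set(source_scope), set(target_scope)
    if source == target:
        return "exact"
    if source < target:
        return "partial (source narrower)"
    if source > target:
        return "partial (source broader)"
    if source & target:
        return "overlapping"
    return "no shared scope"


def has_no_equivalent(source_scope, target_scheme):
    """Category d: no target concept shares any scope with the source."""
    return all(
        classify_mapping(source_scope, scope) == "no shared scope"
        for scope in target_scheme.values()
    )


# Hypothetical target scheme: concept label -> set of topics in scope.
target_scheme = {
    "boats": {"rowing boats", "sailing boats"},
    "vehicles": {"cars", "lorries", "bicycles"},
}
category = classify_mapping({"sailing boats"}, target_scheme["boats"])
```

Here the source concept covering only "sailing boats" falls within the 
scope of "boats", so the pair is a partial equivalence with the source 
narrower than the target.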

In practice, it may also be necessary to create one-to-many matches, 
where a concept in the source scheme may be best represented by a 
Boolean combination of concepts in the target scheme.

The SKOS provision of exact, broader and narrower matches does not seem 
to cover these categories adequately, especially in the absence of 
provision for overlapping matches and one-to-many matches.

Arrays and concept groups
--------------------------------------
The ISO model makes a clear distinction between "arrays" and "concept 
groups".

Arrays are sets of sibling concepts under a common parent, which may be 
grouped, and potentially ordered, by a characteristic of division 
specified in a node label.

Concept groups are much less restricted. Many thesauri group concepts 
using a classification structure which exists in parallel to the 
hierarchies of thesaurus concepts based on BT/NT relationships. Groups 
created by the classification are often based on disciplines, subject 
areas or areas of business activity. They are sometimes called "subject 
categories", "themes", "domains", "groups" or "microthesauri". There is 
not, in general, a BT/NT relationship between a ConceptGroup and the 
concepts which it contains, or between the concepts in a ConceptGroup. 
Concepts may be gathered into ConceptGroups from many different facets 
or hierarchies of the thesaurus, and the notation used for the 
classification into groups may be quite distinct from any notation that 
may be used for the concepts themselves. Groups may have subgroups, 
being nested to any level. Each group should be given one verbal label 
per language.
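
The difference between the two structures can be illustrated with a 
short Python sketch. The class and field names are hypothetical and are 
not taken from the UML model; the sketch also assumes that a group's 
full membership includes the members of its nested subgroups.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """Ordered siblings under one parent, divided by a characteristic."""
    parent: str
    node_label: str                              # e.g. "by material"
    members: list = field(default_factory=list)  # order may be significant


@dataclass
class ConceptGroup:
    """Loose grouping; members need not be related by BT/NT."""
    labels: dict                                 # language -> verbal label
    members: set = field(default_factory=set)
    subgroups: list = field(default_factory=list)

    def all_members(self):
        # Assumption: membership is inherited from nested subgroups.
        found = set(self.members)
        for sub in self.subgroups:
            found |= sub.all_members()
        return found


def is_valid_array(array, broader_of):
    """Every array member must have the array's parent as a direct BT."""
    return all(array.parent in broader_of.get(m, set()) for m in array.members)


broader_of = {"leather shoes": {"shoes"}, "canvas shoes": {"shoes"}}
by_material = Array("shoes", "by material", ["canvas shoes", "leather shoes"])
footwear_trade = ConceptGroup({"en": "Footwear trade"}, {"shoes", "retailing"})
fashion = ConceptGroup({"en": "Fashion"}, {"kings"}, [footwear_trade])
```

The array is tied to the hierarchy through its parent, whereas the 
groups gather concepts from anywhere in the thesaurus and may be nested.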

As far as I can see, SKOS does not provide for these elements of the ISO 
model. The use of "collections" for both of these is liable to lead to 
confusion and inconsistency.


I am very keen that the ISO working group should work as closely as 
possible with SKOS so that we don't end up with two incompatible ways of 
representing thesauri (and later, with luck, other types of knowledge 
organisation scheme).  We think that our model provides a practical and 
accurate representation of the logical structure of a thesaurus, but are 
open to suggestions as to any inadequacies it may have. SKOS is gaining 
a lot of momentum as a standard for the exchange of data in semantic 
applications, but I am very concerned about the incompatibilities I have 
outlined above.

How can we best move forward?

Leonard Will

-- 
Willpower Information     (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants            Tel: +44 (0)20 8372 0092
27 Calshot Way                              L.Will@Willpowerinfo.co.uk
ENFIELD                                Sheena.Will@Willpowerinfo.co.uk
EN2 7BQ, UK                            http://www.willpowerinfo.co.uk/
Attachments

image/jpeg attachment: ISO_model_2008-11-18.jpg
Received on Friday, 13 February 2009 12:23:10 UTC