
Re: Differences between SKOS and ISO standards

From: Alistair Miles <alistair.miles@zoo.ox.ac.uk>
Date: Fri, 13 Feb 2009 17:26:17 +0000
To: Leonard Will <L.Will@willpowerinfo.co.uk>
Cc: "public-esw-thes@w3.org" <public-esw-thes@w3.org>
Message-ID: <20090213172617.GB18001@skiathos>

Hi Leonard,

Thank you for this detailed summary; it is extremely valuable.

Let me first say that the request to transition to candidate
recommendation will be made for SKOS very shortly. It would have been
made already, but was postponed to allow the W3C I18N activity time to
comment on the spec (they had not been previously notified of last
call publication due to an oversight). However, with the exception of
any comments that I18N may raise, the working group believes it has
satisfied all of the comments that were made during the last call
comment period. The WG does not intend to consider any substantive
changes to the specification prior to publication as a W3C
Recommendation. Future working groups may of course revisit SKOS and
publish revised recommendations.

So as far as proceeding with the ISO data modeling activity, I suggest
that you proceed and remain focused on meeting a set of clearly
articulated requirements for your work. If, in meeting those
requirements, you can also increase the level of concordance between
the ISO data model and the SKOS data model, so much the better.

Personally, I am not concerned about the current state of the SKOS and
ISO data models. There is not a perfect alignment, but there is enough
overlap to satisfy a broad and interesting set of problems. I find the
overall degree of convergence to be very encouraging.

I believe the only area in which there is currently any substantive
difference is in the modeling of collections/arrays/groups of
concepts. It is only very recently that your work on the ISO data
model has helped to tease out some of the tricky data modeling issues
and ambiguities underlying this feature, which had remained implicit
until now. I hope that the SKOS collections framework may be
compatible with the ISO data model via some community extensions and
usage conventions, but this will only be borne out through substantial
implementation experience, which will take time. I therefore believe
this question should not delay the current publication of SKOS as a
W3C Recommendation. Future working groups already have clear
indications that this is an area for future attention, via the issues
that were raised and discussed within the SWDWG.

Kind regards,

Alistair

On Fri, Feb 13, 2009 at 12:17:12PM +0000, Leonard Will wrote:
> On Thu, 12 Feb 2009 at 21:52:39, Christophe Dupriez  
> <christophe.dupriez@destin.be> wrote
>> Hi Alistair,
>> I was thinking like you "skos:broader to correspond perfectly well to the ISO
>> notion of BT" up to the moment I discovered that:
>>
>> Having all BroadMatch being Broaders (in practical terms, getBroader(s)
>> must return all BroadMatch and all other kind of Broaders), for me,
>> disqualifies Broader as being an equivalent of ISO BT.
>
>> This was discovered when making a system deemed to be operational and
>> publicly accessible.
>>
>> Please allow me to insist: the past investments in defining a conceptual
>> framework (ISO standards for thesauri) and in using that framework to build
>> indexing and information retrieval tools must be integrated into the
>> new SKOS effort.
>>
>> SKOS is not OWL: we are not looking to model precisely everything (Java
>> does that very well (!), OWL also); we are looking for a framework for the
>> (mainly manual) creation of vocabularies to link humans and information.
>> This framework must be simple. Simplicity comes from design+experience.
>>
>> Has anyone looked at recent ISO efforts to improve thesauri standards?
>> Is SKOS in line with those efforts (keeping the difference between
>> concept centric versus term centric)?
>> Good evening!
>> Christophe
>
> As ISO work and standards have been referred to a few times in this  
> discussion, perhaps I can clarify where we stand on a few of these  
> issues. I write as Chair of the "Data Modeling, Exchange Formats and  
> Protocols" subgroup of the ISO working group SC9WG8/Project 25964,  
> currently revising the ISO standard for thesauri for information  
> retrieval, but as these standards are still in draft form anything I say  
> here is my own interpretation of the way we are going, and is not  
> authoritative.
>
> We have developed a UML data model which incorporates the structures and  
> principles that are recommended in the ISO standard for thesauri. I  
> sent a copy of this model to this list on 4th December, and I attach  
> another copy for ease of reference. The full text of the revised  
> standard, with notes on this model, is at present in committee draft  
> stage and regrettably cannot yet be made available for public comment.
>
> Terms and concepts
> ------------------------------
> The ISO model is firmly based on relationships between concepts, not  
> terms. Terms are used as labels for concepts, as in SKOS. Each concept  
> has a single unique preferred label in each language. Homographs within  
> a language are distinguished by adding a qualifier in parentheses to the  
> term; these qualifiers become an integral part of the term - they are  
> not separable and do not exist as a separate entity or class in the UML  
> model.
>
> The ISO model represents a single, self-contained thesaurus. Each  
> concept has a clear definition, either explicit or implicit, which  
> determines its scope and relationships. This definition may be  
> artificially restricted compared to the wider meaning that some of its  
> terms may have in natural language. There is no provision for linking to  
> external concepts by URLs or by any other means - if a good definition  
> of a concept is found elsewhere it may be "imported" and become part of  
> the thesaurus, but the link to its source is then just a record of its  
> origin.
>
> I find it difficult to understand the SKOS idea of building a thesaurus  
> from a network of URLs linking to any number of disparate sites. Which  
> sites do you choose to represent the concepts of "war", "peace",  
> "ships", "shoes", "sealing-wax", "cabbages" or "kings", for example?  
> What control do you have over these external links if they change or  
> disappear?
>
> Hierarchies and transitivity
> -----------------------------------
> The ISO model allows two concepts to be linked by a hierarchical  
> relationship. In the great majority of cases this is a generic  
> relationship, corresponding to the ontologists' "is-a" relationship. It  
> is represented, for historical reasons, as "BT/NT" although this is  
> understood as meaning "broader concept / narrower concept" rather than  
> "broader term / narrower term" as the abbreviation would suggest.
>
> It may be specified more precisely as a generic relationship, "BTG/NTG"  
> or as an instantial relationship "BTI/NTI". BTI/NTI is a particular kind  
> of BTG/NTG in which the narrower concept is a "class of one", very often  
> labelled by a proper name. A "class of one" cannot, by its definition,  
> have further generic subclasses, so in a generic hierarchy it can occur  
> only at an extremity (as a "leaf" on the hierarchical tree). Generic  
> relationships are inherently transitive.
>
> The draft ISO standard says that hierarchies can be built on part /  
> whole relationships (BTP/NTP) in only a very few special cases,  
> specifically mentioning social structures (such as subdivisions of an  
> army), disciplines, place names and parts of the human body.  
> Relationships are transitive within hierarchies built purely from these  
> partitive relationships, but transitivity fails across a chain which  
> mixes generic and partitive relationships.
>
> It is not therefore possible to match the SKOS expressions "broader" and  
> "broaderTransitive" to the ISO relationships. Transitivity is not  
> inherent in each relationship but depends on whether the chain uses the  
> same type of relationship throughout.
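
A minimal sketch of this rule in Python, for illustration only. The names Rel and chain_is_transitive are hypothetical and come from neither SKOS nor the ISO draft. The sketch assumes a chain is listed from the narrowest concept upward, and it treats BTI as a kind of BTG that may appear only as the first (leaf) link, as described above.

```python
from enum import Enum


class Rel(Enum):
    """Hypothetical labels for the three ISO hierarchical link types."""
    BTG = "generic"
    BTI = "instantial"
    BTP = "partitive"


def chain_is_transitive(chain):
    """Return True if a chain of broader links may be collapsed into one.

    `chain` lists the link types from the narrowest concept upward.
    Assumption: an empty chain and any mixed generic/partitive chain
    are both reported as False.
    """
    if not chain:
        return False
    # A purely partitive chain is transitive.
    if all(rel is Rel.BTP for rel in chain):
        return True
    # A generic chain is transitive; an instantial link counts as generic
    # but can only sit at the leaf end, i.e. as the first link.
    generic_kinds = {Rel.BTG, Rel.BTI}
    if all(rel in generic_kinds for rel in chain):
        return all(rel is Rel.BTG for rel in chain[1:])
    # Anything else mixes generic and partitive links.
    return False
```

Under these assumptions, the chains [BTI, BTG] and [BTP, BTP] are accepted, while [BTG, BTP] is rejected because it mixes generic and partitive links.
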
>
> Mapping between thesauri
> ---------------------------------------
> The present ISO model does not address the issue of mapping between two  
> separate thesauri, or mapping concepts within a thesaurus to external  
> concepts, whether within a thesaurus or not. We expect this to be  
> considered when we work on part 2 of the ISO standard, which we have not  
> yet started (though it is covered in part 4 of the related British  
> Standard, BS8723-4:2007).
>
> Relationships between concepts in source and target schemes fall into  
> the following categories:
>
> a. Exact equivalence, where the two concepts have the same scope
>
> b. Partial equivalence, where the scope of a concept in one scheme falls  
> completely within the scope of a concept in the other scheme. The  
> concept in the source scheme may be broader or narrower than the one in  
> the target scheme.
>
> c. Inexact or "overlapping" equivalence, where the scopes overlap. The  
> two concepts share some scope, but each concept covers some topics that  
> are not included in the other.
>
> d. Non-equivalence, where there is no concept in the target scheme that  
> contains any of the scope of the concept in the source scheme.
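
One way to picture categories a to d is to model the scope of each concept as a set of topics and compare the sets. This is only an illustrative sketch: the function names classify_pair and best_matches are hypothetical, and real scopes are defined by scope notes and editorial judgement rather than by enumerable sets.

```python
def classify_pair(source_scope, target_scope):
    """Compare two concept scopes, each modelled as a set of topics."""
    source_scope, target_scope = set(source_scope), set(target_scope)
    if source_scope == target_scope:
        return "exact"                      # category a
    if source_scope < target_scope:
        return "partial: source narrower"   # category b
    if source_scope > target_scope:
        return "partial: source broader"    # category b
    if source_scope & target_scope:
        return "overlapping"                # category c
    return "disjoint"


def best_matches(source_scope, target_scopes):
    """Return every target concept that shares some scope with the source.

    `target_scopes` maps a target concept name to its scope set.
    An empty result corresponds to non-equivalence (category d).
    """
    matches = {
        name: classify_pair(source_scope, scope)
        for name, scope in target_scopes.items()
    }
    return {name: cat for name, cat in matches.items() if cat != "disjoint"}
```

Note that category d is a statement about the whole target scheme rather than about a single pair, which is why it appears here as an empty result from best_matches.
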
>
> In practice, it may also be necessary to create one-to-many matches,  
> where a concept in the source scheme may be best represented by a  
> Boolean combination of concepts in the target scheme.
>
> The SKOS provision of exact, broader and narrower matches does not seem  
> to cover these categories adequately, especially in the absence of  
> provision for overlapping matches and one-to-many matches.
>
> Arrays and concept groups
> --------------------------------------
> The ISO model makes a clear distinction between "arrays" and "concept  
> groups".
>
> Arrays are sets of sibling concepts under a common parent, which may be  
> grouped, and potentially ordered, by a characteristic of division  
> specified in a node label.
>
> Concept groups are much less restricted. Many thesauri group concepts  
> using a classification structure which exists in parallel to the  
> hierarchies of thesaurus concepts based on BT/NT relationships. Groups  
> created by the classification are often based on disciplines, subject  
> areas or areas of business activity. They are sometimes called "subject  
> categories", "themes", "domains", "groups" or "microthesauri". There is  
> not, in general, a BT/NT relationship between a ConceptGroup and the  
> concepts which it contains, or between the concepts in a ConceptGroup.  
> Concepts may be gathered into ConceptGroups from many different facets  
> or hierarchies of the thesaurus, and the notation used for the  
> classification into groups may be quite distinct from any notation that  
> may be used for the concepts themselves. Groups may have subgroups,  
> being nested to any level. Each group should be given one verbal label  
> per language.
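
The distinction can be sketched as two separate data structures. This is a rough illustration under stated assumptions, not the ISO UML model: apart from ConceptGroup, which is named above, the class and field names (Concept, Array, node_label, subgroups) are hypothetical, and labels are simplified to a single language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    label: str
    broader: Optional["Concept"] = None  # single BT link, for simplicity


@dataclass
class Array:
    """Sibling concepts under one parent, grouped by a node label."""
    node_label: str                      # characteristic of division
    parent: Concept
    ordered: bool = False
    members: List[Concept] = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        # Every member must be a direct child of the array's parent.
        if concept.broader is not self.parent:
            raise ValueError(f"{concept.label} is not a child of {self.parent.label}")
        self.members.append(concept)


@dataclass
class ConceptGroup:
    """Free grouping: no BT/NT link is implied between group and members."""
    label: str
    members: List[Concept] = field(default_factory=list)
    subgroups: List["ConceptGroup"] = field(default_factory=list)
```

An Array refuses concepts that are not children of its parent, whereas a ConceptGroup accepts concepts from any hierarchy and may nest to any depth.
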
>
> As far as I can see, SKOS does not provide for these elements of the ISO  
> model. The use of "collections" for both of these is liable to lead to  
> confusion and inconsistency.
>
>
> I am very keen that the ISO working group should work as closely as  
> possible with SKOS so that we don't end up with two incompatible ways of  
> representing thesauri (and later, with luck, other types of knowledge  
> organisation scheme).  We think that our model provides a practical and  
> accurate representation of the logical structure of a thesaurus, but are  
> open to suggestions as to any inadequacies it may have. SKOS is gaining  
> a lot of momentum as a standard for the exchange of data in semantic  
> applications, but I am very concerned about the incompatibilities I have  
> outlined above.
>
> How can we best move forward?
>
> Leonard Will
>
> -- 
> Willpower Information     (Partners: Dr Leonard D Will, Sheena E Will)
> Information Management Consultants            Tel: +44 (0)20 8372 0092
> 27 Calshot Way                              L.Will@Willpowerinfo.co.uk
> ENFIELD                                Sheena.Will@Willpowerinfo.co.uk
> EN2 7BQ, UK                            http://www.willpowerinfo.co.uk/
>



-- 
Alistair Miles
Senior Computing Officer
Image Bioinformatics Research Group
Department of Zoology
The Tinbergen Building
University of Oxford
South Parks Road
Oxford
OX1 3PS
United Kingdom
Web: http://purl.org/net/aliman
Email: alistair.miles@zoo.ox.ac.uk
Tel: +44 (0)1865 281993
Received on Friday, 13 February 2009 17:26:54 GMT
