- From: Leonard Will <L.Will@willpowerinfo.co.uk>
- Date: Fri, 13 Feb 2009 12:17:12 +0000
- To: "public-esw-thes@w3.org" <public-esw-thes@w3.org>
- Message-ID: <3ITNYDLITWlJFAAq@mail.willpowerinfo.co.uk>
On Thu, 12 Feb 2009 at 21:52:39, Christophe Dupriez <christophe.dupriez@destin.be> wrote
>Hi Alistair,
>I was thinking like you "skos:broader to correspond perfectly well to the ISO
>notion of BT" up to the moment I discovered that:
>
>Having all BroadMatch being Broaders (in practical terms, getBroader(s)
>must return all BroadMatch and all other kind of Broaders), for me,
>disqualifies Broader as being an equivalent of ISO BT.
>This was discovered when making a system deemed to be operational and
>publicly accessible.
>
>Please allow me to insist: the past investments to define a conceptual frame
>(ISO standards for thesauri) and to use this frame for building indexation
>and retrieval Information tools, those investments must be integrated in the
>new SKOS effort.
>
>SKOS is not OWL: we are not looking to model precisely everything (Java
>does that very well (!), OWL also); we are looking for a work frame for the
>(mainly manual) creation of vocabularies to link humans and information.
>This frame must be simple. Simplicity comes from design+experience.
>
>Has anyone looked to recent ISO efforts to improve thesauri standards?
>Does SKOS in line with those efforts (keeping the difference between
>concept centric versus term centric)?
>Good evening!
>Christophe

As ISO work and standards have been referred to a few times in this discussion, perhaps I can clarify where we stand on a few of these issues. I write as Chair of the "Data Modeling, Exchange Formats and Protocols" subgroup of the ISO working group SC9WG8/Project 25964, currently revising the ISO standard for thesauri for information retrieval, but as these standards are still in draft form, anything I say here is my own interpretation of the way we are going, and is not authoritative.

We have developed a UML data model which incorporates the structures and principles that are recommended in the ISO standard for thesauri.
I sent a copy of this model to this list on 4th December, and I attach another copy for ease of reference. The full text of the revised standard, with notes on this model, is at present in committee draft stage and regrettably cannot yet be made available for public comment.

Terms and concepts
------------------------------

The ISO model is firmly based on relationships between concepts, not terms. Terms are used as labels for concepts, as in SKOS. Each concept has a single unique preferred label in each language. Homographs within a language are distinguished by adding a qualifier in parentheses to the term; these qualifiers become an integral part of the term - they are not separable and do not exist as a separate entity or class in the UML model.

The ISO model represents a single, self-contained thesaurus. Each concept has a clear definition, either explicit or implicit, which determines its scope and relationships. This definition may be artificially restricted compared to the wider meaning that some of its terms may have in natural language. There is no provision for linking to external concepts by URLs or by any other means - if a good definition of a concept is found elsewhere it may be "imported" and become part of the thesaurus, but the link to its source is then just a record of its origin.

I find it difficult to understand the SKOS idea of building a thesaurus from a network of URLs linking to any number of disparate sites. Which sites do you choose to represent the concepts of "war", "peace", "ships", "shoes", "sealing-wax", "cabbages" or "kings", for example? What control do you have over these external links if they change or disappear?

Hierarchies and transitivity
-----------------------------------

The ISO model allows two concepts to be linked by a hierarchical relationship. In the great majority of cases this is a generic relationship, corresponding to the ontologists' "is-a" relationship.
It is represented, for historical reasons, as "BT/NT", although this is understood as meaning "broader concept / narrower concept" rather than "broader term / narrower term" as the abbreviation would suggest. It may be specified more precisely as a generic relationship, "BTG/NTG", or as an instantial relationship, "BTI/NTI". BTI/NTI is a particular kind of BTG/NTG in which the narrower concept is a "class of one", very often labelled by a proper name. A "class of one" cannot, by its definition, have further generic subclasses, so in a generic hierarchy it can occur only at an extremity (as a "leaf" on the hierarchical tree). Generic relationships are inherently transitive.

The draft ISO standard says that hierarchies can be built on part / whole relationships (BTP/NTP) in only a very few special cases, specifically mentioning social structures (such as subdivisions of an army), disciplines, place names and parts of the human body. Relationships are transitive within hierarchies built purely from these partitive relationships, but transitivity fails across a chain which mixes generic and partitive relationships.

It is not therefore possible to match the SKOS expressions "broader" and "broaderTransitive" to the ISO relationships. Transitivity is not inherent in each relationship but depends on whether the chain uses the same type of relationship throughout.

Mapping between thesauri
---------------------------------------

The present ISO model does not address the issue of mapping between two separate thesauri, or mapping concepts within a thesaurus to external concepts, whether within a thesaurus or not. We expect this to be considered when we work on part 2 of the ISO standard, which we have not yet started (though it is covered in part 4 of the related British Standard, BS8723-4:2007).

Relationships between concepts in source and target schemes fall into the following categories:

a. Exact equivalence, where the two concepts have the same scope.

b. Partial equivalence, where the scope of a concept in one scheme falls completely within the scope of a concept in the other scheme. The concept in the source scheme may be broader or narrower than the one in the target scheme.

c. Inexact or "overlapping" equivalence, where the scopes overlap. The two concepts share some scope, but each concept covers some topics that are not included in the other.

d. Non-equivalence, where there is no concept in the target scheme that contains any of the scope of the concept in the source scheme.

In practice, it may also be necessary to create one-to-many matches, where a concept in the source scheme may be best represented by a Boolean combination of concepts in the target scheme.

The SKOS provision of exact, broader and narrower matches does not seem to cover these categories adequately, especially in the absence of provision for overlapping matches and one-to-many matches.

Arrays and concept groups
--------------------------------------

The ISO model makes a clear distinction between "arrays" and "concept groups". Arrays are sets of sibling terms under a common parent, which may be grouped, and potentially ordered, by a characteristic of division specified in a node label.

Concept groups are much less restricted. Many thesauri group concepts using a classification structure which exists in parallel to the hierarchies of thesaurus concepts based on BT/NT relationships. Groups created by the classification are often based on disciplines, subject areas or areas of business activity. They are sometimes called "subject categories", "themes", "domains", "groups" or "microthesauri".

There is not, in general, a BT/NT relationship between a ConceptGroup and the concepts which it contains, or between the concepts in a ConceptGroup.
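To make the array / concept group distinction concrete, here is a minimal Python sketch. It assumes a simplified in-memory representation and is not the class structure of the ISO UML model: the names `Concept`, `Array` and `ConceptGroup`, their fields, and the example concepts and group label are all hypothetical. An array accepts only direct narrower concepts of its parent, while a concept group accepts concepts from any hierarchy and leaves their BT/NT links untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: concepts are compared by identity, not by value
class Concept:
    """Hypothetical thesaurus concept: a preferred label plus its broader (BT) concepts."""
    pref_label: str
    broader: list[Concept] = field(default_factory=list)


@dataclass(eq=False)
class Array:
    """Hypothetical array: sibling concepts under one parent, divided by a node label."""
    node_label: str  # characteristic of division, e.g. "(by material)"
    parent: Concept
    members: list[Concept] = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        # Only direct narrower concepts of the parent may join the array.
        if self.parent not in concept.broader:
            raise ValueError(
                f"{concept.pref_label!r} is not narrower than {self.parent.pref_label!r}"
            )
        self.members.append(concept)


@dataclass(eq=False)
class ConceptGroup:
    """Hypothetical classification-based group; membership creates no BT/NT link."""
    label: str
    members: list[Concept] = field(default_factory=list)

    def add(self, concept: Concept) -> None:
        # Any concept from any hierarchy may join; its broader links are not changed.
        self.members.append(concept)


# Illustrative data only.
shoes = Concept("shoes")
leather_shoes = Concept("leather shoes", broader=[shoes])
rubber_shoes = Concept("rubber shoes", broader=[shoes])
ships = Concept("ships")
sealing_wax = Concept("sealing-wax")

by_material = Array("(by material)", parent=shoes)
by_material.add(leather_shoes)
by_material.add(rubber_shoes)

try:
    by_material.add(ships)  # ships is not a narrower concept of shoes
except ValueError as err:
    print(err)  # prints: 'ships' is not narrower than 'shoes'

goods = ConceptGroup("Goods and transport")
for concept in (shoes, ships, sealing_wax):
    goods.add(concept)
```

The parent check in `Array.add` is the only structural constraint the sketch enforces; nested subgroups, group notation and per-language group labels are left out for brevity.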
Concepts may be gathered into ConceptGroups from many different facets or hierarchies of the thesaurus, and the notation used for the classification into groups may be quite distinct from any notation that may be used for the concepts themselves. Groups may have subgroups, being nested to any level. Each group should be given one verbal label per language.

As far as I can see, SKOS does not provide for these elements of the ISO model. The use of "collections" for both arrays and concept groups is liable to lead to confusion and inconsistency.

I am very keen that the ISO working group should work as closely as possible with SKOS so that we don't end up with two incompatible ways of representing thesauri (and later, with luck, other types of knowledge organisation scheme). We think that our model provides a practical and accurate representation of the logical structure of a thesaurus, but are open to suggestions as to any inadequacies it may have. SKOS is gaining a lot of momentum as a standard for the exchange of data in semantic applications, but I am very concerned about the incompatibilities I have outlined above. How can we best move forward?

Leonard Will

--
Willpower Information     (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way                              L.Will@Willpowerinfo.co.uk
ENFIELD                                Sheena.Will@Willpowerinfo.co.uk
EN2 7BQ, UK                            http://www.willpowerinfo.co.uk/
Attachments
- image/jpeg attachment: ISO_model_2008-11-18.jpg
Received on Friday, 13 February 2009 12:23:10 UTC