- From: Leonard Will <L.Will@willpowerinfo.co.uk>
- Date: Tue, 11 May 2004 15:42:06 +0100
- To: public-esw-thes@w3.org
In message <B56ABE145BEB0C40A265238FCAA420DF026F5287@oa2-server.oa.oclc.org> on Tue, 11 May 2004, "Houghton,Andrew" <houghtoa@oclc.org> wrote

> From: Leonard Will [mailto:L.Will@willpowerinfo.co.uk]
> Sent: Tuesday, May 11, 2004 5:21 AM
>
>> OK, I agree that it would be useful to have a mechanism for encoding
>> pre-coordinate classification schemes and subject indexing strings, and I
>> do like the idea of treating them in an integrated way that works smoothly
>> with the encoding of thesaurus structures. It will mean a significant
>> expansion of the project, though. Is it currently within its scope?
>
> I disagree that this would be a significant expansion of the project. We
> have started to do some preliminary mapping of AAT, LCSH, MeSH, DDC,
> etc. and for the most part they seem to map into the SKOS model. Areas
> where we are currently having problems with are notes and node labels.

1. Yes, notes should certainly be dealt with as you suggest, by having a main class of "notes" and subclasses for different kinds of notes. This would avoid a field labelled "scope notes" having to accommodate notes of other kinds.

2. As regards node labels, I have tried to show that we need to distinguish between

(a) real "node labels", which specify a characteristic of division in the form <xxx by yyy> and

(b) broader concepts which act as parent terms to the terms in a following array.

DDC centred headings and some of the AAT guide terms fall under (b), and should not be called node labels. Structurally these are just terms representing concepts which the thesaurus editor has decided are unsuitable for use in indexing (and may have to be labelled in some way to indicate this).

3. The more complex issue that I thought would broaden the scope of the project is handling pre-coordinated indexing strings. The problem is described in the following extract from "FAST : development of simplified headings for metadata / by Rebecca J.
Dean" <http://www.oclc.org/research/projects/fast/international_auth200302.doc>

"LCSH is not a true thesaurus in the sense that it is not a comprehensive list of all valid subject headings. Rather LCSH combines authorities, now five volumes in their printed form, with a four-volume manual of rules detailing the requirements for creating headings that are not established in the authority file and for the further subdivision of the established headings. The rules for using free-floating subdivisions controlled by pattern headings illustrate some of these complexities. Under specified conditions, these free-floating subdivisions can be added to established headings. The scope of patterns is limited to particular types (patterns) of headings. For example, Burns and scalds-Patients-Family relationships is a valid heading formed by adding two pattern subdivisions to the established heading Burns and scalds. The subdivision 'Patients' is one of several hundred subdivisions that can be used with headings for diseases and other medical conditions. Therefore it can be used to subdivide Burns and scalds. However, the addition of Patients changes the meaning of the heading from a medical condition to a class of persons. Now, since Family relationships is authorized under the pattern for classes of persons, it can also be added to complete the heading."

DDC and MeSH, similarly, have many provisions for synthesising concepts to express compound concepts that may or may not be enumerated in the schedules. A classification schedule may show these compound concepts in a hierarchical display, as I illustrated in the second example in my message of 6th May, but the hierarchy is not built on the same BT/NT relationships as in a thesaurus.

It seems to me to be a much more complex job for SKOS to try to create a system that would incorporate rules for creating these compound strings.
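To make the difficulty concrete, the pattern-heading example in the extract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the two lookup tables, the `build_heading` function and the "--" separator are all hypothetical and are not taken from LCSH, FAST or SKOS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subdivision:
    """A free-floating subdivision (hypothetical model)."""
    label: str
    allowed_under: frozenset                   # heading categories it may follow
    resulting_category: Optional[str] = None   # None means the category is unchanged


# Hypothetical authority data covering only the example in the extract.
ESTABLISHED = {"Burns and scalds": "disease"}

SUBDIVISIONS = {
    "Patients": Subdivision(
        "Patients", frozenset({"disease"}), "class_of_persons"
    ),
    "Family relationships": Subdivision(
        "Family relationships", frozenset({"class_of_persons"})
    ),
}


def build_heading(base, subdivisions):
    """Validate and assemble a compound heading, tracking its category.

    Each subdivision is checked against the category of the string built
    so far, because adding a subdivision can change that category.
    """
    if base not in ESTABLISHED:
        raise ValueError(f"{base!r} is not an established heading")
    category = ESTABLISHED[base]
    parts = [base]
    for label in subdivisions:
        sub = SUBDIVISIONS.get(label)
        if sub is None or category not in sub.allowed_under:
            raise ValueError(
                f"{label!r} is not authorised after a {category!r} heading"
            )
        parts.append(label)
        if sub.resulting_category is not None:
            category = sub.resulting_category
    return "--".join(parts), category
```

With these tables, `build_heading("Burns and scalds", ["Patients", "Family relationships"])` is accepted, whereas putting "Family relationships" directly after "Burns and scalds" is rejected. The point of the sketch is that validity depends on the order of the elements and on the type of the string built so far, which is a different kind of information from a BT/NT link between two concepts.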
Most such strings don't exist until they are needed for indexing a document, though once they are created they may be stored in an authority file so that the same string will be used if the same compound topic arises again in future.

The FAST project <http://www.oclc.org/research/projects/fast/> from which I quoted above recognises this problem by treating each of the elements of an LCSH heading separately and grouping them into subject, time, place, form, people and organisations facets. This makes the headings much more amenable to storing in a structure like SKOS, and seems the best initial approach.

Leonard Will
--
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk               Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------
Received on Tuesday, 11 May 2004 10:42:55 UTC