Re: Supporting arrays of concepts

In message
<B56ABE145BEB0C40A265238FCAA420DF026F5287@oa2-server.oa.oclc.org> on
Tue, 11 May 2004, "Houghton,Andrew" <houghtoa@oclc.org> wrote

> From: Leonard Will [mailto:L.Will@willpowerinfo.co.uk]
> Sent: Tuesday, May 11, 2004 5:21 AM

>> OK, I agree that it would be useful to have a mechanism for encoding
>>pre-coordinate classification schemes and subject indexing strings, and I
>>do like the idea of treating them in an integrated way that works smoothly
>>with the encoding of thesaurus structures.  It will mean a significant
>>expansion of the project, though. Is it currently within its scope?
>
>I disagree that this would be a significant expansion of the project.  We
>have started to do some preliminary mapping of AAT, LCSH, MeSH, DDC,
>etc. and for the most part they seem to map into the SKOS model.  Areas
>where we are currently having problems are notes and node labels.

1. Yes, notes should certainly be dealt with as you suggest, by having a
main class of "notes" and subclasses for different kinds of notes. This
would avoid a field labelled "scope notes" having to accommodate notes
of other kinds.

2. As regards node labels, I have tried to show that we need to
distinguish between

        (a) real "node labels", which specify a characteristic of
        division in the form <xxx by yyy> and

        (b) broader concepts which act as parent terms to the terms in a
        following array.

DDC centred headings and some of the AAT guide terms fall under (b), and
should not be called node labels. Structurally these are just terms
representing concepts which the thesaurus editor has decided are
unsuitable for use in indexing (and may have to be labelled in some way
to indicate this).

3. The more complex issue that I thought would broaden the scope of the
project is handling pre-coordinated indexing strings. The problem is
described in the following extract from "FAST : development of
simplified headings for metadata / by Rebecca J. Dean"
<http://www.oclc.org/research/projects/fast/international_auth200302.doc>

        "LCSH is not a true thesaurus in the sense that it is not a
        comprehensive list of all valid subject headings.  Rather LCSH
        combines authorities, now five volumes in their printed form,
        with a four-volume manual of rules detailing the requirements
        for creating headings that are not established in the authority
        file and for the further subdivision of the established
        headings.

        The rules for using free-floating subdivisions controlled by
        pattern headings illustrate some of these complexities.  Under
        specified conditions, these free-floating subdivisions can be
        added to established headings.  The scope of patterns is limited
        to particular types (patterns) of headings.  For example, Burns
        and scalds--Patients--Family relationships is a valid heading
        formed by adding two pattern subdivisions to the established
        heading Burns and scalds.  The subdivision 'Patients' is one of
        several hundred subdivisions that can be used with headings for
        diseases and other medical conditions.  Therefore it can be used
        to subdivide Burns and scalds.  However, the addition of
        Patients changes the meaning of the heading from a medical
        condition to a class of persons.  Now, since Family
        relationships is authorized under the pattern for classes of
        persons, it can also be added to complete the heading."
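
To make the rule in that extract concrete, here is a minimal Python
sketch of how a chain of pattern subdivisions might be checked. The rule
table, the category names and the function build_heading are all made up
for illustration; they are not taken from LCSH or from the FAST report,
and the real rules are far more extensive.

```python
# Illustrative sketch only: the categories and the tiny rule table are
# assumptions for this example, not data taken from LCSH.

# Category (pattern) of each established heading.
ESTABLISHED_HEADINGS = {"Burns and scalds": "disease"}

# For each pattern: the subdivisions allowed under it, mapped to the
# category of the heading that results from adding them.
PATTERN_SUBDIVISIONS = {
    "disease": {"Patients": "class of persons"},
    # Resulting category assumed unchanged; the extract does not say.
    "class of persons": {"Family relationships": "class of persons"},
}


def build_heading(established, subdivisions):
    """Return (heading string, final category); raise ValueError if a
    subdivision is not authorised at the point where it is added."""
    if established not in ESTABLISHED_HEADINGS:
        raise ValueError(f"{established!r} is not an established heading")
    category = ESTABLISHED_HEADINGS[established]
    parts = [established]
    for sub in subdivisions:
        allowed = PATTERN_SUBDIVISIONS.get(category, {})
        if sub not in allowed:
            raise ValueError(
                f"{sub!r} is not authorised under pattern {category!r}"
            )
        # Adding a subdivision may change what kind of thing the heading is.
        category = allowed[sub]
        parts.append(sub)
    # Subdivisions are joined here with a double dash.
    return "--".join(parts), category
```

Calling build_heading("Burns and scalds", ["Patients", "Family
relationships"]) succeeds, whereas asking for "Family relationships"
directly after "Burns and scalds" raises ValueError, because the heading
is still a medical condition at that point. The validity of each step
depends on the steps before it, which is what makes these strings harder
to model than simple BT/NT relationships.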

DDC and MeSH, similarly, have many provisions for combining simpler
concepts to express compound ones that may or may not be enumerated in
the schedules. A classification schedule may show these compound concepts
in a hierarchical display, as I illustrated in the second example in my
message of 6th May, but the hierarchy is not built on the same BT/NT
relationships as in a thesaurus.

It seems to me to be a much more complex job for SKOS to try to create a
system that would incorporate rules for creating these compound strings.
Most of them don't exist until they are needed for indexing a document,
though once they are created they may be stored in an authority file so
that the same string will be used if the same compound topic arises
again in future.

The FAST project <http://www.oclc.org/research/projects/fast/> from
which I quoted above recognises this problem by treating each of the
elements of an LCSH heading separately and grouping them into subject,
time, place, form, people and organisations facets. This makes the
headings much more amenable to storing in a structure like SKOS, and
seems to me the best initial approach.
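
As a rough sketch of what that faceted grouping could look like as a
data structure, the Python below files each element of a heading under
one of the six facets named above. The class FacetedHeading, the function
split_heading and the sample elements are hypothetical; I am assuming
each element has already been labelled with its facet, and this is not a
description of how FAST itself is implemented.

```python
from dataclasses import dataclass, field

# The six facets named in the paragraph above.
FACETS = ("subject", "time", "place", "form", "people", "organisations")


@dataclass
class FacetedHeading:
    """Elements of one heading, stored independently under each facet."""
    elements: dict = field(
        default_factory=lambda: {facet: [] for facet in FACETS}
    )

    def add(self, facet, element):
        if facet not in self.elements:
            raise ValueError(f"unknown facet: {facet!r}")
        self.elements[facet].append(element)


def split_heading(labelled_elements):
    """Group (element, facet) pairs into a FacetedHeading."""
    heading = FacetedHeading()
    for element, facet in labelled_elements:
        heading.add(facet, element)
    return heading


# Made-up elements, labelled by hand purely for illustration.
example = split_heading([
    ("Architecture", "subject"),
    ("France", "place"),
    ("19th century", "time"),
    ("Bibliography", "form"),
])
```

Because each element stands alone within its facet, no rule about the
order or compatibility of subdivisions has to be encoded, which is why
this form looks easier to store than the full pre-coordinated string.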

Leonard Will

-- 
Willpower Information       (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants              Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk               Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------

Received on Tuesday, 11 May 2004 10:42:55 UTC