- From: Leonard Will <L.Will@willpowerinfo.co.uk>
- Date: Tue, 11 May 2004 15:42:06 +0100
- To: public-esw-thes@w3.org
In message
<B56ABE145BEB0C40A265238FCAA420DF026F5287@oa2-server.oa.oclc.org> on
Tue, 11 May 2004, "Houghton,Andrew" <houghtoa@oclc.org> wrote
> From: Leonard Will [mailto:L.Will@willpowerinfo.co.uk]
> Sent: Tuesday, May 11, 2004 5:21 AM
>> OK, I agree that it would be useful to have a mechanism for encoding
>>pre-coordinate classification schemes and subject indexing strings, and I
>>do like the idea of treating them in an integrated way that works smoothly
>>with the encoding of thesaurus structures. It will mean a significant
>>expansion of the project, though. Is it currently within its scope?
>
>I disagree that this would be a significant expansion of the project. We
>have started to do some preliminary mapping of AAT, LCSH, MeSH, DDC,
>etc. and for the most part they seem to map into the SKOS model. Areas
>where we are currently having problems with are notes and node labels.
1. Yes, notes should certainly be dealt with as you suggest, by having a
main class of "notes" and subclasses for different kinds of notes. This
would avoid a field labelled "scope notes" having to accommodate notes
of other kinds.
2. As regards node labels, I have tried to show that we need to
distinguish between
(a) real "node labels", which specify a characteristic of
division in the form <xxx by yyy> and
(b) broader concepts which act as parent terms to the terms in a
following array.
DDC centred headings and some of the AAT guide terms fall under (b), and
should not be called node labels. Structurally these are just terms
representing concepts which the thesaurus editor has decided are
unsuitable for use in indexing (and may have to be labelled in some way
to indicate this).
3. The more complex issue that I thought would broaden the scope of the
project is handling pre-coordinated indexing strings. The problem is
described in the following extract from "FAST : development of
simplified headings for metadata / by Rebecca J. Dean"
<http://www.oclc.org/research/projects/fast/international_auth200302.doc>
"LCSH is not a true thesaurus in the sense that it is not a
comprehensive list of all valid subject headings. Rather LCSH
combines authorities, now five volumes in their printed form,
with a four-volume manual of rules detailing the requirements
for creating headings that are not established in the authority
file and for the further subdivision of the established
headings.
The rules for using free-floating subdivisions controlled by
pattern headings illustrate some of these complexities. Under
specified conditions, these free-floating subdivisions can be
added to established headings. The scope of patterns is limited
to particular types (patterns) of headings. For example, Burns
and scalds-Patients-Family relationships is a valid heading
formed by adding two pattern subdivisions to the established
heading Burns and scalds. The subdivision 'Patients' is one of
several hundred subdivisions that can be used with headings for
diseases and other medical conditions. Therefore it can be used
to subdivide Burns and scalds. However, the addition of
Patients changes the meaning of the heading from a medical
condition to a class of persons. Now, since Family
relationships is authorized under the pattern for classes of
persons, it can also be added to complete the heading."
DDC and MeSH, similarly, have many provisions for synthesising concepts
to express compound concepts that may or may not be enumerated in the
schedules. A classification schedule may show these compound concepts in
a hierarchical display, as I illustrated in the second example in my
message of 6th May, but the hierarchy is not built on the same BT/NT
relationships as in a thesaurus.
It seems to me to be a much more complex job for SKOS to try to create a
system that would incorporate rules for creating these compound strings.
Most of them don't exist until they are needed for indexing a document,
though once they are created they may be stored in an authority file so
that the same string will be used if the same compound topic arises
again in future.
The FAST project <http://www.oclc.org/research/projects/fast/> from
which I quoted above recognises this problem by treating each of the
elements of an LCSH heading separately and grouping them into subject,
time, place, form, people and organisations facets. This makes it much
more amenable to storing in a structure like SKOS, and seems the best
initial approach.
Leonard Will
--
Willpower Information (Partners: Dr Leonard D Will, Sheena E Will)
Information Management Consultants Tel: +44 (0)20 8372 0092
27 Calshot Way, Enfield, Middlesex EN2 7BQ, UK. Fax: +44 (0)870 051 7276
L.Will@Willpowerinfo.co.uk Sheena.Will@Willpowerinfo.co.uk
---------------- <URL:http://www.willpowerinfo.co.uk/> -----------------
Received on Tuesday, 11 May 2004 10:42:55 UTC