RE: Supporting arrays of concepts from Houghton,Andrew on 2004-05-11 (public-esw-thes@w3.org from May 2004)

From: Houghton,Andrew <houghtoa@oclc.org>
Date: Tue, 11 May 2004 13:05:08 -0400
To: public-esw-thes@w3.org
Message-ID: <B56ABE145BEB0C40A265238FCAA420DF026F52BE@oa2-server.oa.oclc.org>
> From: Leonard Will [mailto:L.Will@willpowerinfo.co.uk] 
> Sent: Tuesday, May 11, 2004 10:43 AM
> Subject: Re: Supporting arrays of concepts
> 
> 2. As regards node labels, I have tried to show that we need 
> to distinguish between
> 
>         (a) real "node labels", which specify a characteristic of
>         division in the form <xxx by yyy> and
> 
>         (b) broader concepts which act as parent terms to the 
> terms in a
>         following array.
> 
> DDC centred headings and some of the AAT guide terms fall 
> under (b), and should not be called node labels. Structurally 
> these are just terms representing concepts which the 
> thesaurus editor has decided are unsuitable for use in 
> indexing (and may have to be labelled in some way to indicate this).

I think I agree with your idea of separating the two.  Maybe what is
needed is another element at the same level as skos:Concept, perhaps
skos:Summary, to handle (b), while the current proposal handles (a).
The current proposal still seems odd to me, though.  It seems to me
that you might want additional metadata associated with a node label
array, beyond the list of concepts in it, for example scope notes or
other types of notes.

> 
> 3. The more complex issue that I thought would broaden the 
> scope of the project is handling pre-coordinated indexing 
> strings. The problem is described in the following extract 
> from "FAST : development of simplified headings for metadata 
> / by Rebecca J. Dean"
>
> [quote deleted]
> 
> DDC and MeSH, similarly, have many provisions for 
> synthesising concepts to express compound concepts that may 
> or may not be enumerated in the schedules. A classification 
> schedule may show these compound concepts in a hierarchical 
> display, as I illustrated in the second example in my message 
> of 6th May, but the hierarchy is not built on the same BT/NT 
> relationships as in a thesaurus.

I guess I didn't quite follow that.  Yes, LCC, DDC, LCSH, MeSH and
others allow synthesising concepts, but those concepts are still
valid from the vocabulary's perspective; it's just that they have
not been *enumerated* the way a standard thesaurus enumerates its
terms.  So let's say I convert DDC into SKOS.  What you would get is
all the predefined concepts defined by the Dewey editors.  If someone
builds a class number, based upon the instructions in the
classification, then they can merely create an skos:Concept element
and within that element use skos:inScheme to point to the "official"
base scheme which defines the list of predefined concepts.  Any built
concept in LCC, DDC, LCSH, or MeSH participates in the same BT/NT
relationships established for the vocabulary.  So I seem to be
missing something with your analogy.
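
To make that concrete, here is a rough sketch of what I mean, written
as plain subject/predicate/object triples in Python rather than any
particular RDF toolkit.  The example.org URIs, the label and the
built_concept helper are all made up for illustration; only the
skos:Concept, skos:inScheme, skos:prefLabel and skos:broader names
come from the SKOS Core vocabulary.

```python
# Namespaces: SKOS Core (2004 draft namespace) and rdf:type.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def built_concept(concept_uri, scheme_uri, label, broader_uri=None):
    """Return triples describing a built (synthesised) concept.

    Hypothetical helper: the built concept is an ordinary skos:Concept
    that points at the "official" base scheme with skos:inScheme.
    """
    triples = [
        (concept_uri, RDF_TYPE, SKOS + "Concept"),
        (concept_uri, SKOS + "inScheme", scheme_uri),
        (concept_uri, SKOS + "prefLabel", label),
    ]
    # A BT link is optional: it is whatever the vocabulary's own
    # hierarchy says, not necessarily the predefined part the
    # concept was built from.
    if broader_uri is not None:
        triples.append((concept_uri, SKOS + "broader", broader_uri))
    return triples


# Placeholder URIs and label, not real DDC data.
example = built_concept(
    "http://example.org/local/built-1",
    "http://example.org/ddc/scheme",
    "a locally built class number",
)
for triple in example:
    print(triple)
```

Nothing in this needs the synthesis rules themselves; the built
concept just says which scheme it belongs to.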

> It seems to me to be a much more complex job for SKOS to try 
> to create a system that would incorporate rules for creating 
> these compound strings.

You don't need to incorporate the rules for creating the compound
strings.  The "whole" compound string *is* the concept and there
isn't necessarily a BT/NT relationship between the predefined
part and what was composed.  The whole term should be taken as
the concept and its BT/NT relationship is to be taken in the
context of all the other predefined or compound strings in the
vocabulary.

> The FAST project 
> <http://www.oclc.org/research/projects/fast/> from which I 
> quoted above recognises this problem by treating each of the 
> elements of an LCSH heading separately and grouping them into 
> subject, time, place, form, people and organisations facets. 
> This makes it much more amenable to storing in a structure 
> like SKOS, and seems the best initial approach.

I disagree with this statement on several fronts.  FAST is no
more amenable to SKOS than any of the other vocabularies I
mentioned.  As a matter of fact I forgot to include FAST in
the list of vocabularies that we will probably put into SKOS.
FAST actually doesn't treat LCSH headings much differently than
LCSH already does.  LCSH is already faceted!  LC may disagree
with my statement, but the simple fact, or facet, of the matter
is when we look at LCSH, which is defined by MARC21, the preferred
term of the vocabulary is specified as the 1XX field.  That XX
should be a clue.  Looking at the MARC21-A authorities format,
you can see the following 1XX definitions:

    * 100 - HEADING--PERSONAL NAME (NR)
    * 110 - HEADING--CORPORATE NAME (NR)
    * 111 - HEADING--MEETING NAME (NR)
    * 130 - HEADING--UNIFORM TITLE (NR)
    * 148 - HEADING--CHRONOLOGICAL TERM (NR)
    * 150 - HEADING--TOPICAL TERM (NR)
    * 151 - HEADING--GEOGRAPHIC NAME (NR)
    * 155 - HEADING--GENRE/FORM TERM (NR)
    * 180 - HEADING--GENERAL SUBDIVISION (NR)
    * 181 - HEADING--GEOGRAPHIC SUBDIVISION (NR)
    * 182 - HEADING--CHRONOLOGICAL SUBDIVISION (NR)
    * 185 - HEADING--FORM SUBDIVISION (NR)

To the naive, those pretty much look like facets.  The 18X fields
are used in building composed subject headings.  So the "real" facets
are everything else.
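
As a small illustration of that reading, the list above can be turned
into a lookup that separates the 18X subdivision fields from the rest.
The tag numbers and names are the ones listed above; the classify
function and the "facet"/"subdivision" labels are just my own
shorthand for this sketch, not anything defined by MARC21.

```python
# 1XX heading fields from the list above (tag -> heading type).
HEADING_TAGS = {
    "100": "personal name",
    "110": "corporate name",
    "111": "meeting name",
    "130": "uniform title",
    "148": "chronological term",
    "150": "topical term",
    "151": "geographic name",
    "155": "genre/form term",
    "180": "general subdivision",
    "181": "geographic subdivision",
    "182": "chronological subdivision",
    "185": "form subdivision",
}


def classify(tag):
    """Return (kind, heading type) for a 1XX tag, or None if unknown.

    18X tags are treated as subdivisions used to build composed
    headings; every other listed 1XX tag is treated as a facet.
    """
    heading_type = HEADING_TAGS.get(tag)
    if heading_type is None:
        return None
    kind = "subdivision" if tag.startswith("18") else "facet"
    return kind, heading_type


for tag in ("150", "151", "181"):
    print(tag, classify(tag))
```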

FAST doesn't do anything radically different.  It basically uses
the same "facets" that LCSH has, but pulls out some sub-facets under
the existing LCSH pseudo-facets.


Andy.

Andrew Houghton, OCLC Online Computer Library Center, Inc.
http://www.oclc.org/about/
http://www.oclc.org/research/staff/houghton.htm
Received on Tuesday, 11 May 2004 13:05:45 UTC