
RE: [SKOS] thesaurus USE patterns

From: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
Date: Thu, 8 Feb 2007 19:27:36 -0000
To: "'Miles, AJ (Alistair)'" <A.J.Miles@rl.ac.uk>, <public-swd-wg@w3.org>
Cc: <public-esw-thes@w3.org>
Message-ID: <002e01c74bb7$33abe470$0300000a@DELL>

Alistair,
I've just taken a look at your statement of the issue of representing A
USE X AND Y as well as A USE X OR Y. The second half of it looks a very
fair description of the problem. But I must protest that the Background
section does not well reflect historical reality. The problem comes from
your assumptions about what the thesaurus was invented for, i.e.
"paper-based card catalogues" and the indexes derived therefrom. It
gives the impression of catalogues like library card catalogues, in
which you have a card per title (or occasionally more than one), with
lots of cataloguing data on the card.

Around the time when IR thesauri were invented, hopes were pinned on
"mechanisation" rather than "computerisation", and a lot of
experimentation went on with various sorts of cards. It is true that
some agencies did try to use thesauri with "item cards" a bit like
catalogue cards (only the more sophisticated ones were IBM punched cards
or edge-punched cards), but the more successful ones used "feature
cards". The key difference between these approaches is whether you
assign a card to the document (item) being indexed, or to the thesaurus
term (feature) that is used for indexing the documents. 

Optical coincidence cards were probably the most satisfactory sort of
card for use with the thesaurus. You had the thesaurus itself, truly
paper-based, a book that was consulted by indexers and searchers alike.
Then you had a bank of strong cards, sometimes almost 2x2 ft in size,
with a grid printed on each. Each card had its preferred term inscribed
in the top left corner, which was used to file the cards alphabetically.
The card could have
up to 100 rows and 100 columns, which made it capable of indexing a
collection of 10,000 documents. Each document had a number. 

On indexing document 1234 with terms "Cats" and "Fur" you would get out
those cards and punch a hole in each of them, in grid position 1234.
Much later, when someone came to search for items about cats' fur, they
would pull out those two cards and hold them up together to a light. The
light would shine through only at the positions punched in both cards,
including 1234. And then you went to the collection and pulled out
document 1234 etc.
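
As a rough illustration only, the feature-card procedure described above
could be modelled like this. Every name here is hypothetical, and the
mapping of document numbers 0 to 9999 onto the 100 x 100 grid is an
assumption; it is a sketch of the principle, not a record of any real
system.

```python
# Hypothetical model of optical coincidence (feature) cards.
ROWS, COLS = 100, 100  # grid size from the description: 10,000 positions

def grid_position(doc_number):
    """Map a document number (assumed 0..9999) to a (row, column) hole."""
    if not 0 <= doc_number < ROWS * COLS:
        raise ValueError("document number outside the card's grid")
    return divmod(doc_number, COLS)

cards = {}  # preferred term -> set of punched (row, column) positions

def index_document(doc_number, terms):
    """Punch the document's position in the card of every indexing term."""
    hole = grid_position(doc_number)
    for term in terms:
        cards.setdefault(term, set()).add(hole)

def search(terms):
    """Stack the cards: light passes only where every card is punched."""
    stacks = [cards.get(term, set()) for term in terms]
    if not stacks:
        return []
    lit = set.intersection(*stacks)
    return sorted(row * COLS + col for row, col in lit)

index_document(1234, ["Cats", "Fur"])
index_document(42, ["Cats"])
index_document(77, ["Fur"])
result = search(["Cats", "Fur"])  # document numbers visible through both cards
```

Each card is simply a set of punched positions, and holding two cards up
to the light is a set intersection.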

The process I have just described is called postcoordinate retrieval and
exactly the same principle is used for computer searches using Boolean
AND. So the scenario for which thesauri were invented was not so very
different from today's computer use. But because the thesaurus itself
was paper-based, the issue of how to manage "A USE X AND Y" as well as
"A USE X OR Y" did not really arise. These types of entry were simply
typed on to a page (with a typewriter, if you were lucky) for use by
humans. And even then, we frowned on "A USE X OR Y" as not very good
practice.
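
To make the contrast concrete, here is a minimal sketch of what a
machine would have to do with the two kinds of entry that a human once
simply read off the page. The terms, the postings and the USE table are
all invented for illustration, and treating "A USE X OR Y" as a union at
search time is only one possible reading, not something settled here.

```python
# Hypothetical postings: preferred term -> set of document numbers.
postings = {
    "X": {12, 1234, 4000},
    "Y": {77, 1234},
}

# Hypothetical USE entries: non-preferred term -> (operator, preferred terms).
use_entries = {
    "A_and": ("AND", ["X", "Y"]),  # stands for "A USE X AND Y"
    "A_or": ("OR", ["X", "Y"]),    # stands for "A USE X OR Y"
}

def lookup(term):
    """Return the documents for a preferred term or a USE entry."""
    if term in postings:
        return set(postings[term])
    operator, targets = use_entries[term]
    sets = [postings.get(target, set()) for target in targets]
    if operator == "AND":
        # Postcoordinate retrieval: documents indexed with every target term.
        return set.intersection(*sets)
    # Assumed reading of OR: documents indexed with any target term.
    return set.union(*sets)

and_hits = sorted(lookup("A_and"))
or_hits = sorted(lookup("A_or"))
```

The point of the sketch is that the operator has to be stored somewhere
alongside the target terms, which a plain one-to-one USE link cannot do.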

Despite the length of this message, I repeat that the main part of your
statement is not affected by the Background. The problem with "A USE X
AND Y" is not in the type of indexing system, but in the representation
of the relationships within the thesaurus itself. And there I agree with
you - the move from paper-based management to mechanised/computerised
management does bring the problem to the fore. I'm just looking forward
to the solutions you come up with!

Cheers
Stella


*****************************************************
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
SDClarke@LukeHouse.demon.co.uk
*****************************************************



-----Original Message-----
From: public-esw-thes-request@w3.org
[mailto:public-esw-thes-request@w3.org] On Behalf Of Miles, AJ
(Alistair)
Sent: 08 February 2007 17:21
To: public-swd-wg@w3.org
Cc: public-esw-thes@w3.org
Subject: [SKOS] thesaurus USE patterns



Hi all,

Please see the following: 

http://www.w3.org/2006/07/SWD/wiki/SkosDesign/ThesaurusPatterns?action=recall&rev=4

[DONE] ACTION: Alistair to raise a new issue about USE X + Y and USE X
OR Y [recorded in
http://www.w3.org/2007/01/23-swd-minutes.html#action07]

Cheers,

Alistair.
--
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Web: http://purl.org/net/aliman
Email: a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440
Received on Thursday, 8 February 2007 19:27:58 GMT
