
Re: [SKOS] thesaurus USE patterns

From: Alistair Miles <a.j.miles@rl.ac.uk>
Date: Thu, 08 Feb 2007 19:47:03 +0000
Message-ID: <45CB7E37.8040403@rl.ac.uk>
To: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
CC: public-swd-wg@w3.org, public-esw-thes@w3.org

I've added a caveat and a link to Stella's entirely superior 
background information:

http://www.w3.org/2006/07/SWD/wiki/SkosDesign/ThesaurusPatterns?action=recall&rev=6

Thanks Stella :)

Alistair.

Stella Dextre Clarke wrote:
> Alistair,
> I've just taken a look at your statement of the issue of representing A
> USE X AND Y as well as A USE X OR Y. The second half of it looks a very
> fair description of the problem. But I must protest that the Background
> section does not reflect historical reality well. The problem comes from
> your assumptions about what the thesaurus was invented for, i.e.
> "paper-based card catalogues" and the indexes derived therefrom. It
> gives the impression of catalogues like library card catalogues, in
> which you have a card per title (or occasionally more than one), with
> lots of cataloguing data on the card.
> 
> Around the time when IR thesauri were invented, hopes were pinned on
> "mechanisation" rather than "computerisation", and a lot of
> experimentation went on with various sorts of cards. It is true that
> some agencies did try to use thesauri with "item cards" a bit like
> catalogue cards (only the more sophisticated ones were IBM punched cards
> or edge-punched cards), but the more successful ones used "feature
> cards". The key difference between these approaches is whether you
> assign a card to the document (item) being indexed, or to the thesaurus
> term (feature) that is used for indexing the documents.
> 
> Optical coincidence cards were probably the most satisfactory sort of
> card for use with the thesaurus. You had the thesaurus itself, truly
> paper-based, a book that was consulted by indexers and searchers alike.
> Then you had a bank of strong cards, sometimes almost 2x2 ft in size,
> with a grid printed on each. Each card had its preferred term inscribed
> in the top left corner, and the cards were alphabetised by that term.
> The card could have
> up to 100 rows and 100 columns, which made it capable of indexing a
> collection of 10,000 documents. Each document had a number.
> 
> On indexing document 1234 with terms "Cats" and "Fur" you would get out
> those cards and punch a hole in each of them, in grid position 1234.
> Much later, when someone came to search for items about cats' fur, they
> would pull out those two cards and hold them up to a light. The light
> would shine through only where both cards were punched, including
> 1234. And then you went to the collection and pulled out document 1234
> etc.
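
The punch-and-light procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reconstruction of any real system: each feature card is modelled as an integer used as a bitset, documents are assumed to be numbered 0 to 9999 to match the 100 x 100 grid, and the names FeatureCard, punch and hold_up_to_light are hypothetical.

```python
# Hypothetical model of optical coincidence ("feature") cards.
# Assumption: grid positions are numbered 0..9999 (100 rows x 100 columns).
ROWS, COLS = 100, 100


class FeatureCard:
    """One card per thesaurus term; each hole marks one document number."""

    def __init__(self, term):
        self.term = term
        self.holes = 0  # integer used as a bitset, one bit per grid position

    def punch(self, doc_number):
        """Punch a hole at the grid position for doc_number."""
        if not 0 <= doc_number < ROWS * COLS:
            raise ValueError("document number is outside the grid")
        self.holes |= 1 << doc_number


def hold_up_to_light(*cards):
    """Return the document numbers where every stacked card has a hole."""
    if not cards:
        return []
    light = cards[0].holes
    for card in cards[1:]:
        light &= card.holes  # light passes only where all cards are punched
    return [n for n in range(ROWS * COLS) if (light >> n) & 1]


cats = FeatureCard("Cats")
fur = FeatureCard("Fur")

# Index document 1234 with both terms, plus two unrelated documents.
cats.punch(1234)
fur.punch(1234)
cats.punch(42)
fur.punch(77)

print(hold_up_to_light(cats, fur))  # prints [1234]
```

Stacking the cards corresponds to a bitwise AND over the punched positions, which is why documents 42 and 77 do not show through.
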
> 
> The process I have just described is called postcoordinate retrieval and
> exactly the same principle is used for computer searches using Boolean
> AND. So the scenario for which thesauri were invented was not so very
> different from today's computer use. But because the thesaurus itself
> was paper-based, the issue of how to manage "A USE X AND Y" as well as
> "A USE X OR Y" did not really arise. These types of entry were simply
> typed on to a page (with a typewriter, if you were lucky) for use by
> humans. And even then, we frowned on "A USE X OR Y" as not very good
> practice.
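
To make the two entry types concrete once the thesaurus itself is machine-readable, here is a minimal sketch of how a search on a non-preferred term might be expanded. It rests on assumptions that are not taken from the message: "A USE X AND Y" is read as "retrieve documents indexed with both X and Y", "USE X OR Y" is read as "retrieve documents indexed with either", and the terms, postings and the search function are all hypothetical.

```python
# Hypothetical postings: preferred term -> set of document numbers.
postings = {
    "X": {1, 2, 3},
    "Y": {2, 3, 4},
}

# Hypothetical USE entries: non-preferred term -> (operator, preferred terms).
# "A" stands for an "A USE X AND Y" entry, "B" for a "B USE X OR Y" entry.
use_entries = {
    "A": ("AND", ["X", "Y"]),
    "B": ("OR", ["X", "Y"]),
}


def search(term):
    """Return document numbers for a preferred or non-preferred term."""
    if term in postings:
        return set(postings[term])
    operator, targets = use_entries[term]
    sets = [postings[t] for t in targets]
    if operator == "AND":
        return set.intersection(*sets)  # documents indexed with every target
    return set.union(*sets)  # documents indexed with any target


print(search("A"))  # documents indexed with both X and Y
print(search("B"))  # documents indexed with X or Y
```

The sketch only shows that the two patterns need to be distinguished in the data; it does not suggest how they should be represented in SKOS.
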
> 
> Despite the length of this message, I repeat that the main part of your
> statement is not affected by the Background. The problem with "A USE X
> AND Y" is not in the type of indexing system, but in the representation
> of the relationships within the thesaurus itself. And there I agree with
> you - the move from paper-based management to mechanised/computerised
> management does bring the problem to the fore. I'm just looking forward
> to the solutions you come up with!
> 
> Cheers
> Stella
> 
> 
> *****************************************************
> Stella Dextre Clarke
> Information Consultant
> Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
> Tel: 01235-833-298
> Fax: 01235-863-298
> SDClarke@LukeHouse.demon.co.uk
> *****************************************************
> 
> 
> 
> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org] On Behalf Of Miles, AJ
> (Alistair)
> Sent: 08 February 2007 17:21
> To: public-swd-wg@w3.org
> Cc: public-esw-thes@w3.org
> Subject: [SKOS] thesaurus USE patterns
> 
> 
> 
> Hi all,
> 
> Please see the following:
> 
> http://www.w3.org/2006/07/SWD/wiki/SkosDesign/ThesaurusPatterns?action=recall&rev=4
> 
> [DONE] ACTION: Alistair to raise a new issue about USE X + Y and USE X
> OR Y [recorded in
> http://www.w3.org/2007/01/23-swd-minutes.html#action07]
> 
> Cheers,
> 
> Alistair.
> --
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440
> 
> 
> 
> 

Received on Thursday, 8 February 2007 19:47:25 GMT
