
RE: pre- and post- coordinate indexing

From: aida <aida@acorweb.net>
Date: Wed, 19 Oct 2005 14:03:24 +0100
To: <public-esw-thes@w3.org>
Message-ID: <PLEEIBFKBJOOFMILDALDGEBOCJAA.aida@acorweb.net>

Al,
it may help if you think of pre- and post-coordinated systems as being
related to the actual PROCESS of indexing. Thus:
- pre-coordinated systems (like subject heading systems and classifications)
combine terms in the process of indexing (metadata population). These
systems have syntax rules to establish the exact order of terms. A different
order of terms may imply a different meaning (e.g. 'bibliography of
encyclopaedia' as opposed to 'encyclopaedia of bibliography').
- post-coordinated systems (such as keywords, descriptor systems or
thesauri) allow combination of terms only in the process of retrieval (for
instance, no order/relation would be established between the terms
'bibliography' and 'encyclopaedia').
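
As a rough illustration of the two bullet points above, the difference can
be modelled in Python. This is only a sketch under my own assumptions: the
variable names and the choice of tuple versus set are invented for the
example and do not come from any standard or existing system.

```python
# Hypothetical illustration: names and data layout are invented for this sketch.

# Pre-coordinated entry: terms are combined at indexing time and their order
# is significant, so an ordered tuple is a natural model.
pre_a = ("bibliography", "encyclopaedia")   # 'bibliography of encyclopaedia'
pre_b = ("encyclopaedia", "bibliography")   # 'encyclopaedia of bibliography'

# Post-coordinated entry: terms are assigned independently, with no order or
# relation between them, so an unordered set is a natural model.
post_a = frozenset({"bibliography", "encyclopaedia"})
post_b = frozenset({"encyclopaedia", "bibliography"})

pre_distinct = pre_a != pre_b    # order changes the meaning
post_same = post_a == post_b     # no order is recorded
```

Here pre_distinct and post_same both come out True: the two pre-coordinated
entries are different subjects, while the two post-coordinated entries are
indistinguishable.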

Searching depends on how a retrieval system is implemented.
In theory, the first (pre-coordinated) system could be searched both as a
'phrase' and using Booleans.
In the second (post-coordinated) system, search precision will go only as
far as Booleans allow.
One should not confuse a compound index term such as 'cut flowers' (which is
actually a single indexing term) with a pre-coordinated system, which relates
single indexing terms (simple or compound) into more complex syntactical
expressions in which the order of terms determines the meaning.

Leonard is probably going to explain this a bit better, but this may help for
now.

aida

-----Original Message-----
From: public-esw-thes-request@w3.org
[mailto:public-esw-thes-request@w3.org]On Behalf Of Miles, AJ (Alistair)
Sent: 19 October 2005 12:14
To: Leonard Will; public-esw-thes@w3.org; Stella Dextre Clarke (E-mail);
Ron Davies (E-mail)
Subject: pre- and post- coordinate indexing



Hi Leonard,

> I'll not go into substantive discussion of this at the moment, as you
> suggest, but just note that I think you have a typo in it which may
> confuse people. In that document you say:
>
> >In a 'post-coordinate' concept scheme, concepts are meant to be
> >combined by the indexer into more meaningful units, at the time the
> >indexing is done.
>
> This is "pre-coordinate" indexing, not "post-coordinate". In
> "post-coordinate" systems the concepts are not combined (or coordinated)
> until the search stage, when they may be included in a Boolean search
> statement by the searcher.
>
> You can think of the pre- and post- prefixes as relating to the linking
> of concepts occurring before or after the indexed documents are stored
> and made available for use.

This wasn't a typo; I had completely misunderstood the meanings of pre- and
post- coordinate indexing.

I'd very much like to have this requirement met within SKOS Core, but I need
to understand the systems better, so I'd be very grateful if you or anyone
else could explain a couple of things for me ...

Could you explain how the indexing/search systems work under the two
scenarios (pre- and post- coordinate indexing)?  You mentioned an 'indexing
string' in another email, I'm assuming that this is a string of descriptors,
composed by the indexer, and then entered into a database field?  What do
indexing strings look like under the two scenarios (i.e. what can and can't
you write)?  What do the search strings look like under the two scenarios
(i.e. what can and can't you write), and how is the search operation usually
implemented?

I'm a bit confused about a couple of things ...

Firstly, a thesaurus directive such as:

cut flower production USE cut flowers + crop production

... is that for the searcher or for the indexer?  Is there a fundamental
difference between thesauri intended for pre-coordinate use, and thesauri
intended for post-coordinate use?
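
Read mechanically, a directive like this is a lookup from a non-preferred
term to a combination of descriptors. A minimal sketch, assuming a plain
Python dictionary (USE_DIRECTIVES and resolve are names invented for this
example); it deliberately does not say whether the lookup is applied at
indexing time or at search time, since that is the question being asked:

```python
# Hypothetical sketch: the table and function names are invented for illustration.
USE_DIRECTIVES = {
    "cut flower production": ["cut flowers", "crop production"],
}

def resolve(term):
    """Return the preferred descriptors for a term, following USE directives."""
    # Terms without a directive are treated as descriptors in their own right.
    return USE_DIRECTIVES.get(term, [term])

combined = resolve("cut flower production")
unchanged = resolve("cut flowers")
```

Here combined is the two-descriptor list from the directive, and unchanged is
just ['cut flowers'].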

Secondly, I'm *guessing* that under pre-coordinate indexing, an indexer
could make the following two types of indexing assignment (inventing my own
syntax):

doc | subject
----------------------------------
1   | cut flowers, crop production
2   | cut flowers + crop production

In the first assignment, the indexer wishes to state that the subjects of
document 1 are cut flowers, and crop production, although not necessarily
the production of cut flowers.  In the second assignment, the indexer
explicitly wishes to state that the subject of document 2 is (cut flowers +
crop production) i.e. cut flower production.

How does the searcher then distinguish between these two statements?  I'm
guessing that under traditional search systems, a boolean search string such
as 'cut flowers AND crop production' will not be able to distinguish between
the two statements (because it's implemented via some sort of sub-string
comparison), and will return both documents, is that correct?  Is this
something like the problem of 'false hits' that you mentioned previously,
Leonard?  If not, can you describe the problem of 'false hits' that you
mentioned?
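
To make the question concrete, here is a minimal sketch of the two
assignments in Python. It rests on my guesses above, not on how any real
system works: the data layout and the function names boolean_and and
coordinated are invented, and ',' is taken to separate independent subject
statements while '+' joins terms within one statement.

```python
# Hypothetical model of the two assignments above; the layout is invented.
# Each document holds a list of subject statements, and each statement is
# the set of terms the indexer coordinated together.
index = {
    1: [{"cut flowers"}, {"crop production"}],   # 'cut flowers, crop production'
    2: [{"cut flowers", "crop production"}],     # 'cut flowers + crop production'
}

def boolean_and(statements, query):
    """Match if the query terms appear anywhere in the document's subjects."""
    all_terms = set().union(*statements)
    return query <= all_terms

def coordinated(statements, query):
    """Match only if a single statement contains all the query terms together."""
    return any(query <= s for s in statements)

query = {"cut flowers", "crop production"}
flat_hits = [d for d, s in index.items() if boolean_and(s, query)]
coordinated_hits = [d for d, s in index.items() if coordinated(s, query)]
```

Under these assumptions the flat Boolean search returns documents 1 and 2,
and only the search that respects the indexer's coordination narrows the
result to document 2. Document 1 would then be the kind of unwanted hit I
have in mind.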

And finally, am I right to assume that under post-coordinate indexing, the
indexer does not have the ability to make the kind of distinction described
above?

Thanks a lot for your time.

Al.
Received on Wednesday, 19 October 2005 13:03:35 GMT
