RE: pre- and post- coordinate indexing

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Wed, 19 Oct 2005 15:06:26 +0100
Message-ID: <677CE4DD24B12C4B9FA138534E29FB1D0ACE04@exchange11.fed.cclrc.ac.uk>
To: "aida" <aida@acorweb.net>, <public-esw-thes@w3.org>

Hi Aida,

Thanks for this.  

> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org]On Behalf Of aida
> Sent: 19 October 2005 14:03
> To: public-esw-thes@w3.org
> Subject: RE: pre- and post- coordinate indexing
> 
> 
> 
> Al,
> it may help if you think of pre- and post-coordinated systems as being
> related to the actual PROCESS of indexing. Thus:
> - pre-coordinated systems (like subject heading systems and
>   classifications) combine terms in the process of indexing (metadata
>   population). These systems have syntax rules to establish the exact
>   order of terms. A different order of terms may imply a different
>   meaning (e.g. 'bibliography of encyclopaedia' as opposed to
>   'encyclopaedia of bibliography').
> - post-coordinated systems (such as keywords, descriptor systems or
>   thesauri) allow combination of terms only in the process of retrieval
>   (for instance, no order/relation would be established between the
>   terms 'bibliography' and 'encyclopaedia').
> 

Can you give me an example of how syntax rules for pre-coordination are expressed?

> Searching depends on how a retrieval system is implemented.
> In theory, the first (pre-coordinated) system could be searched both
> as a 'phrase' and using Booleans.
> In the second, post-coordinate, system, search precision will go only
> as far as Booleans.
>
> One should not confuse a compound index term such as 'cut flowers'
> (which is actually a single indexing term) with a pre-coordinated
> system, which relates single indexing terms (simple or compound) into
> more complex syntactical expressions in which the order of terms
> determines the meaning.
> 

This answers another question I had which is: does the order of coordination matter?
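
As a minimal sketch of why order matters (my own illustration in Python, not anything taken from a real indexing system): assume a pre-coordinated heading is modelled as an ordered tuple of terms, and a post-coordinate assignment as an unordered set. Both representations are assumptions made for the sake of the example.

```python
# Hypothetical model, for illustration only:
# - a pre-coordinated heading is an ordered tuple of terms
# - a post-coordinate assignment is an unordered set of terms
pre_a = ("bibliography", "encyclopaedia")
pre_b = ("encyclopaedia", "bibliography")

post_a = frozenset(pre_a)
post_b = frozenset(pre_b)

# The two headings differ, because the order of terms is part of the meaning.
order_matters_pre = pre_a != pre_b

# The two sets are equal, because no order or relation is recorded.
order_matters_post = post_a != post_b
```

Under this model the two pre-coordinated headings stay distinct, while the two post-coordinate assignments collapse into the same set of terms.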

In the example, I used 'cut flowers + crop production' as the 'coordinated' indexing term for the non-descriptor 'cut flower production'. Is this valid?
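
To make the question concrete, here is a rough sketch of the two assignments from my earlier message (quoted below). The data layout and the function names `boolean_and` and `coordinated` are invented for illustration; I am assuming a Boolean search only checks that each term is present somewhere on the document, which may not match how real retrieval systems are implemented.

```python
# Each document maps to a list of index entries; an entry is a tuple of terms.
# Doc 1 has two separate single-term entries; doc 2 has one coordinated entry.
index = {
    1: [("cut flowers",), ("crop production",)],
    2: [("cut flowers", "crop production")],
}

def boolean_and(index, *terms):
    """Docs where every term appears somewhere, ignoring coordination."""
    wanted = set(terms)
    return sorted(
        doc for doc, entries in index.items()
        if wanted <= {term for entry in entries for term in entry}
    )

def coordinated(index, *terms):
    """Docs that have one entry made of exactly these terms, in this order."""
    return sorted(doc for doc, entries in index.items() if tuple(terms) in entries)

and_hits = boolean_and(index, "cut flowers", "crop production")
exact_hits = coordinated(index, "cut flowers", "crop production")
```

With this layout the Boolean AND returns both documents, while the coordinated match returns only document 2, which is the distinction I am asking about.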

Cheers,

Al. 

> 
> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org]On Behalf Of Miles, AJ (Alistair)
> Sent: 19 October 2005 12:14
> To: Leonard Will; public-esw-thes@w3.org; Stella Dextre Clarke (E-mail);
> Ron Davies (E-mail)
> Subject: pre- and post- coordinate indexing
> 
> 
> 
> Hi Leonard,
> 
> > I'll not go into substantive discussion of this at the moment, as you
> > suggest, but just note that I think you have a typo in it which may
> > confuse people. In that document you say:
> >
> > >In a 'post-coordinate' concept scheme, concepts are meant to be
> > >combined by the indexer into more meaningful units, at the time the
> > >indexing is done.
> >
> > This is "pre-coordinate" indexing, not "post-coordinate". In
> > "post-coordinate" systems the concepts are not combined (or
> > coordinated) until the search stage, when they may be included in a
> > Boolean search statement by the searcher.
> >
> > You can think of the pre- and post- prefixes as relating to the
> > linking of concepts occurring before or after the indexed documents
> > are stored and made available for use.
> 
> This wasn't a typo; I had completely misunderstood the meanings of
> pre- and post-coordinate indexing.
>
> I'd very much like to have this requirement met within SKOS Core, but
> I need to understand the systems better, so I'd be very grateful if
> you or anyone else could explain a couple of things for me ...
>
> Could you explain how the indexing/search systems work under the two
> scenarios (pre- and post-coordinate indexing)?  You mentioned an
> 'indexing string' in another email. I'm assuming that this is a string
> of descriptors, composed by the indexer, and then entered into a
> database field?  What do indexing strings look like under the two
> scenarios (i.e. what can and can't you write)?  What do the search
> strings look like under the two scenarios (i.e. what can and can't you
> write), and how is the search operation usually implemented?
> 
> I'm a bit confused about a couple of things ...
> 
> Firstly, a thesaurus directive such as:
> 
> cut flower production USE cut flowers + crop production
> 
> ... is that for the searcher or for the indexer?  Is there a
> fundamental difference between thesauri intended for pre-coordinate
> use, and thesauri intended for post-coordinate use?
>
> Secondly, I'm *guessing* that under pre-coordinate indexing, an
> indexer could make the following two types of indexing assignment
> (inventing my own syntax):
> 
> doc | subject
> ----------------------------------
> 1   | cut flowers, crop production
> 2   | cut flowers + crop production
> 
> In the first assignment, the indexer wishes to state that the subjects
> of document 1 are cut flowers, and crop production, although not
> necessarily the production of cut flowers.  In the second assignment,
> the indexer explicitly wishes to state that the subject of document 2
> is (cut flowers + crop production), i.e. cut flower production.
>
> How does the searcher then distinguish between these two statements?
> I'm guessing that under traditional search systems, a Boolean search
> string such as 'cut flowers AND crop production' will not be able to
> distinguish between the two statements (because it's implemented via
> some sort of sub-string comparison), and will return both documents;
> is that correct?  Is this something like the problem of 'false hits'
> that you mentioned previously, Leonard?  If not, can you describe the
> problem of 'false hits' that you mentioned?
>
> And finally, am I right to assume that under post-coordinate indexing,
> the indexer does not have the ability to make the kind of distinction
> described above?
> 
> Thanks a lot for your time.
> 
> Al.
> 
Received on Wednesday, 19 October 2005 14:06:32 GMT
