pre- and post- coordinate indexing from Miles, AJ (Alistair) on 2005-10-19 (public-esw-thes@w3.org from October 2005)

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Wed, 19 Oct 2005 12:13:45 +0100
To: "Leonard Will" <L.Will@willpowerinfo.co.uk>, <public-esw-thes@w3.org>, "Stella Dextre Clarke (E-mail)" <SDClarke@lukehouse.demon.co.uk>, "Ron Davies (E-mail)" <ron@rondavies.be>
Message-ID: <677CE4DD24B12C4B9FA138534E29FB1D64C773@exchange11.fed.cclrc.ac.uk>

Hi Leonard,

> I'll not go into substantive discussion of this at the moment, as you 
> suggest, but just note that I think you have a typo in it which may 
> confuse people. In that document you say:
> 
> >In a 'post-coordinate' concept scheme, concepts are meant to be 
> >combined by the indexer into more meaningful units, at the time the 
> >indexing is done.
> 
> This is "pre-coordinate" indexing, not "post-coordinate". In 
> "post-coordinate" systems the concepts are not combined (or 
> coordinated) until the search stage, when they may be included in a 
> Boolean search statement by the searcher.
> 
> You can think of the pre- and post- prefixes as relating to the 
> linking of concepts occurring before or after the indexed documents 
> are stored and made available for use.

This wasn't a typo; I had completely misunderstood the meanings of pre- and post- coordinate indexing.

I'd very much like to have this requirement met within SKOS Core, but I need to understand the systems better, so I'd be very grateful if you or anyone else could explain a couple of things for me ...

Could you explain how the indexing/search systems work under the two scenarios (pre- and post- coordinate indexing)?  You mentioned an 'indexing string' in another email; I'm assuming that this is a string of descriptors, composed by the indexer, and then entered into a database field?  What do indexing strings look like under the two scenarios (i.e. what can and can't you write)?  What do the search strings look like under the two scenarios (i.e. what can and can't you write), and how is the search operation usually implemented?

I'm a bit confused about a couple of things ...

Firstly, a thesaurus directive such as:

cut flower production USE cut flowers + crop production

... is that for the searcher or for the indexer?  Is there a fundamental difference between thesauri intended for pre-coordinate use, and thesauri intended for post-coordinate use?

Secondly, I'm *guessing* that under pre-coordinate indexing, an indexer could make the following two types of indexing assignment (inventing my own syntax):

doc | subject
----------------------------------
1   | cut flowers, crop production
2   | cut flowers + crop production

In the first assignment, the indexer wishes to state that the subjects of document 1 are cut flowers, and crop production, although not necessarily the production of cut flowers.  In the second assignment, the indexer explicitly wishes to state that the subject of document 2 is (cut flowers + crop production) i.e. cut flower production.
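
A minimal Python sketch of these two assignments as data may make the distinction easier to discuss. The representation is an assumption made for illustration only (the names `index` and `descriptors` are hypothetical, and no existing indexing system is implied): each document has a list of index entries, and each entry is the set of descriptors that the indexer has coordinated into one unit.

```python
# Hypothetical representation: each document maps to a list of index entries,
# and each entry is a set of descriptors the indexer has coordinated together.
index = {
    1: [frozenset({"cut flowers"}), frozenset({"crop production"})],
    2: [frozenset({"cut flowers", "crop production"})],
}

def descriptors(doc_id):
    """Flatten a document's entries into the plain set of descriptors used."""
    return set().union(*index[doc_id])

# Both documents use exactly the same descriptors ...
same_descriptors = descriptors(1) == descriptors(2)
# ... but only document 2 carries them as a single coordinated unit.
same_entries = set(index[1]) == set(index[2])
```

Under this assumed representation the two documents differ only in how the descriptors are grouped, which is the information a flat list of descriptors would lose.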

How does the searcher then distinguish between these two statements?  I'm guessing that under traditional search systems, a Boolean search string such as 'cut flowers AND crop production' will not be able to distinguish between the two statements (because it's implemented via some sort of sub-string comparison), and will return both documents; is that correct?  Is this something like the problem of 'false hits' that you mentioned previously, Leonard?  If not, can you describe the problem of 'false hits' that you mentioned?
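
The guess above can be sketched in Python. Everything here is an assumption for illustration: the subject field is stored as a plain string using the invented "," and "+" syntax, `boolean_and` stands in for a search implemented by sub-string comparison, and `coordinated` is a hypothetical search that respects the indexer's "+" groupings. Neither function is taken from any real retrieval system.

```python
# Subject fields stored as plain strings, using the invented syntax:
# "," separates independent subjects, "+" joins descriptors into one subject.
records = {
    1: "cut flowers, crop production",
    2: "cut flowers + crop production",
}

def boolean_and(terms):
    """Naive Boolean AND: a document matches if every term occurs
    somewhere in its subject field (simple sub-string comparison)."""
    return sorted(doc for doc, field in records.items()
                  if all(term in field for term in terms))

def coordinated(terms):
    """Hypothetical coordination-aware search: a document matches only if
    a single "+"-joined entry contains all of the requested terms."""
    wanted = set(terms)
    hits = []
    for doc, field in sorted(records.items()):
        for entry in field.split(","):
            entry_terms = {d.strip() for d in entry.split("+")}
            if wanted <= entry_terms:
                hits.append(doc)
                break
    return hits

query = ["cut flowers", "crop production"]
plain_hits = boolean_and(query)        # both documents match
coordinated_hits = coordinated(query)  # only document 2 matches
```

In this sketch the sub-string search returns document 1 even though it is not about cut flower production, which is one possible reading of a 'false hit'; whether that matches the intended meaning is exactly the question being asked.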

And finally, am I right to assume that under post-coordinate indexing, the indexer does not have the ability to make the kind of distinction described above?

Thanks a lot for your time.

Al.

Received on Wednesday, 19 October 2005 11:13:50 UTC