- From: aida <aida@acorweb.net>
- Date: Wed, 19 Oct 2005 14:03:24 +0100
- To: <public-esw-thes@w3.org>
Al,

it may help if you think of pre- and post-coordinated systems as being related to the actual PROCESS of indexing.

Thus

- pre-coordinated systems (like subject heading systems and classifications) combine terms in the process of indexing (metadata population). These systems have syntax rules to establish the exact order of terms. A different order of terms may imply a different meaning (e.g. 'bibliography of encyclopaedia' as opposed to 'encyclopaedia of bibliography').

- post-coordinated systems (such as keywords, descriptor systems or thesauri) allow combination of terms only in the process of retrieval (for instance, no order/relation would be established between the terms 'bibliography' and 'encyclopaedia').

Searching depends on how a retrieval system is implemented. In theory, the first (pre-coordinated) system could be searched both as a 'phrase' and using Booleans. In the second (post-coordinated) system, search precision will go only as far as Booleans.

One should not confuse a compound index term such as 'cut flowers' (which is actually a single indexing term) with a pre-coordinated system, which relates single indexing terms (simple or compound) into more complex syntactical expressions in which the order of terms determines the meaning.

Leonard is probably going to explain this a bit better, but this may help for now.

aida

-----Original Message-----
From: public-esw-thes-request@w3.org
[mailto:public-esw-thes-request@w3.org]On Behalf Of Miles, AJ (Alistair)
Sent: 19 October 2005 12:14
To: Leonard Will; public-esw-thes@w3.org; Stella Dextre Clarke (E-mail); Ron Davies (E-mail)
Subject: pre- and post- coordinate indexing

Hi Leonard,

> I'll not go into substantive discussion of this at the moment, as you
> suggest, but just note that I think you have a typo in it which may
> confuse people. In that document you say:
>
> >In a 'post-coordinate' concept scheme, concepts are meant to be
> >combined by the indexer into more meaningful units, at the time the
> >indexing is done.
>
> This is "pre-coordinate" indexing, not "post-coordinate". In
> "post-coordinate" systems the concepts are not combined (or
> coordinated) until the search stage, when they may be included
> in a Boolean search statement by the searcher.
>
> You can think of the pre- and post- prefixes as relating to
> the linking of concepts occurring before or after the indexed
> documents are stored and made available for use.

This wasn't a typo; I had completely misunderstood the meanings of pre- and post-coordinate indexing.

I'd very much like to have this requirement met within SKOS Core, but I need to understand the systems better, so I'd be very grateful if you or anyone else could explain a couple of things for me ...

Could you explain how the indexing/search systems work under the two scenarios (pre- and post-coordinate indexing)?

You mentioned an 'indexing string' in another email. I'm assuming that this is a string of descriptors, composed by the indexer, and then entered into a database field? What do indexing strings look like under the two scenarios (i.e. what can and can't you write)?

What do the search strings look like under the two scenarios (i.e. what can and can't you write), and how is the search operation usually implemented?

I'm a bit confused about a couple of things ...

Firstly, a thesaurus directive such as:

    cut flower production USE cut flowers + crop production

... is that for the searcher or for the indexer? Is there a fundamental difference between thesauri intended for pre-coordinate use and thesauri intended for post-coordinate use?
Secondly, I'm *guessing* that under pre-coordinate indexing, an indexer could make the following two types of indexing assignment (inventing my own syntax):

    doc | subject
    ----------------------------------
    1   | cut flowers, crop production
    2   | cut flowers + crop production

In the first assignment, the indexer wishes to state that the subjects of document 1 are cut flowers, and crop production, although not necessarily the production of cut flowers.

In the second assignment, the indexer explicitly wishes to state that the subject of document 2 is (cut flowers + crop production), i.e. cut flower production.

How does the searcher then distinguish between these two statements? I'm guessing that under traditional search systems, a Boolean search string such as 'cut flowers AND crop production' will not be able to distinguish between the two statements (because it's implemented via some sort of sub-string comparison), and will return both documents. Is that correct?

Is this something like the problem of 'false hits' that you mentioned previously, Leonard? If not, can you describe the problem of 'false hits' that you mentioned?

And finally, am I right to assume that under post-coordinate indexing, the indexer does not have the ability to make the kind of distinction described above?

Thanks a lot for your time.

Al.
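
[Editorial sketch, not part of the original message.] The two assignments in the table above can be modelled in a few lines of Python to make the question concrete. Everything here is an assumption for illustration only: the `index` structure, the function names `boolean_and` and `coordinated`, and the idea that a subject statement is a tuple of terms. It is a toy model of Al's guess, not a description of how any real retrieval system is implemented.

```python
# Toy model (hypothetical): each document has a list of subject statements.
# A statement is a tuple of terms; a tuple with several terms stands for
# terms that the indexer has explicitly coordinated ("+" in the table).
index = {
    1: [("cut flowers",), ("crop production",)],  # two separate subjects
    2: [("cut flowers", "crop production")],      # one coordinated subject
}


def boolean_and(index, *terms):
    """Docs in which every term occurs somewhere, ignoring coordination."""
    wanted = set(terms)
    return sorted(
        doc
        for doc, statements in index.items()
        if wanted <= {term for stmt in statements for term in stmt}
    )


def coordinated(index, *terms):
    """Docs that have a single subject statement containing all the terms."""
    wanted = set(terms)
    return sorted(
        doc
        for doc, statements in index.items()
        if any(wanted <= set(stmt) for stmt in statements)
    )


if __name__ == "__main__":
    query = ("cut flowers", "crop production")
    print("Boolean AND:", boolean_and(index, *query))
    print("Coordinated:", coordinated(index, *query))
```

Under these assumptions the plain Boolean AND cannot separate document 1 from document 2, while a search that respects the indexer's coordination matches only document 2. Whether real systems behave this way is exactly what the message asks.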
Received on Wednesday, 19 October 2005 13:03:35 UTC