RE: candidate and deprecated concepts

From: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
Date: Mon, 11 Oct 2004 10:45:42 +0100
To: "'Bernard Vatant'" <bernard.vatant@mondeca.com>, <public-esw-thes@w3.org>
Interesting discussion, and I wish I had time to follow up on the idea
of an indexing profile separate from a thesaurus. Could be very useful
in some interoperability scenarios. 

But for now I'll just comment that I had thought "deprecated term" was
an Americanism, synonymous with "non-preferred term", creeping across
the Atlantic. Seems you in Mondeca use it as an NT of "non-preferred
term". I wonder how everyone else uses it...?

By the way, you might be interested to know that BSI has just issued
DPCs (Drafts for Public Comment) of the first two parts of BS8723 -
something which one day we hope will supersede BS5723 (=ISO 2788) and
BS6723 (=ISO 5964). The documents have numbers 04/30086620 DC and
04/30094113 DC respectively, and are said to be available from
orders@bsi-global.com, probably only on payment. BSI has so far been
unable to give any more info about how to get them, certainly not about
how to download them from anywhere. But comments from people like you
could help the standards evolve to take account of new practices.


> The difference between a deprecated concept and a deprecated term may 
> not be as clear as you might wish.

Sure, no more than the difference between a term or a concept,
deprecated or not :)

> (And even the word "deprecated" is a
> bit strange to me in the context of thesauri. We usually just say
> non-preferred.)

Indeed? At Mondeca we are currently working for a major player in legal
publishing (Wolters Kluwer Belgium), which makes very intensive use of
thesauri, including e.g. for the automatic generation of publication
indexes. And one of the strong requirements of the folks in
charge of Thesaurus management was indeed a proper handling of what they
call "deprecated terms". A deprecated term is a term that used to be
preferred, and used as such, and at some point of time in the history of
the vocabulary was replaced by another preferred term. After
"deprecation" (so to speak) the once-preferred, and now deprecated term,
is kept as a synonym of the preferred term which replaces it. Whatever
relationships (BT-NT, USE, ...) the deprecated term had are relocated to
the replacing term, and the indexing of documents is redirected.
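
A minimal sketch of this deprecation step, assuming a hypothetical
in-memory model: the Concept class, its field names and the example
terms are invented for illustration and are not the Mondeca system.
Relationships are attached to the concept rather than to the term, so
they follow the replacing term without being copied.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept: a preferred term plus its non-preferred terms."""
    preferred: str
    synonyms: set = field(default_factory=set)    # non-preferred terms (UF)
    deprecated: set = field(default_factory=set)  # synonyms that used to be preferred
    broader: set = field(default_factory=set)     # BT links, attached to the concept

    def deprecate_preferred(self, new_term: str) -> None:
        """Make new_term preferred; keep the old preferred term as a synonym."""
        old = self.preferred
        if new_term == old:
            return  # nothing to deprecate
        # new_term may already have been a synonym or an earlier preferred term
        self.synonyms.discard(new_term)
        self.deprecated.discard(new_term)
        self.synonyms.add(old)
        self.deprecated.add(old)
        self.preferred = new_term

    def resolve(self, term: str) -> str:
        """Redirect any term known for this concept to the current preferred term."""
        if term == self.preferred or term in self.synonyms:
            return self.preferred
        raise KeyError(term)


# Invented example data, for illustration only.
concept = Concept(preferred="Wireless sets", broader={"Broadcasting equipment"})
doc_index = {"doc-1": "Wireless sets"}  # document indexed before the change

concept.deprecate_preferred("Radio receivers")
redirected = concept.resolve(doc_index["doc-1"])
```

Here the old index entry is left untouched and is redirected at lookup
time; rewriting the stored entries would be an equally valid choice.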

So of course a deprecated term is non-preferred, but it used to be
preferred, and the system keeps track of that if necessary.

> It is unusual to drop a concept altogether.

Of course the concept does not really change when the term is
deprecated: the term is replaced, and it is simply the preferred term
for the concept that changes. Concepts never die :)

> Normally one provides a lead-in entry pointing to the broader concept 
> that covers the scope of the preferred term that is now to be 
> "deprecated". It is conceivable that if it was decided that a large 
> subject area with perhaps hundreds of concepts was now out-of-scope, 
> then all the corresponding terms might be dropped without trace
> (although this is not usually recommended). The thesaurus might well be
> renamed or rebranded to mark the transition.

This is another story ...

> Much more likely would be to decide that that subject area should be 
> indexed at a much shallower level of specificity.

I think Thesaurus structure can (should) be kept independent from the
indexing practices/applications that use the Thesaurus. See at the end
the general remark about declarative vs procedural properties. Several
different indexing applications can use the same Thesaurus at different
levels of granularity, use or not use specific branches etc ... This is
the notion of index profile (also a requirement of the above quoted
customer). The index profile can be managed independently of the
structure of the thesaurus itself. You can say e.g. in the profile that
you only use the first three levels of the Thesaurus hierarchy, so
whatever is indexed at a finer level of granularity will be re-indexed
by the relevant parent.
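
A rough sketch of such a depth-limited profile, under stated
assumptions: the thesaurus is reduced to a term -> broader term (BT)
map, the function names are hypothetical, and the three-level hierarchy
below is a simplification of the tropical products example.

```python
# Declarative structure: each term maps to its broader term (BT);
# a top term maps to None. Simplified, assumed hierarchy.
BROADER = {
    "Tropical products": None,
    "Tropical fruits": "Tropical products",
    "Bananas": "Tropical fruits",
    "Pineapples": "Tropical fruits",
}


def path_from_top(term, broader):
    """Return the chain of terms from the top term down to `term`."""
    path = [term]
    while broader[path[-1]] is not None:
        path.append(broader[path[-1]])
    path.reverse()
    return path


def reindex(term, broader, max_levels):
    """Map `term` to its ancestor at level `max_levels`, if it lies deeper."""
    if max_levels < 1:
        raise ValueError("max_levels must be at least 1")
    path = path_from_top(term, broader)
    return path[min(len(path), max_levels) - 1]


# Two profiles over the same, unchanged thesaurus.
shallow = reindex("Bananas", BROADER, max_levels=2)
full = reindex("Bananas", BROADER, max_levels=3)
```

The depth limit lives only in the call, that is in the profile, so
several applications can share BROADER while using different levels.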

> So, for example, in a
> thesaurus for agricultural products, it might be decided that tropical
> products should no longer be covered in detail. Where previously you
> had Bananas, Pineapples, Brazil nuts etc as preferred terms (with a
> hierarchy of BTs such as Tropical fruits all the way up to Tropical 
> products), you might leave just one term "Tropical products" to cover 
> all of these. In the thesaurus you would organise entries such as 
> "Bananas USE Tropical products" - perhaps hundreds of such entries. 
> Now where is the "deprecated concept"? All we have is one very broad 
> concept taking in tropical products at all levels of detail, and lots 
> of non-preferred terms.

This is quite different from deprecation: it's changing the granularity
of the Thesaurus. And in such a case, you could just change the indexing
profile, saying now that "Tropical Products" is a "leaf term" for the
indexing profile (meaning that everything below should be indexed on
that term).
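
The "leaf term" idea can be sketched in the same spirit. This is only an
illustration under assumptions: the profile is a plain set of leaf
terms, the function name is hypothetical, and the hierarchy is a
simplified version of the example above. When several leaf terms lie on
the same branch, this sketch lets the highest one win.

```python
# Declarative structure, never modified by a profile:
# term -> broader term (BT), None for a top term. Simplified, assumed hierarchy.
BROADER = {
    "Tropical products": None,
    "Tropical fruits": "Tropical products",
    "Bananas": "Tropical fruits",
    "Pineapples": "Tropical fruits",
}


def index_with_leaves(term, broader, leaf_terms):
    """Index `term` on the highest leaf term at or above it, else on itself."""
    target = term
    current = term
    while current is not None:
        if current in leaf_terms:
            target = current  # keep climbing: the highest leaf term wins
        current = broader[current]
    return target


# Procedural choices, one per context of use.
coarse_profile = {"Tropical products"}
medium_profile = {"Tropical fruits"}
detailed_profile = set()

coarse = index_with_leaves("Bananas", BROADER, coarse_profile)
medium = index_with_leaves("Bananas", BROADER, medium_profile)
detailed = index_with_leaves("Bananas", BROADER, detailed_profile)
```

Switching from the detailed to the coarse profile changes where
"Bananas" is indexed, while the BT-NT links in BROADER stay as they were.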

> So the idea of a "deprecated concept" just feels a bit alien.

Yes, there again, concepts never die. This is an important rule I've
learned in topic map management: never delete a topic. Change its
status, attributes, names, relationships, date of validity, but never
delete. Once you have spoken about something at some point, this thing
exists forever, at least as a subject of conversation :))

> I don't warm, either, to the idea of a concept getting "replaced" by 
> another one, unless they are so close that you would treat the two as 
> quasi-synonymous. You are hardly going to replace Bananas with Washing
> machines?

There again, only terms are replaced, not concepts.

Bottom line: we need here to distinguish the *declarative* properties
of concepts, valid whatever the context of application, from the
*procedural* properties, applicable only in specific contexts of use.
For example, it seems to me that the BT-NT relationship between "Tropical
Fruits" and "Bananas" should be declarative, and kept whatever the
context, whereas the USE-UF relationship stated in order to use the
Thesaurus at a broader level of granularity is procedural: you know
pretty well that "Tropical Fruits" and "Bananas" are distinct concepts,
but in a certain context of application this distinction is useless for
whatever reason. This is different from the case where, say, you had an
ancient astronomical thesaurus in which "Evening Star" and "Morning
Star" were thought of as distinct concepts, and you decide/discover at
some point that in fact they both have to be replaced by "Planet Venus".
In this latter case, there is actually a (declarative) change in the
conceptual scheme. It might be that the USE-UF relationship in a
Thesaurus is sometimes used in a declarative sense, and sometimes in a
procedural sense, leading to some ambiguities. The above quoted notion
of "index profile" makes it possible to capture the procedural
properties for various contexts, while not changing the declarative
properties in the (common) Thesaurus.



Bernard Vatant
Senior Consultant
Knowledge Engineering
Mondeca - www.mondeca.com
