Re: notes at concepts vs notes at terms from Mark van Assem on 2005-10-26 (public-esw-thes@w3.org from October 2005)

From: Mark van Assem <mark@cs.vu.nl>
Date: Wed, 26 Oct 2005 13:00:57 +0200
To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
CC: public-esw-thes@w3.org
Message-ID: <435F61E9.90707@cs.vu.nl>

Hi Alistair,

> I don't know how to say this without sounding like an arse ... but I'm pretty sure that what you're suggesting contradicts the basic principles of thesaurus construction and use, as I've learned them from ISO 2788, the new BS 8723, and directly from folks like Stella and Leonard.

Probably you're right, but I think that some of the thesaurus folk are 
in favour of having a Term class so that properties can be attached to 
terms. The result is that terms get URIs and can be used in the ways I 
suggest. And I guess that if people find those uses helpful, they *will* 
use terms that way, no matter what any standard says. And I don't think 
they would be wrong in doing so.

> ... then thesaurus T term <rock> and thesaurus T term <basalt> are semantically equivalent tokens.

Yep, in the thesaurus they are, just like (I think) in WN the 
WordSenses are equivalent within one Synset. But for some practical 
uses (which you agreed to exist for WordSenses) they are not.

> Therefore, 'annotating' a document with the thesaurus T term <basalt> is semantically equivalent to 'annotating' the document with the thesaurus T term <rock>.  Therefore, there's no point in doing it.

Would someone using that thesaurus agree that <basalt> and <rock> are 
equivalent?

> If you want to say something more specific, using a thesaurus, then you need a thesaurus that has <basalt> as a preferred term.

But if there isn't any?

> Alternatively, use free text keyword annotations.

Note that I'm referring to use cases other than annotation for 
document retrieval, for which I agree you should annotate with the 
concept, not the term.

> The words 'rock' and 'basalt' may have quite different meanings to you when used in natural language discourse, but that is completely irrelevant.  The word 'rock', and thesaurus T term <rock>, are entirely separate entities.
> 
> 
>>A more probable/useful scenario is that a prefterm in one language is
>>mapped to a nonpref term in another, because it is a more accurate
>>translation of the word. It enables a more finegrained mapping than
>>just between concepts.
> 
> 
> If you are talking about semantic mapping, then whether you choose thesaurus T term <rock> or thesaurus T term <basalt> as your mapping target makes no difference to the meaning of the mapping, because thesaurus T term <rock> and thesaurus T term <basalt> are semantically equivalent tokens.  Therefore, if you are talking about semantic mapping, it is not possible to create a 'more fine-grained mapping' than that which is possible by mapping between the concepts.

Not on the concept level, but it is possible on the term level?

What is wrong with stating that prefTerm A in language X is usually 
displayed/used in texts/... in language Y with nonPrefTerm B? It gives 
you additional information that you are free to ignore, because the 
concept-to-concept mappings are implied by term-to-term mappings 
(well, if you define your mapping vocabulary in that way). It may help 
e.g. in translation or displays.

Maybe this is not extremely useful, but I don't see anything 
fundamentally wrong with it, either.
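
To make the "implied" part concrete, here is a rough sketch in Python. 
Everything in it is made up for illustration: the Term class, the 
concept identifiers, and the rule that a term-to-term mapping implies a 
mapping between the concepts those terms label. It is only one way a 
mapping vocabulary could be defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical Term class: a label that belongs to exactly one concept."""
    label: str
    lang: str
    concept: str      # identifier of the concept this term labels
    preferred: bool   # True for a prefTerm, False for a nonPrefTerm


def implied_concept_mappings(term_mappings):
    """Derive concept-to-concept mappings from term-to-term mappings.

    Assumption: every term mapping implies a mapping between the two
    concepts involved, so the term-level detail can simply be dropped.
    """
    return {(src.concept, dst.concept) for src, dst in term_mappings}


# prefTerm A in language X, mapped to nonPrefTerm B in language Y
term_a = Term(label="A", lang="X", concept="cX", preferred=True)
term_b = Term(label="B", lang="Y", concept="cY", preferred=False)

term_mappings = [(term_a, term_b)]
concept_mappings = implied_concept_mappings(term_mappings)
```

An application that only cares about concepts reads concept_mappings 
and loses nothing; one that does translation or display can still look 
at term_mappings.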

>>A first use is if you are really interested in that specific term
>>instead of its synonyms. For example if you want to count the number
>>of times a certain concept is misspelled. Or counting the # occurrences
>>of a specific term.
> 
> 
> How can you misspell a 'concept'?  What are you counting exactly?  What do you mean by an 'occurrence of a specific term'?

A concept cannot be misspelled because it is nameless. You are 
counting the terms, not the concept.

> N.B. A word, or collocations of words, that appears in a natural language document, and a thesaurus term that shares an identical character sequence, are entirely separate entities.  The fact that they share an identical character sequence allows you to infer absolutely nothing at all.

Why not? Of course you may need to assume that the meanings of term and 
word overlap, but I think that programmers might just do that.
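
As a rough sketch of what such a programmer might write (Python; the 
tiny thesaurus, the misspelled entry "bassalt" and the function name 
are all invented for illustration): match words in a text against term 
labels by plain string comparison, then count per term and per concept. 
The overlap-in-meaning assumption is exactly the string match.

```python
import re
from collections import Counter

# Hypothetical thesaurus: term label -> concept identifier.
# Assumption: a common misspelling is recorded as a non-preferred term.
TERM_TO_CONCEPT = {"rock": "c1", "basalt": "c1", "bassalt": "c1"}


def count_terms(text, term_to_concept=TERM_TO_CONCEPT):
    """Count occurrences per term and per concept using naive word matching."""
    words = re.findall(r"[a-z]+", text.lower())
    term_counts = Counter(w for w in words if w in term_to_concept)
    concept_counts = Counter()
    for term, n in term_counts.items():
        concept_counts[term_to_concept[term]] += n
    return term_counts, concept_counts


term_counts, concept_counts = count_terms(
    "Basalt is a rock. Bassalt is a misspelling of basalt."
)
```

The per-term counts keep "basalt", "bassalt" and "rock" apart, which is 
the information that disappears once you only count the concept.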

> Am I making any sense?

I can see perfectly clearly where you're coming from, and my use cases 
may turn out to be complete DB after all, but I do think that people 
would try to (ab)use a thesaurus in all kinds of ways, and would not 
be wrong in doing so. These are just additional arguments on top of 
the "we need a Term class to attach properties to" argument (which is 
probably a more compelling argument). And, if we do introduce a Term 
class, they are possible uses which we cannot prohibit.

Cheers,
Mark.

-- 
  Mark F.J. van Assem - Vrije Universiteit Amsterdam
        mark@cs.vu.nl - http://www.cs.vu.nl/~mark
Received on Wednesday, 26 October 2005 11:01:21 UTC