- From: Mark van Assem <mark@cs.vu.nl>
- Date: Wed, 26 Oct 2005 13:00:57 +0200
- To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
- CC: public-esw-thes@w3.org
Hi Alistair,

> I don't know how to say this without sounding like an arse ... but I'm pretty sure that what you're suggesting contradicts the basic principles of thesaurus construction and use, as I've learned them from ISO 2788, the new BS 8723, and directly from folks like Stella and Leonard.

Probably you're right, but I think that some of the thesaurus folk are in favour of having a Term class so that properties can be attached to terms. The result is that you can have URIs for terms, and use the terms in the ways I suggest. And I guess that if people find those uses helpful, they *will* use them, no matter what any standard is saying. And I don't think they would be wrong in doing so.

> ... then thesaurus T term <rock> and thesaurus T term <basalt> are semantically equivalent tokens.

Yep, in the thesaurus they are, just like (I think) in WN the WordSenses are equivalent within one Synset. But for some practical uses (which you agreed to exist for WordSenses) they are not.

> Therefore, 'annotating' a document with the thesaurus T term <basalt> is semantically equivalent to 'annotating' the document with the thesaurus T term <rock>. Therefore, there's no point in doing it.

Would someone using that thesaurus agree that <basalt> and <rock> are equivalent?

> If you want to say something more specific, using a thesaurus, then you need a thesaurus that has <basalt> as a preferred term.

But what if there isn't any?

> Alternatively, use free text keyword annotations.

Note that I'm referring to use cases other than annotation for document retrieval, for which I agree you should annotate with the concept, not the term.

> The words 'rock' and 'basalt' may have quite different meanings to you when used in natural language discourse, but that is completely irrelevant. The word 'rock', and thesaurus T term <rock>, are entirely separate entities.

>> A more probable/useful scenario is that a prefterm in one language is mapped to a nonpref term in another, because it is a more accurate translation of the word. It enables a more finegrained mapping than just between concepts.
>
> If you are talking about semantic mapping, then whether you choose thesaurus T term <rock> or thesaurus T term <basalt> as your mapping target makes no difference to the meaning of the mapping, because thesaurus T term <rock> and thesaurus T term <basalt> are semantically equivalent tokens. Therefore, if you are talking about semantic mapping, it is not possible to create a 'more fine-grained mapping' than that which is possible by mapping between the concepts.

Not on the concept level, but isn't it possible on the term level? What is wrong with stating that prefTerm A in language X is usually displayed/used in texts/... in language Y with nonPrefTerm B? It gives you additional information that you are free to ignore, because the concept-to-concept mappings are implied by term-to-term mappings (well, if you define your mapping vocabulary in that way). It may help e.g. in translation or displays. Maybe this is not extremely useful, but I don't see anything fundamentally wrong with it, either.

>> A first use is if you are really interested in that specific term instead of its synonyms. For example if you want to count the number of times a certain concept is misspelled. Or counting the # occurrences of a specific term.
>
> How can you misspell a 'concept'? What are you counting exactly? What do you mean by an 'occurrence of a specific term'?

A concept cannot be misspelled because it is nameless. You are counting the terms, not the concept.

> N.B. A word, or collocations of words, that appears in a natural language document, and a thesaurus term that shares an identical character sequence, are entirely separate entities. The fact that they share an identical character sequence allows you to infer absolutely nothing at all.

Why not? Of course you may need to assume that the meanings of term and word overlap, but I think that programmers might just do that.

> Am I making any sense?

I can see perfectly clearly where you're coming from, and my use cases may turn out to be complete DB after all, but I do think that people would try to (ab)use a thesaurus in all kinds of ways, and would not be wrong in doing so. These are just additional arguments on top of the "we need a Term class to attach properties to" argument (which is probably a more compelling argument). And, if we do introduce a Term class, they are possible uses which we cannot prohibit.

Cheers,
Mark.

--
Mark F.J. van Assem - Vrije Universiteit Amsterdam
mark@cs.vu.nl - http://www.cs.vu.nl/~mark
Received on Wednesday, 26 October 2005 11:01:21 UTC