- From: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
- Date: Tue, 1 Nov 2005 18:49:09 -0000
- To: "'Sue Ellen Wright'" <sellenwright@gmail.com>, "'Miles, AJ \(Alistair\)'" <A.J.Miles@rl.ac.uk>
- Cc: "'Mark van Assem'" <mark@cs.vu.nl>, <public-esw-thes@w3.org>
- Message-ID: <005f01c5df14$f1a04680$0300a8c0@DELL>
This time I don't see this quite the same way as Sue, or Alistair for that matter. I agree that the term "term" may be used in a lot of different contexts; I agree this can cause confusion in communications. But I don't believe you can get rid of "term". Terms do happen to exist, they are very important in thesauri, and we have to deal with them, whether we like the name or not. If necessary, we could call them "thesaurus terms", but we cannot pretend they are not there, and we *do* need to be able to refer to them without calling them "concepts" (because they are *not* concepts - they only represent concepts.)

Moving on from that, I must try to fulfil my promise to provide examples of when thesaurus editors may like to attach notes to terms *not* concepts. You may find the examples more convincing if you imagine them all being applied to non-preferred terms:

1. History notes.
For example, a non-preferred term "Beagles" might need the following history note: 'Previously a non-preferred term of "Dogs"; became a non-preferred term of "Hounds" when the latter was introduced as a preferred term in 2003.' As it happens I have never myself used this type of note, and we have not provided for it (yet) in BS8723. But I have been sorely tempted on several occasions during a recent project. Of course, it is possible to attach the information to the concept History Note(s) - in this case you'd need to say something in the HNs of both "Dogs" and "Hounds" - but it gets cumbersome.

2. Editorial Notes.
Example A: "Term proposed for upgrading to preferred status on 2004-10-01. Proposal rejected on grounds of ..... File reference XYZ-123"
Example B: "Term requested by Bloggins on 2002-03-03"
Example C: "Term source: ABC Thesaurus"
In a recent project I have been merging three vocabularies into one, and there are vested interests behind the retention of some terms that might otherwise have been dropped.
Sometimes it is useful to keep an audit trail of exactly where the term came from, who wants it, why they want it, and what arguments have already been had about it. Some of the arguments may be about the underlying concept; but sometimes they are really focussed on a particular term.

3. Definitions.
Sometimes it is useful to retain definitions of terms gleaned from various sources - even when several definitions for the same term conflict with each other. They do *not* constitute definitions of the concept that is wanted for retrieval purposes. But they may come in handy when thesaurus changes are proposed, or for associated scholarly work. To see examples, look at the AAT (http://www.getty.edu/research/conducting_research/vocabularies/aat/index.html). Look at the record for any preferred term - take "drug jars" for example. Last time I looked, 14 different non-preferred terms were listed, and for each of these there was a reference to the sources where it was found e.g. Webster's Dictionary, the OED, Spillman's "Glass Bottles", etc. Not everyone can afford to do scholarly work on this scale, and you could say the AAT is an example in a class of its own. But work like this does happen, you do find it in real live thesauri, and people do want to exchange such data.

4. Mappings.
I've heard some people say they want to be able to map to/from non-preferred terms (separately from the mappings between their corresponding preferred terms). I've yet to be convinced of this in a real case, but some people do believe in it strongly.

OK, I hope that's enough examples. I agree with the argument that a capability for having notes on terms is not nearly such a high priority as that for notes on concepts. But the need occurs commonly enough to make a case for accommodating it in a model that aims to be comprehensive. Perhaps it could be in a model for more advanced users, so as not to create unnecessary difficulties for users with simpler needs?
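
To make the distinction in the examples above concrete, here is a minimal sketch of a data model in which notes can hang off an individual term as well as off the concept. It is only an illustration under stated assumptions: the class and field names (`Note`, `Term`, `Concept`, `term_notes`) are hypothetical and do not come from BS 8723 or SKOS Core, and the note texts are simply reused from the examples above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    """A note of some kind; 'kind' values here are illustrative only."""
    kind: str                      # e.g. "history", "editorial", "definition"
    text: str
    source: Optional[str] = None   # e.g. a dictionary the definition came from


@dataclass
class Term:
    """A thesaurus term; it represents a concept but is not the concept."""
    label: str
    preferred: bool = False
    notes: List[Note] = field(default_factory=list)   # notes about this term


@dataclass
class Concept:
    """A concept with its terms and its own, separate, notes."""
    terms: List[Term] = field(default_factory=list)
    notes: List[Note] = field(default_factory=list)   # notes about the concept

    def preferred_term(self) -> Term:
        # Assumes exactly one term is flagged as preferred.
        return next(t for t in self.terms if t.preferred)

    def term_notes(self, label: str, kind: Optional[str] = None) -> List[Note]:
        # Return the notes attached to one term, optionally filtered by kind.
        for term in self.terms:
            if term.label == label:
                return [n for n in term.notes if kind is None or n.kind == kind]
        raise KeyError(label)


# Hypothetical data: the history note sits on the non-preferred term
# "Beagles" itself, so nothing has to be repeated on "Dogs" and "Hounds".
hounds = Concept(terms=[
    Term("Hounds", preferred=True),
    Term("Beagles", notes=[
        Note("history",
             'Previously a non-preferred term of "Dogs"; became a '
             'non-preferred term of "Hounds" when the latter was '
             'introduced as a preferred term in 2003.'),
        Note("editorial", "Term source: ABC Thesaurus"),
    ]),
])

for note in hounds.term_notes("Beagles"):
    print(note.kind, "-", note.text)
```

A simpler profile could leave `Term.notes` unused and keep everything on `Concept.notes`, which is one way to read the suggestion that term-level notes belong in a model for more advanced users.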
Then there's a parallel argument, the one Ron raised about relationships between non-preferred terms in different languages of one multilingual thesaurus. He and I have discussed this before, and he knows I'm not keen on this practice. (It has a lot in common with the case of mappings, mentioned above.) But he is right to say that a number of well-known multilingual thesauri do follow this practice. If you want to keep their editors on side, you have to provide for their needs.

Plenty to keep us all busy thinking....

Stella

*****************************************************
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
SDClarke@LukeHouse.demon.co.uk
*****************************************************

-----Original Message-----
From: public-esw-thes-request@w3.org [mailto:public-esw-thes-request@w3.org] On Behalf Of Sue Ellen Wright
Sent: 01 November 2005 15:13
To: Miles, AJ (Alistair)
Cc: Mark van Assem; public-esw-thes@w3.org
Subject: Re: notes at contepts vs notes at terms

I do agree with the rant on the word "term". That doesn't mean that there should be a note related to whatever you choose to use instead (label?). But the word "term" is very problematic because each community of practice uses it in a different way.

Sue Ellen

On 10/26/05, Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk> wrote:

Hi Mark,

> Note that I'm referring to use cases other than annotation for document retrieval, for which I agree you should annotate with the concept, not the term.

Can you please describe these use cases in detail, explaining in each case exactly what it is you want to be able to assert, what those assertions would mean, and what exactly is the nature of the resources involved in those assertions.

> These are just additional arguments on top of the "we need a Term class to attach properties to" argument

What are these properties?
Please list, with an explanation of the meaning of any assertions made using them.

Fwiw ... 'Term' is the most hideous word. It means a million different things to a million different people. A 'term' from a controlled vocabulary, and a 'term' from a terminology are *completely different things* [1][2]. In metadata applications, 'terms' can be properties of things, or values of those properties, or classes of things, or meaningless strings, or all of the above - cf. the 'Dublin Core Metadata Terms' [3]. The SKOS Core Vocabulary Specification [4] uses 'term' to refer to the classes and properties of the SKOS Core Vocabulary itself, a usage that is consistent with Dublin Core and other RDF documentation.

Because of this incredibly overloaded usage in overlapping fields of discourse, the SKOS Core Guide [5] contains virtually no occurrences of the character string 'term' in prose. This is *very* deliberate. (I just found a couple that slipped through, doh.)

The lesson Dublin Core folks have learned is: be precise. The meaning of several of the properties of the Dublin Core element set is now so overloaded in practice as to render them effectively meaningless. This is a huge problem for the DCMI architecture and usage teams.

If we were to coin a class 'Term' for SKOS Core, I'm quite certain that the incredible variation that would be found in its practical usage would render it, and all the associated parts of SKOS Core, effectively meaningless. We would be contributing confusion to an already very confused field of discourse.

Bottom line: If you can define a class of resources that isn't called 'Term', whose meaning is clear and easily defined, whose application is straightforward and unambiguous, and whose supporting use cases can be justified by a significant body of practice, then great, let's talk about it.

If you can't, think outside the box. Think about n-ary relations. If you're finding it hard to define the nature (i.e.
type) of the things you're trying to relate, perhaps you're conflating resources. Perhaps what you understand as a 'thesaurus term' is actually an instance of an n-ary relationship between several things.

If you don't like n-ary relations, make an effort to differentiate what you mean by the word 'term' in all the different contexts in which you use it, then start defining classes from there. I'll bet you end up with about 12 classes, almost all of which are disjoint.

Cheers,

Al.

[1] http://lists.w3.org/Archives/Public/public-esw-thes/2005Oct/0114.html
[2] http://lists.w3.org/Archives/Public/public-esw-thes/2005Oct/0085.html
[3] http://dublincore.org/documents/dcmi-terms/
[4] http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20050510/
[5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510/

> -----Original Message-----
> From: Mark van Assem [mailto:mark@cs.vu.nl]
> Sent: 26 October 2005 12:01
> To: Miles, AJ (Alistair)
> Cc: public-esw-thes@w3.org <mailto:public-esw-thes@w3.org>
> Subject: Re: notes at contepts vs notes at terms
>
> Hi Alistair,
>
> > I don't know how to say this without sounding like an arse ... but I'm pretty sure that what you're suggesting contradicts the basic principles of thesaurus construction and use, as I've learned them from ISO 2788, the new BS 8723, and directly from folks like Stella and Leonard.
>
> Probably you're right, but I think that some of the thesaurus folk are in favour of having a Term class for the reason of attaching properties to them. The result is that you can have URIs for them, and use the terms in the ways I suggest. And I guess that if people find those useful, they *will*, no matter what any standard is saying. And I don't think they would be wrong in doing so.
>
> > ... then thesaurus T term <rock> and thesaurus T term <basalt> are semantically equivalent tokens.
>
> Yep, in the thesaurus they are, just like (I think) in WN the WordSenses are equivalent within one Synset.
> But for some practical uses (which you agreed to exist for WordSenses) they are not.
>
> > Therefore, 'annotating' a document with the thesaurus T term <basalt> is semantically equivalent to 'annotating' the document with the thesaurus T term <rock>. Therefore, there's no point in doing it.
>
> Would someone using that thesaurus agree that <basalt> and <rock> are equivalent?
>
> > If you want to say something more specific, using a thesaurus, then you need a thesaurus that has <basalt> as a preferred term.
>
> But if there isn't any?
>
> > Alternatively, use free text keyword annotations.
>
> Note that I'm referring to use cases other than annotation for document retrieval, for which I agree you should annotate with the concept, not the term.
>
> > The words 'rock' and 'basalt' may have quite different meanings to you when used in natural language discourse, but that is completely irrelevant. The word 'rock', and thesaurus T term <rock>, are entirely separate entities.
>
> >> A more probable/useful scenario is that a prefterm in one language is mapped to a nonpref term in another, because it is a more accurate translation of the word. It enables a more finegrained mapping than just between concepts.
>
> > If you are talking about semantic mapping, then whether you choose thesaurus T term <rock> or thesaurus T term <basalt> as your mapping target makes no difference to the meaning of the mapping, because thesaurus T term <rock> and thesaurus T term <basalt> are semantically equivalent tokens. Therefore, if you are talking about semantic mapping, it is not possible to create a 'more fine-grained mapping' than that which is possible by mapping between the concepts.
>
> Not on the concept level, but it is possible on the term level?
>
> What is wrong with stating that prefTerm A in language X is usually displayed/used in texts/... in language Y with nonPrefTerm B?
> It gives you additional information that you are free to ignore, because the concept-to-concept mappings are implied by term-to-term mappings (well, if you define your mapping vocabulary in that way). It may help e.g. in translation or displays.
>
> Maybe this is not extremely useful, but I don't see anything fundamentally wrong with it, either.
>
> >> A first use is if you are really interested in that specific term instead of its synonyms. For example if you want to count the number of times a certain concept is misspelled. Or counting the # occurrences of a specific term.
>
> > How can you misspell a 'concept'? What are you counting exactly? What do you mean by an 'occurrence of a specific term'?
>
> A concept cannot be misspelled because it is nameless. You are counting the terms, not the concept.
>
> > N.B. A word, or collocations of words, that appears in a natural language document, and a thesaurus term that shares an identical character sequence, are entirely separate entities. The fact that they share an identical character sequence allows you to infer absolutely nothing at all.
>
> Why not? Of course you may need to assume that the meaning of term and word overlap, but I think that programmers might just do that.
>
> > Am I making any sense?
>
> I can see perfectly clear where you're coming from, and my use cases may turn out to be complete DB after all, but I do think that people would try to (ab)use a thesaurus in all kinds of ways, and would not be wrong in doing so. These are just additional arguments on top of the "we need a Term class to attach properties to" argument (which is probably a more compelling argument). And, if we do introduce a Term class, they are possible uses which we cannot prohibit.
>
> Cheers,
> Mark.
>
> --
> Mark F.J.
van Assem - Vrije Universiteit Amsterdam
> mark@cs.vu.nl - http://www.cs.vu.nl/~mark
>

--
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA
sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com
Received on Tuesday, 1 November 2005 18:49:18 UTC