
RE: notes at concepts vs notes at terms

From: Stella Dextre Clarke <sdclarke@lukehouse.demon.co.uk>
Date: Tue, 1 Nov 2005 18:49:09 -0000
To: "Sue Ellen Wright" <sellenwright@gmail.com>, "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
Cc: "Mark van Assem" <mark@cs.vu.nl>, <public-esw-thes@w3.org>
Message-ID: <005f01c5df14$f1a04680$0300a8c0@DELL>

This time I don't see this quite the same way as Sue, or Alistair for
that matter. I agree that the term "term" may be used in a lot of
different contexts; I agree this can cause confusion in communications.
But I don't believe you can get rid of "term". Terms do happen to exist,
they are very important in thesauri, and we have to deal with them,
whether we like the name or not. If necessary, we could call them
"thesaurus terms", but we cannot pretend they are not there, and we *do*
need to be able to refer to them without calling them "concepts"
(because they are *not* concepts - they only represent concepts.)
 
Moving on from that, I must try to fulfil my promise to provide examples
of when thesaurus editors may like to attach notes to terms *not*
concepts. You may find the examples more convincing if you imagine them
all being applied to non-preferred terms:
 
1. History notes. 
For example, a non-preferred term "Beagles" might need the following
history note: 'Previously a non-preferred term of "Dogs"; became a
non-preferred term of "Hounds" when the latter was introduced as a
preferred term in 2003.'
 
As it happens I have never used this type of note myself, and we have
not provided for it (yet) in BS 8723. But I have been sorely tempted on
several occasions during a recent project. Of course, it is possible to
record the information in the History Notes (HNs) of the concepts - in
this case you'd need to say something in the HNs of both "Dogs" and
"Hounds" - but it gets cumbersome.
 
2. Editorial Notes.
Example A: "Term proposed for upgrading to preferred status on
2004-10-01. Proposal rejected on grounds of ..... File reference
XYZ-123"
 
Example B: "Term requested by Bloggins on 2002-03-03"
 
Example C: "Term source: ABC Thesaurus"
 
In a recent project I have been merging three vocabularies into one, and
there are vested interests behind the retention of some terms that might
otherwise have been dropped. Sometimes it is useful to keep an audit
trail of exactly where the term came from, who wants it, why they want
it, and what arguments have already been had about it. Some of the
arguments may be about the underlying concept; but sometimes they are
really focussed on a particular term.
 
3. Definitions.
Sometimes it is useful to retain definitions of terms gleaned from
various sources - even when several definitions for the same term
conflict with each other. They do *not* constitute definitions of the
concept that is wanted for retrieval purposes. But they may come in
handy when thesaurus changes are proposed, or for associated scholarly
work. To see examples, look at the AAT
(http://www.getty.edu/research/conducting_research/vocabularies/aat/index.html).
Look at the record for any preferred term - take "drug jars"
for example. Last time I looked, 14 different non-preferred terms were
listed, and for each of these there was a reference to the sources where
it was found e.g. Webster's Dictionary, the OED, Spillman's "Glass
Bottles", etc. Not everyone can afford to do scholarly work on this
scale, and you could say the AAT is an example in a class of its own.
But work like this does happen, you do find it in real live thesauri,
and people do want to exchange such data.
 
4. Mappings
I've heard some people say they want to be able to map to/from
non-preferred terms (separately from the mappings between their
corresponding preferred terms). I've yet to be convinced of this in a
real case, but some people do believe in it strongly.
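The four kinds of term-level data above can be sketched as a small data model. In it, notes can attach to an individual term as well as to the concept that term represents. This is a minimal Python sketch under stated assumptions. The classes `Concept`, `Term` and `Note` and their fields are hypothetical and are not taken from SKOS Core or BS 8723. Attaching example C's source note to "Beagles" is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    """Hypothetical note record: kind is e.g. 'history', 'editorial', 'definition'."""
    kind: str
    text: str
    source: Optional[str] = None  # e.g. the dictionary a definition came from


@dataclass
class Term:
    """Hypothetical thesaurus term: a label representing a concept, with its own notes."""
    label: str
    preferred: bool
    notes: List[Note] = field(default_factory=list)


@dataclass
class Concept:
    """Hypothetical concept: one preferred term plus any non-preferred terms.
    Concept-level notes (e.g. scope notes) are kept separately from term notes."""
    terms: List[Term] = field(default_factory=list)
    notes: List[Note] = field(default_factory=list)

    def preferred_term(self) -> Term:
        return next(t for t in self.terms if t.preferred)

    def find_term(self, label: str) -> Optional[Term]:
        return next((t for t in self.terms if t.label == label), None)


# The history note from example 1 hangs off the non-preferred term
# "Beagles", so neither "Dogs" nor "Hounds" needs a concept-level note.
hounds = Concept(terms=[
    Term("Hounds", preferred=True),
    Term("Beagles", preferred=False, notes=[
        Note("history", 'Previously a non-preferred term of "Dogs"; became a '
                        'non-preferred term of "Hounds" when the latter was '
                        'introduced as a preferred term in 2003.'),
        Note("editorial", "Term source: ABC Thesaurus"),  # illustrative only
    ]),
])

beagles = hounds.find_term("Beagles")
print(hounds.preferred_term().label)    # Hounds
print([n.kind for n in beagles.notes])  # ['history', 'editorial']
print(len(hounds.notes))                # 0
```

Keeping term notes and concept notes in separate lists means a simpler application can ignore `Term.notes` entirely. That matches the suggestion below of putting term notes in a model for more advanced users.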
 
OK, I hope that's enough examples. I agree with the argument that a
capability for having notes on terms is not nearly such a high priority
as that for notes on concepts. But the need occurs commonly enough to
make a case for accommodating it in a model that aims to be
comprehensive. Perhaps it could be in a model for more advanced users,
so as not to create unnecessary difficulties for users with simpler
needs?
 
Then there's a parallel argument, the one Ron raised about relationships
between non-preferred terms in different languages of one multilingual
thesaurus. He and I have discussed this before, and he knows I'm not keen
on this practice. (It has a lot in common with the case of mappings,
mentioned above.) But he is right to say that a number of well-known
multilingual thesauri do follow this practice. If you want to keep their
editors on side, you have to provide for their needs.
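Both item 4 and Ron's case involve links between individual terms rather than between concepts. Each term represents exactly one concept, so a term-level link can always be collapsed into the concept-level link it implies. The extra term-level detail can then be kept or ignored.

Below is a minimal Python sketch, assuming a hypothetical dictionary-based index. The thesauri "T1" and "T2", the concept identifiers and the French labels are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical key for a term: (thesaurus, language, label).
TermKey = Tuple[str, str, str]

# Hypothetical index: the single concept each term represents.
term_to_concept: Dict[TermKey, str] = {
    ("T1", "en", "Hounds"): "T1:c42",           # preferred
    ("T1", "en", "Beagles"): "T1:c42",          # non-preferred
    ("T2", "fr", "Chiens courants"): "T2:c7",   # preferred
    ("T2", "fr", "Beagle"): "T2:c7",            # non-preferred
}

# Term-level links, including one between two non-preferred terms.
term_links: List[Tuple[TermKey, TermKey]] = [
    (("T1", "en", "Hounds"), ("T2", "fr", "Chiens courants")),
    (("T1", "en", "Beagles"), ("T2", "fr", "Beagle")),
]


def implied_concept_links(links: List[Tuple[TermKey, TermKey]],
                          index: Dict[TermKey, str]) -> Set[Tuple[str, str]]:
    """Collapse term-to-term links into the concept-to-concept links they imply."""
    return {(index[a], index[b]) for a, b in links}


print(implied_concept_links(term_links, term_to_concept))
# {('T1:c42', 'T2:c7')}
```

Both term links collapse onto the same concept pair. The link between the two non-preferred terms therefore adds information only at the term level. Within a single multilingual thesaurus, both ends would map to the same concept, so the concept-level result would be trivial.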
 
Plenty to keep us all busy thinking....
Stella

*****************************************************
Stella Dextre Clarke
Information Consultant
Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
Tel: 01235-833-298
Fax: 01235-863-298
SDClarke@LukeHouse.demon.co.uk
*****************************************************



-----Original Message-----
From: public-esw-thes-request@w3.org
[mailto:public-esw-thes-request@w3.org] On Behalf Of Sue Ellen Wright
Sent: 01 November 2005 15:13
To: Miles, AJ (Alistair)
Cc: Mark van Assem; public-esw-thes@w3.org
Subject: Re: notes at concepts vs notes at terms


I do agree with the rant on the word "term". That doesn't mean that
there should be a note related to whatever you choose to use instead
(label?). But the word "term" is very problematic because each community
of practice uses it in a different way. 
 
Sue Ellen

 
On 10/26/05, Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk> wrote: 


Hi Mark,

> Note that I'm referring to use cases other than annotation for
> document retrieval, for which I agree you should annotate with the 
> concept, not the term.

Can you please describe these use cases in detail, explaining in each
case exactly what it is you want to be able to assert, what those
assertions would mean, and what exactly is the nature of the resources
involved in those assertions. 

> These are just additional arguments on top of
> the "we need a Term class to attach properties to" argument

What are these properties?  Please list, with an explanation of the
meaning of any assertions made using them. 

Fwiw ...

'Term' is the most hideous word.  It means a million different things to
a million different people.  A 'term' from a controlled vocabulary, and
a 'term' from a terminology are *completely different things* [1][2].
In metadata applications, 'terms' can be properties of things, or values
of those properties, or classes of things, or meaningless strings, or
all of the above - cf. the 'Dublin Core Metadata Terms' [3]. The SKOS
Core Vocabulary Specification [4] uses 'term' to refer to the classes
and properties of the SKOS Core Vocabulary itself, a usage that is
consistent with Dublin Core and other RDF documentation. 

Because of this incredibly overloaded usage in overlapping fields of
discourse, the SKOS Core Guide [5] contains virtually no occurrences of
the character string 'term' in prose.  This is *very* deliberate.  (I
just found a couple that slipped through, doh.) 

The lesson Dublin Core folks have learned is: be precise.  The meaning
of several of the properties of the Dublin Core Element Set is now so
overloaded in practice as to render them effectively meaningless.  This
is a huge problem for the DCMI architecture and usage teams. 

If we were to coin a class 'Term' for SKOS Core, I'm quite certain that
the incredible variation that would be found in its practical usage
would render it, and all the associated parts of SKOS Core, effectively
meaningless.  We would be contributing confusion to an already very
confused field of discourse. 

Bottom line: If you can define a class of resources that isn't called
'Term', whose meaning is clear and easily defined, whose application is
straightforward and unambiguous, and whose supporting use cases can be
justified by a significant body of practice, then great, let's talk
about it. 

If you can't, think outside the box.  Think about n-ary relations.  If
you're finding it hard to define the nature (i.e. type) of the things
you're trying to relate, perhaps you're conflating resources.  Perhaps
what you understand as a 'thesaurus term' is actually an instance of an
n-ary relationship between several things.  If you don't like n-ary
relations, make an effort to differentiate what you mean by the word
'term' in all the different contexts in which you use it, then start
defining classes from there.  I'll bet you end up with about 12 classes,
almost all of which are disjoint. 
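The n-ary suggestion can be sketched roughly as follows. The assumption is that a "thesaurus term" is decomposed into a relation between four things: a concept, a label (string plus language), a thesaurus, and a role. The class names `Label` and `Labelling`, the role strings, and the `ex:`/`u:` identifiers are hypothetical and not part of SKOS Core. In RDF, the same pattern would typically use an intermediate node for each relation instance, and any notes would attach to that node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Label:
    """A character string in a language; equal strings compare equal."""
    text: str
    lang: str


@dataclass
class Labelling:
    """One instance of a hypothetical n-ary relation: this label is used,
    in this thesaurus, in this role, to represent this concept."""
    concept: str      # hypothetical concept identifier
    label: Label
    thesaurus: str    # hypothetical thesaurus identifier
    role: str         # "preferred" or "non-preferred"
    notes: List[str] = field(default_factory=list)


records = [
    Labelling("ex:hounds", Label("Beagles", "en"), "ex:T", "non-preferred",
              notes=["Became a non-preferred term of Hounds in 2003."]),
    Labelling("ex:hounds", Label("Hounds", "en"), "ex:T", "preferred"),
    # The same string takes part in a separate relation instance in another
    # (hypothetical) thesaurus, without being conflated with the one above.
    Labelling("u:beagles", Label("Beagles", "en"), "u:U", "preferred"),
]

beagle_uses = [r for r in records if r.label == Label("Beagles", "en")]
print(len(beagle_uses))                    # 2
print([r.thesaurus for r in beagle_uses])  # ['ex:T', 'u:U']
```

Here the note belongs to one labelling relation, not to the string "Beagles" and not to the concept. The same string can therefore carry different notes and roles in different thesauri.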

Cheers,

Al.



[1] http://lists.w3.org/Archives/Public/public-esw-thes/2005Oct/0114.html
[2] http://lists.w3.org/Archives/Public/public-esw-thes/2005Oct/0085.html
[3] http://dublincore.org/documents/dcmi-terms/
[4] http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20050510/
[5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510/



> -----Original Message----- 
> From: Mark van Assem [mailto:mark@cs.vu.nl]
> Sent: 26 October 2005 12:01
> To: Miles, AJ (Alistair)
> Cc: public-esw-thes@w3.org
> Subject: Re: notes at concepts vs notes at terms
>
>
> Hi Alistair,
>
> > I don't know how to say this without sounding like an arse
> > ... but I'm pretty sure that what you're suggesting
> > contradicts the basic principles of thesaurus construction
> > and use, as I've learned them from ISO 2788, the new BS 8723,
> > and directly from folks like Stella and Leonard.
>
> Probably you're right, but I think that some of the thesaurus folk are
> in favour of having a Term class so that they can attach properties to
> terms. The result is that you can have URIs for them, and use the terms
> in the ways I suggest. And I guess that if people find those uses
> useful, they *will*, no matter what any standard says. And I don't
> think they would be wrong in doing so.
>
> > ... then thesaurus T term <rock> and thesaurus T term
> > <basalt> are semantically equivalent tokens.
>
> Yep, in the thesaurus they are, just like (I think) in WordNet (WN) the
> WordSenses are equivalent within one Synset. But for some practical
> uses (which you agreed exist for WordSenses) they are not.
>
> > Therefore, 'annotating' a document with the thesaurus T
> > term <basalt> is semantically equivalent to 'annotating' the
> > document with the thesaurus T term <rock>.  Therefore, there's
> > no point in doing it.
>
> Would someone using that thesaurus agree that <basalt> and <rock> are
> equivalent?
>
> > If you want to say something more specific, using a
> > thesaurus, then you need a thesaurus that has <basalt> as a
> > preferred term.
>
> But if there isn't any?
>
> > Alternatively, use free text keyword annotations.
>
> Note that I'm referring to use cases other than annotation for
> document retrieval, for which I agree you should annotate with the
> concept, not the term.
>
> > The words 'rock' and 'basalt' may have quite different
> > meanings to you when used in natural language discourse, but
> > that is completely irrelevant.  The word 'rock', and thesaurus
> > T term <rock>, are entirely separate entities.
> >
> > 
> >>A more probable/useful scenario is that a prefterm in one language
> >>is mapped to a nonpref term in another, because it is a more
> >>accurate translation of the word. It enables a more fine-grained
> >>mapping than just between concepts.
> >
> >
> > If you are talking about semantic mapping, then whether you
> > choose thesaurus T term <rock> or thesaurus T term <basalt>
> > as your mapping target makes no difference to the meaning of
> > the mapping, because thesaurus T term <rock> and thesaurus T
> > term <basalt> are semantically equivalent tokens.  Therefore,
> > if you are talking about semantic mapping, it is not possible
> > to create a 'more fine-grained mapping' than that which is
> > possible by mapping between the concepts.
>
> Not on the concept level, but it is possible on the term level? 
>
> What is wrong with stating that prefTerm A in language X is usually
> displayed/used in texts/... in language Y with nonPrefTerm B? It gives
> you additional information that you are free to ignore, because the
> concept-to-concept mappings are implied by term-to-term mappings
> (well, if you define your mapping vocabulary in that way). It may help,
> e.g., in translation or displays.
>
> Maybe this is not extremely useful, but I don't see anything 
> fundamentally wrong with it, either.
>
> >>A first use is if you are really interested in that specific term
> >>instead of its synonyms. For example, if you want to count the number
> >>of times a certain concept is misspelled. Or counting the number of
> >>occurrences of a specific term.
> >
> >
> > How can you misspell a 'concept'?  What are you counting
> > exactly?  What do you mean by an 'occurrence of a specific term'?
>
> A concept cannot be misspelled because it is nameless. You are
> counting the terms, not the concept.
>
> > N.B. A word, or collocation of words, that appears in a
> > natural language document, and a thesaurus term that shares
> > an identical character sequence, are entirely separate
> > entities.  The fact that they share an identical character
> > sequence allows you to infer absolutely nothing at all.
>
> Why not? Of course you may need to assume that the meanings of the
> term and the word overlap, but I think that programmers might just do
> that.
>
> > Am I making any sense?
>
> I can see perfectly clearly where you're coming from, and my use cases
> may turn out to be complete DB after all, but I do think that people
> would try to (ab)use a thesaurus in all kinds of ways, and would not
> be wrong in doing so. These are just additional arguments on top of 
> the "we need a Term class to attach properties to" argument (which is
> probably a more compelling argument). And, if we do introduce a Term
> class, they are possible uses which we cannot prohibit. 
>
> Cheers,
> Mark.
>
> --
>   Mark F.J. van Assem - Vrije Universiteit Amsterdam
>         mark@cs.vu.nl - http://www.cs.vu.nl/~mark
>






-- 
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA
sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com 

Received on Tuesday, 1 November 2005 18:49:18 GMT
