
RE: notes at concepts vs notes at terms

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Wed, 26 Oct 2005 16:38:36 +0100
Message-ID: <677CE4DD24B12C4B9FA138534E29FB1D0ACE2F@exchange11.fed.cclrc.ac.uk>
To: "Mark van Assem" <mark@cs.vu.nl>
Cc: <public-esw-thes@w3.org>

Hi Mark,

> Note that I'm referring to use cases other than annotation for 
> document retrieval, for which I agree you should annotate with the 
> concept, not the term.

Can you please describe these use cases in detail, explaining in each case exactly what it is you want to be able to assert, what those assertions would mean, and what exactly is the nature of the resources involved in those assertions.  

> These are just additional arguments on top of 
> the "we need a Term class to attach properties to" argument 

What are these properties?  Please list, with an explanation of the meaning of any assertions made using them.

Fwiw ...

'Term' is the most hideous word.  It means a million different things to a million different people.  A 'term' from a controlled vocabulary, and a 'term' from a terminology are *completely different things* [1][2].  In metadata applications, 'terms' can be properties of things, or values of those properties, or classes of things, or meaningless strings, or all of the above - cf. the 'Dublin Core Metadata Terms' [3]. The SKOS Core Vocabulary Specification [4] uses 'term' to refer to the classes and properties of the SKOS Core Vocabulary itself, a usage that is consistent with Dublin Core and other RDF documentation.

Because of this incredibly overloaded usage in overlapping fields of discourse, the SKOS Core Guide [5] contains virtually no occurrences of the character string 'term' in prose.  This is *very* deliberate.  (I just found a couple that slipped through, doh.)

The lesson Dublin Core folks have learned is: be precise.  The meaning of several of the properties of the Dublin Core Element Set is now so overloaded in practice as to render them effectively meaningless.  This is a huge problem for the DCMI architecture and usage teams.

If we were to coin a class 'Term' for SKOS Core, I'm quite certain that the incredible variation that would be found in its practical usage would render it, and all the associated parts of SKOS Core, effectively meaningless.  We would be contributing confusion to an already very confused field of discourse.

Bottom line: If you can define a class of resources that isn't called 'Term', whose meaning is clear and easily defined, whose application is straightforward and unambiguous, and whose supporting use cases can be justified by a significant body of practice, then great, let's talk about it.

If you can't, think outside the box.  Think about n-ary relations.  If you're finding it hard to define the nature (i.e. type) of the things you're trying to relate, perhaps you're conflating resources.  Perhaps what you understand as a 'thesaurus term' is actually an instance of an n-ary relationship between several things.  If you don't like n-ary relations, make an effort to differentiate what you mean by the word 'term' in all the different contexts in which you use it, then start defining classes from there.  I'll bet you end up with about 12 classes, almost all of which are disjoint.
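
As a rough illustration of the n-ary relation idea, here is a minimal sketch in Python. It assumes that a 'thesaurus term' can be unpacked into one relationship between a thesaurus, a concept, a character string, a language and a role. The class name `Labelling`, its fields, the example.org URIs and the note text are all invented for this sketch; none of them is part of SKOS Core or of any standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class LabelRole(Enum):
    """How a string is used for a concept (hypothetical names)."""
    PREFERRED = "preferred"
    NON_PREFERRED = "non-preferred"


@dataclass(frozen=True)
class Labelling:
    """One instance of a hypothetical n-ary relation, not a 'Term' class."""
    scheme: str       # URI of the thesaurus
    concept: str      # URI of the concept being labelled
    literal: str      # the character string itself
    language: str     # language tag of the string
    role: LabelRole   # role the string plays for this concept
    note: str = ""    # a property attached to the relationship instance


# Made-up identifiers for thesaurus T and one of its concepts.
T = "http://example.org/thesaurus/T"
ROCKS = T + "#c1"

labellings = [
    Labelling(T, ROCKS, "rock", "en", LabelRole.PREFERRED),
    Labelling(T, ROCKS, "basalt", "en", LabelRole.NON_PREFERRED,
              note="illustrative note on this entry only"),
]


def concepts_for(literal: str, language: str) -> Set[str]:
    """Return the concepts that a given string labels, in any role."""
    return {l.concept for l in labellings
            if l.literal == literal and l.language == language}
```

In this sketch `concepts_for("rock", "en")` and `concepts_for("basalt", "en")` return the same set, which mirrors the point that the two are semantically equivalent tokens within thesaurus T, while the note still hangs off one specific relationship instance rather than off a class called 'Term'.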



[1] http://lists.w3.org/Archives/Public/public-esw-thes/2005Oct/0114.html
[2] http://lists.w3.org/Archives/Public/public-esw-thes/2005Oct/0085.html
[3] http://dublincore.org/documents/dcmi-terms/
[4] http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20050510/
[5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510/

> -----Original Message-----
> From: Mark van Assem [mailto:mark@cs.vu.nl]
> Sent: 26 October 2005 12:01
> To: Miles, AJ (Alistair)
> Cc: public-esw-thes@w3.org
> Subject: Re: notes at concepts vs notes at terms
>
> Hi Alistair,
>
> > I don't know how to say this without sounding like an arse
> > ... but I'm pretty sure that what you're suggesting
> > contradicts the basic principles of thesaurus construction
> > and use, as I've learned them from ISO 2788, the new BS 8723,
> > and directly from folks like Stella and Leonard.
>
> Probably you're right, but I think that some of the thesaurus
> folk are in favour of having a Term class for the reason of
> attaching properties to them. The result is that you can have
> URIs for them, and use the terms in the ways I suggest. And I
> guess that if people find those useful, they *will*, no matter
> what any standard is saying. And I don't think they would be
> wrong in doing so.
>
> > ... then thesaurus T term <rock> and thesaurus T term
> > <basalt> are semantically equivalent tokens.
>
> Yep, in the thesaurus they are, just like (I think) in WN the
> WordSenses are equivalent within one Synset. But for some practical
> uses (which you agreed to exist for WordSenses) they are not.
>
> > Therefore, 'annotating' a document with the thesaurus T
> > term <basalt> is semantically equivalent to 'annotating' the
> > document with the thesaurus T term <rock>.  Therefore, there's
> > no point in doing it.
>
> Would someone using that thesaurus agree that <basalt> and <rock> are
> equivalent?
>
> > If you want to say something more specific, using a
> > thesaurus, then you need a thesaurus that has <basalt> as a
> > preferred term.
>
> But if there isn't any?
>
> > Alternatively, use free text keyword annotations.
>
> Note that I'm referring to use cases other than annotation for
> document retrieval, for which I agree you should annotate with the
> concept, not the term.
>
> > The words 'rock' and 'basalt' may have quite different
> > meanings to you when used in natural language discourse, but
> > that is completely irrelevant.  The word 'rock', and thesaurus
> > T term <rock>, are entirely separate entities.
> >
> >>A more probable/useful scenario is that a prefterm in one
> >>language is mapped to a nonpref term in another, because it is
> >>a more accurate translation of the word. It enables a more
> >>finegrained mapping than just between concepts.
> >
> > If you are talking about semantic mapping, then whether you
> > choose thesaurus T term <rock> or thesaurus T term <basalt>
> > as your mapping target makes no difference to the meaning of
> > the mapping, because thesaurus T term <rock> and thesaurus T
> > term <basalt> are semantically equivalent tokens.  Therefore,
> > if you are talking about semantic mapping, it is not possible
> > to create a 'more fine-grained mapping' than that which is
> > possible by mapping between the concepts.
>
> Not on the concept level, but it is possible on the term level?
> What is wrong with stating that prefTerm A in language X is usually
> displayed/used in texts/... in language Y with nonPrefTerm B? It
> gives you additional information that you are free to ignore, because
> the concept-to-concept mappings are implied by term-to-term mappings
> (well, if you define your mapping vocabulary in that way). It may
> help e.g. in translation or displays.
> Maybe this is not extremely useful, but I don't see anything
> fundamentally wrong with it, either.
>
> >>A first use is if you are really interested in that specific
> >>term instead of its synonyms. For example if you want to count
> >>the number of times a certain concept is misspelled. Or counting
> >>the # occurrences of a specific term.
> >
> > How can you misspell a 'concept'?  What are you counting
> > exactly?  What do you mean by an 'occurrence of a specific term'?
>
> A concept cannot be misspelled because it is nameless. You are
> counting the terms, not the concept.
>
> > N.B. A word, or collocation of words, that appears in a
> > natural language document, and a thesaurus term that shares
> > an identical character sequence, are entirely separate
> > entities.  The fact that they share an identical character
> > sequence allows you to infer absolutely nothing at all.
>
> Why not? Of course you may need to assume that the meaning of
> term and word overlap, but I think that programmers might just do
> that.
>
> > Am I making any sense?
>
> I can see perfectly clear where you're coming from, and my use cases
> may turn out to be complete DB after all, but I do think that people
> would try to (ab)use a thesaurus in all kinds of ways, and would not
> be wrong in doing so. These are just additional arguments on top of
> the "we need a Term class to attach properties to" argument (which is
> probably a more compelling argument). And, if we do introduce a Term
> class, they are possible uses which we cannot prohibit.
>
> Cheers,
> Mark.
> -- 
>   Mark F.J. van Assem - Vrije Universiteit Amsterdam
>         mark@cs.vu.nl - http://www.cs.vu.nl/~mark
Received on Wednesday, 26 October 2005 15:38:55 UTC
