Feedback from Stella Dextre-Clarke on SWAD-Europe Thesaurus activity

Forwarding this to the list: feedback from Stella Dextre-Clarke, who is
involved in drafting the new British Standards for thesauri.

> -----Original Message-----
> From: Stella Dextre Clarke [mailto:sdclarke@lukehouse.demon.co.uk] 
> Sent: 12 January 2004 09:50
> To: Miles, AJ (Alistair) 
> Cc: Leonard Will (Leonard Will); 'Alan Gilchrist'
> Subject: RE: SWAD-Europe Thesaurus activity
> 
> 
> Alistair,
> Please accept my apologies for the long delay in replying. 
> First I was too tied up with other things; then I 
> thought I'd wait until after our standards Working Group had 
> its meeting (6 Jan) and send you a joint response. We did 
> have that meeting, and the good news is that we made a lot of 
> progress with all the corrections BSI has made to our drafts 
> to get them into BSI house style. We now expect the documents 
> (i.e. Parts 1 and 2) to emerge in March as Drafts for Public 
> Comment.
> 
> The bad news is that Parts 1 and 2 took up most of the day, 
> and the Group did not have time to consider the SWAD papers 
> properly. So I will just try to give you a few personal 
> comments on the work in progress.
> 
> Firstly, it is very impressive to see how much is being done. 
> Keep up the good work!
> 
> Re the SKOS-mapping document, I liked the general approach, 
> which has a lot in common with our draft of Part 4 of the 
> standard (this is the Part that deals with mappings). Some 
> matters of detail may need sorting out. For example, the 
> property "mappingRelation" seems to be defined (or at least 
> described) in terms of itself. In our standard, by the way, 
> we differentiate between inter-term "mappings" and 
> "relationships" by using the former term for relationships 
> between terms in different vocabularies. (Thus all mappings 
> are relationships, but we try to use the term "relationships" 
> when they apply within one vocabulary and "mappings" for 
> cross-vocabulary relationships. What we want to avoid is the 
> sort of loose chatter where people talk about a mapping when 
> all they mean is a USE/UF relationship inside one thesaurus.)
> 
> I thought that specifying "more than 50%" or "less than 50%" 
> (in a set of indexed resources) as the distinction between 
> major and minor matches has the benefit of pragmatism (i.e. I 
> like it in principle) but some problems in practice. It can 
> only apply in the context of a particular indexed collection, 
> and the benefit is that you get a measure of how good the 
> mapping will be for that collection. But a problem arises 
> when the collection grows, and something that matched for 80% 
> of the resources initially, now only matches for 30% of the 
> resources. It means you have to make regular checks on all 
> the major/minor matches to see if they are still valid - even 
> though the concepts themselves have not changed.
> 
> Re the SKOS-Core document, this seems to be setting up 
> definitions for a series of terms, and I am a little 
> concerned that the terms/definitions being established in 
> your group may differ from those in our standard, which we 
> hope will be adopted internationally (in the longer run). In 
> some cases the definitions are compatible with each other; in 
> other cases there is a real difference of usage.  For 
> example, I am not sure I have understood the difference made 
> in the SWAD document between the property "prefLabel" and the 
> property "descriptor", since the former property seems to be 
> exactly what our standard means by "descriptor" i.e. the 
> unique name by which a concept should be labelled. We use the 
> term "non-descriptor" for any alternative (non-preferred) 
> name for the same concept. To take one of the examples in the 
> SWAD document, "Orange (fruit)" could be a descriptor or a 
> non-descriptor, depending on how it is established in the 
> thesaurus. Spelling this out a little, in Thesaurus A, we 
> might have an entry "Orange (fruit) BT Citrus fruits", 
> indicating that both of these terms are descriptors. In 
> Thesaurus B we might have an entry "Orange (fruit) USE 
> Oranges", indicating that the former term is a 
> non-descriptor. It goes without saying that all the terms in 
> a thesaurus, whether descriptors or non-descriptors, have to 
> be unique. I was not quite clear, studying the SWAD document, 
> whether "descriptor" could also be used for the things that 
> our standard calls "non-descriptors" - which would be 
> unfortunate!  Sorry I have made rather a meal of this 
> example, but I am just wondering how we could proceed so that 
> there are no real incompatibilities between the terminologies 
> used in the SWAD work, and those in the thesaurus standard.
> 
> Incidentally, I hope to have a cleaned-up version of our 
> definitions in the next few days. Would you like a copy? (The 
> differences between them and those in the draft I sent you 
> before are only cosmetic, namely the application of BSI's 
> house style, but even so, differences can cause problems if 
> one is not aware of them.)
> 
> Another thing that concerns me is the class "Facet". The SWAD 
> document states that "a concept may be a member of only one 
> facet". I find myself split on this one, because I agree that 
> ideally, facets should be mutually exclusive. But in 
> practice, many thesauri which claim to follow the principles 
> of facet analysis (and this is one of the principles) do not 
> always achieve the ideal.  Some facets commonly used in 
> thesauri include Activities, Agents, Objects, Materials, 
> Organisms, Places, Times. Normally, a concept that belongs to 
> one of these facets cannot belong to any of the others, 
> because they are such fundamentally different things. But in 
> practice, a few concepts can occur that it is convenient to 
> assign to more than one facet. For example, biotechnology has 
> allowed us to develop some special organisms that may be used 
> as materials. Sometimes it is arguable whether a given 
> descriptor represents an object or a material. Or a material 
> such as a chemical reagent may be thought of as an agent 
> (although most agents are people or organisations). You could 
> argue that this problem occurs only because the facets have 
> been badly chosen in the first place. But I argue (I am a 
> pragmatist) that in the real thesauri one encounters in 
> particular contexts, facets may have been chosen because they 
> are useful in the given context, and not for their 
> theoretical properties. Occasionally, therefore, concepts 
> will crop up that have been assigned to more than one facet. 
> The warning I am trying to give is that, even though the ideal is 
> still as stated above, practical applications have to be 
> built in such a way that they will not break when the 
> exceptions crop up.
> 
> I must stop getting excited about every detail! I should 
> address your question about a "standard interface for a 
> thesaurus service". As you can see in the draft of Part 2 
> which I sent you (I hope you did receive it?), we do say quite 
> a bit about the functionality required in the interfaces for 
> (a) using a thesaurus for retrieval, and (b) maintaining a 
> thesaurus. Is that what you mean? We do not specify details of 
> the interface, just the 
> functionality, and in quite a permissive way, to allow added 
> features. As to the data exchanges that support the 
> interface, formats and protocols will be specified in Part 5. 
> (We have not done any work on Part 5, but we hope it will 
> reflect the contents of Parts 1-4 and borrow heavily from the 
> work done by teams such as your own. So the more we can align 
> work across the community, the better.)
> 
> On another matter, your Links page invites contributions, and 
> I wondered whether you would like to make reference to the 
> GCL. The online version is at:
> http://www.govtalk.gov.uk/schemasstandards/gcl.asp
> Copies are freely available for downloading at:
> http://www.govtalk.gov.uk/schemasstandards/gcldocuments.asp
> Strictly speaking, the GCL is a taxonomy rather than a 
> thesaurus, but I note that the page uses the term "thesauri" 
> to include quite a lot of other vocabularies (e.g. LCSH, DDC) 
> that are not thesauri, so I think you could put the GCL in 
> with the other "thesauri".
> 
> Please do keep in touch, Alistair, and let us know if you see 
> any opportunities for joint action.
> 
> Best wishes for 2004,
> Stella
> 
> *****************************************************
> Stella Dextre Clarke
> Information Consultant
> Luke House, West Hendred, Wantage, Oxon, OX12 8RR, UK
> Tel: 01235-833-298
> Fax: 01235-863-298
> SDClarke@LukeHouse.demon.co.uk
> *****************************************************
> 
> 
> 
> -----Original Message-----
> From: Miles, AJ (Alistair) [mailto:A.J.Miles@rl.ac.uk] 
> Sent: 21 November 2003 11:57
> To: Stella Dextre Clarke (E-mail)
> Subject: SWAD-Europe Thesaurus activity
> 
> 
> Hi Stella,
> 
> Just a quick update on the SWAD-Europe thesaurus work. 
> The current work is all written up on the web site [1]. 
> 
> The RDF formats for thesaurus data are maturing, and there 
> will be some reports in the next month or so covering things 
> like representing multilingual data, inter-thesaurus mapping 
> and thesaurus change and version control. We're also looking 
> at how to achieve interoperability between thesauri and web 
> ontologies, taxonomies and other KOS.
> 
> At the moment we're discussing doing this by defining the 
> RDF semantic relations with reference to published standards 
> (to avoid ambiguities with things like 'broader'), so it would 
> be good to stay in touch with the development of the new 
> British standards for thesaurus structure.
> 
> A last question: we are working on a web service API for a 
> terminology service.  Does your new standard cover things 
> like a standard interface to a thesaurus service?
> 
> Yours,
> 
> Alistair.
> 
> [1] SWAD-Europe Thesaurus Activity 
> <http://www.w3c.rl.ac.uk/SWAD/thesaurus.html>
> [2] Semantic Web Advanced Development for Europe project 
> <http://www.w3.org/2001/sw/Europe/>
> 
> 
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> 
> Email:        a.j.miles@rl.ac.uk
> Telephone: +44 (0)1235 445440
> 
> 

Received on Wednesday, 25 February 2004 07:06:07 UTC