- From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
- Date: Mon, 1 Dec 2003 12:16:59 -0000
- To: 'Steve Cayzer' <steve.cayzer@hp.com>
- Cc: "'public-esw-thes@w3.org'" <public-esw-thes@w3.org>
Hi Steve, > 3). > The major thing I wanted to post to the list is this (but you > may be able to > answer it directly?) > I notice that on > http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping > has the following properties: > > <rdf:Property rdf:ID="majorMatch"> > <rdfs:comment>If 'concept A has-major-match concept B' then the set of > resources properly indexed against concept A shares more than > 50% of its > members with the set of resources properly indexed against concept > B.</rdfs:comment> > </rdf:Property> > > <rdf:Property rdf:ID="minorMatch"> > <rdfs:comment>If 'concept A has-minor-match concept B' then > the set of > resources properly indexed against concept A shares less than 50% but > greater than 0 of its members with the set of resources > properly indexed > against concept B.</rdfs:comment> > </rdf:Property> > > The use of some number (50%) rings warning bells in my mind. > What about > 49.7% vs 50.1% ? How do we know anyway? > A more comfortable definition (in my mind) would be something vaguer > major match -> This means that a resource properly indexed > against A has a > good chance of being properly indexed against B > minor match -> This means that a resource properly indexed > against A has > some chance of being (or 'may be') properly indexed against B Good point. This brings up a duality of perspective that I've been trying to understand for a while. Let's have a crack at explaining it... I have defined these properties with formal entailments, i.e. majorMatch entails >50% overlap of the document sets corresponding to the concepts. However, a person creating the mapping must make a best guess as to whether this will be true, based on their interpretation of the different meanings of the concepts. To make this point another way, consider the following two sets of instructions on how to use the <soks:majorMatch> property, one to a person creating a mapping, and one to a programmer developing applications that use the <soks:majorMatch> property ... Instructions to mapper: Use <soks:majorMatch> to link concepts A and B if they overlap in meaning, and if you believe that more than 50% of the documents that are about concept A will also be about concept B. Instructions to programmer: The ( <ConceptA> <soks:majorMatch> <ConceptB> ) statement entails that >50% of the documents properly indexed against concept A are also properly indexed against concept B. Thus in a query the two concepts may be interchanged, and a success rate of >50% may be expected. I.e. the mapper makes a best guess based on the meaning of the concepts, with imperfect knowledge of the actual document sets, and the programmer writes programs that process these statements as if they are true statements about the world, made by someone with perfect knowledge of the document sets. I think it's worth bearing in mind what actual impact these different mapping statements will have to the user. A good mapping will mean that a query app processing transformed queries can guarantee complete recall, and order the result set to put better matches first. A poor mapping means lots of bogus results, incomplete recall and no good ordering. In order to generate a good mapping, the mapper needs the right tools (i.e. a well designed vocab) and must know how to use them (i.e. needs a clear set of instructions). So this is what we're working towards. How does that go down? Al.
Received on Monday, 1 December 2003 07:17:17 UTC