Interpretation of SKOS-Mapping properties ... from Miles, AJ (Alistair) on 2003-12-01 (public-esw-thes@w3.org from December 2003)

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Mon, 1 Dec 2003 12:16:59 -0000
To: 'Steve Cayzer' <steve.cayzer@hp.com>
Cc: "'public-esw-thes@w3.org'" <public-esw-thes@w3.org>
Message-ID: <350DC7048372D31197F200902773DF4C04944043@exchange11.rl.ac.uk>

Hi Steve,

> 3).
> The major thing I wanted to post to the list is this (but you may be able to
> answer it directly?)
> I notice that on
> http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
> has the following properties:
> 
> <rdf:Property rdf:ID="majorMatch">
>   <rdfs:comment>If 'concept A has-major-match concept B' then the set of
> resources properly indexed against concept A shares more than 50% of its
> members with the set of resources properly indexed against concept
> B.</rdfs:comment>
> </rdf:Property>
> 
> <rdf:Property rdf:ID="minorMatch">
>   <rdfs:comment>If 'concept A has-minor-match concept B' then the set of
> resources properly indexed against concept A shares less than 50% but
> greater than 0 of its members with the set of resources properly indexed
> against concept B.</rdfs:comment>
>     </rdf:Property>
> 
> The use of some number (50%) rings warning bells in my mind. What about
> 49.7% vs 50.1%? How do we know anyway?
> A more comfortable definition (in my mind) would be something vaguer:
> major match -> This means that a resource properly indexed against A has a
> good chance of being properly indexed against B
> minor match -> This means that a resource properly indexed against A has
> some chance of being (or 'may be') properly indexed against B

Good point.  This brings up a duality of perspective that I've been trying
to understand for a while.  Let's have a crack at explaining it...

I have defined these properties with formal entailments, i.e. majorMatch
entails >50% overlap of the document sets corresponding to the concepts.
However, a person creating the mapping must make a best guess as to whether
this will be true, based on their interpretation of the different meanings
of the concepts.  

To make this point another way, consider the following two sets of
instructions on how to use the <soks:majorMatch> property, one to a person
creating a mapping, and one to a programmer developing applications that use
the <soks:majorMatch> property ...

Instructions to mapper:
Use <soks:majorMatch> to link concepts A and B if they overlap in meaning,
and if you believe that more than 50% of the documents that are about
concept A will also be about concept B.

Instructions to programmer:
The ( <ConceptA> <soks:majorMatch> <ConceptB> ) statement entails that >50%
of the documents properly indexed against concept A are also properly
indexed against concept B.  Thus in a query the two concepts may be
interchanged, and a success rate of >50% may be expected.

I.e. the mapper makes a best guess based on the meaning of the concepts,
with imperfect knowledge of the actual document sets, and the programmer
writes programs that process these statements as if they were true statements
about the world, made by someone with perfect knowledge of the document
sets.
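
To make the formal reading concrete, here is a minimal sketch of the
entailment as a set-overlap check, assuming the document sets for both
concepts are fully known (which is exactly the knowledge the mapper does not
have).  The function names and the example document ids are hypothetical, not
part of the SKOS-Mapping schema.  Note that the quoted definitions leave an
overlap of exactly 50% uncovered, so the sketch returns None in that case
rather than guessing.

```python
from fractions import Fraction


def overlap_fraction(docs_a, docs_b):
    """Fraction of the resources indexed against A that are also indexed against B."""
    docs_a, docs_b = set(docs_a), set(docs_b)
    if not docs_a:
        raise ValueError("concept A has no indexed resources")
    return Fraction(len(docs_a & docs_b), len(docs_a))


def entailed_match(docs_a, docs_b):
    """Return the mapping property whose stated entailment holds, or None."""
    share = overlap_fraction(docs_a, docs_b)
    half = Fraction(1, 2)
    if share > half:
        return "majorMatch"
    if 0 < share < half:
        return "minorMatch"
    # No overlap at all, or exactly 50%: neither quoted definition applies.
    return None


if __name__ == "__main__":
    print(entailed_match({"d1", "d2", "d3", "d4"}, {"d2", "d3", "d4", "d9"}))
    print(entailed_match({"d1", "d2", "d3", "d4"}, {"d4"}))
```

Using exact fractions rather than floats keeps the 50% boundary
unambiguous, which is where the 49.7% vs 50.1% worry bites.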

I think it's worth bearing in mind what actual impact these different
mapping statements will have on the user.  A good mapping will mean that a
query app processing transformed queries can guarantee complete recall, and
order the result set to put better matches first.  A poor mapping means lots
of bogus results, incomplete recall and no good ordering.  In order to
generate a good mapping, the mapper needs the right tools (i.e. a well
designed vocab) and must know how to use them (i.e. needs a clear set of
instructions).  So this is what we're working towards.
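
As a rough illustration of the ordering idea, the sketch below expands a
query on one concept through its mappings and puts better matches first.  It
is only a sketch under stated assumptions: the ranking (direct hits, then
majorMatch, then minorMatch), the in-memory dictionaries, and the function
name are all hypothetical, and a real application would read the mapping
statements from RDF.

```python
# Hypothetical ranking: direct hits first, then majorMatch, then minorMatch.
RANK = {"direct": 0, "majorMatch": 1, "minorMatch": 2}


def expanded_search(concept, mappings, index):
    """Return documents for `concept` and its mapped concepts, best matches first.

    mappings: concept -> list of (mapping property, target concept)
    index:    concept -> iterable of document ids indexed against it
    """
    best = {}  # document id -> best (lowest) rank seen so far
    for doc in index.get(concept, ()):
        best[doc] = RANK["direct"]
    for prop, target in mappings.get(concept, ()):
        for doc in index.get(target, ()):
            best[doc] = min(best.get(doc, RANK[prop]), RANK[prop])
    # Sort by rank, then by id so the order is deterministic.
    return sorted(best, key=lambda doc: (best[doc], doc))


if __name__ == "__main__":
    index = {"A": ["d1", "d2"], "B": ["d2", "d3"], "C": ["d4", "d3"]}
    mappings = {"A": [("majorMatch", "B"), ("minorMatch", "C")]}
    print(expanded_search("A", mappings, index))
```

A document reachable by more than one route keeps its best rank, so a
direct hit is never demoted by also appearing under a minorMatch concept.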

How does that go down?

Al.

Received on Monday, 1 December 2003 07:17:17 UTC