RE: Interpretation of SKOS-Mapping properties ... from Miles, AJ (Alistair) on 2003-12-01 (public-esw-thes@w3.org from December 2003)

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Mon, 1 Dec 2003 13:36:47 -0000
To: 'Steve Cayzer' <steve.cayzer@hp.com>, public-esw-thes@w3.org
Message-ID: <350DC7048372D31197F200902773DF4C04944044@exchange11.rl.ac.uk>
Yeah, good point.  Actually the idea for major and minor match (with the <>
50% definitions) came from the Hacet project report which I got a sneaky
preview of.  I though it was better than just having 'inexactMatch'.  Do you
suggest a name change (e.g. 'goodMatch' and 'notSoGoodMatch' :) or just a
different definition of major and minor?

Al.

> -----Original Message-----
> From: Steve Cayzer [mailto:steve.cayzer@hp.com]
> Sent: 01 December 2003 12:57
> To: public-esw-thes@w3.org
> Cc: public-esw-thes@w3.org
> Subject: Re: Interpretation of SKOS-Mapping properties ...
> 
> 
> 
> Hmmm, I see what you're saying, but I don't see why ">50%" (with its
> illusion of precision) is better than "good" or some such (with its
> guarantee of vagueness).
> What we are talking about here the formal encoding of imprecision :)
> 
> Cheers
> 
> Steve
> ----- Original Message ----- 
> From: "Miles, AJ (Alistair) " <A.J.Miles@rl.ac.uk>
> To: "'Steve Cayzer'" <steve.cayzer@hp.com>
> Cc: <public-esw-thes@w3.org>
> Sent: Monday, December 01, 2003 12:16 PM
> Subject: Interpretation of SKOS-Mapping properties ...
> 
> 
> > Hi Steve,
> >
> > > 3).
> > > The major thing I wanted to post to the list is this (but you
> > > may be able to
> > > answer it directly?)
> > > I notice that on
> > > http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
> > > has the following properties:
> > >
> > > <rdf:Property rdf:ID="majorMatch">
> > > <rdfs:comment>If 'concept A has-major-match concept B' 
> then the set of
> > > resources properly indexed against concept A shares more than
> > > 50% of its
> > > members with the set of resources properly indexed against concept
> > > B.</rdfs:comment>
> > > </rdf:Property>
> > >
> > > <rdf:Property rdf:ID="minorMatch">
> > >   <rdfs:comment>If 'concept A has-minor-match concept B' then
> > > the set of
> > > resources properly indexed against concept A shares less 
> than 50% but
> > > greater than 0 of its members with the set of resources
> > > properly indexed
> > > against concept B.</rdfs:comment>
> > >     </rdf:Property>
> > >
> > > The use of some number (50%) rings warning bells in my mind.
> > > What about
> > > 49.7% vs 50.1% ? How do we know anyway?
> > > A more comfortable definition (in my mind) would be 
> something vaguer
> > > major match -> This means that a resource properly indexed
> > > against A has a
> > > good chance of being properly indexed against B
> > > minor match -> This means that a resource properly indexed
> > > against A has
> > > some chance of being (or 'may be') properly indexed against B
> >
> > Good point.  This brings up a duality of perspective that 
> I've been trying
> > to understand for a while.  Let's have a crack at explaining it...
> >
> > I have defined these properties with formal entailments, 
> i.e. majorMatch
> > entails >50% overlap of the document sets corresponding to 
> the concepts.
> > However, a person creating the mapping must make a best guess as to
> whether
> > this will be true, based on their interpretation of the 
> different meanings
> > of the concepts.
> >
> > To make this point another way, consider the following two sets of
> > instructions on how to use the <soks:majorMatch> property, 
> one to a person
> > creating a mapping, and one to a programmer developing 
> applications that
> use
> > the <soks:majorMatch> property ...
> >
> > Instructions to mapper:
> > Use <soks:majorMatch> to link concepts A and B if they 
> overlap in meaning,
> > and if you believe that more than 50% of the documents that 
> are about
> > concept A will also be about concept B.
> >
> > Instructions to programmer:
> > The ( <ConceptA> <soks:majorMatch> <ConceptB> ) statement 
> entails that
> >50%
> > of the documents properly indexed against concept A are 
> also properly
> > indexed against concept B.  Thus in a query the two concepts may be
> > interchanged, and a success rate of >50% may be expected.
> >
> > I.e. the mapper makes a best guess based on the meaning of 
> the concepts,
> > with imperfect knowledge of the actual document sets, and 
> the programmer
> > writes programs that process these statements as if they are true
> statements
> > about the world, made by someone with perfect knowledge of 
> the document
> > sets.
> >
> > I think it's worth bearing in mind what actual impact these 
> different
> > mapping statements will have to the user.  A good mapping 
> will mean that a
> > query app processing transformed queries can guarantee 
> complete recall,
> and
> > order the result set to put better matches first.  A poor 
> mapping means
> lots
> > of bogus results, incomplete recall and no good ordering.  
> In order to
> > generate a good mapping, the mapper needs the right tools 
> (i.e. a well
> > designed vocab) and must know how to use them (i.e. needs a 
> clear set of
> > instructions).  So this is what we're working towards.
> >
> > How does that go down?
> >
> > Al.
> >
>
Received on Monday, 1 December 2003 08:36:51 UTC