Why you can't use OWL set constructs directly for concept mapping - the simple version !!!

Let's have another go at this ...

Several people have asked, why can't I encode a traditional mapping
expression such as ...

	'Concept A has-broad-match Concept B and Concept C'

... using OWL as ...

<soks:Concept rdf:about="#A">
	<rdfs:subClassOf rdf:parseType="Resource">
		<owl:intersectionOf rdf:parseType="Collection">
			<soks:Concept rdf:about="#B"/>
			<soks:Concept rdf:about="#C"/>
		</owl:intersectionOf>
	</rdfs:subClassOf>
</soks:Concept>

... ???

Two reasons are:

1.  Many things typed as <soks:Concept> should not be modelled as classes.
For example, the concept 'Fish and chips' is not a class, and neither are
'Java programming language version 1.4.2', 'Absence from school',
'British Army' or 'National Health Service'.  But all of these may be concepts
in a thesaurus (the last three are taken from the Government Category List
thesaurus).  And if they aren't classes, we can't use OWL set constructs,
because these may only be used with classes.

However, we still want to be able to use set-like expressions, such as 

'Java programming language version 1.4.2' has-broad-match 'Java
(programming)' and 'Programming languages' 


2.  Some thesaurus concepts can be modelled as classes.  For example, 'NHS
Trusts', 'Nursery schools' and 'Religious minorities' (all from GCL) could
each be modelled as an <owl:Class> with instances.  Here we want to reserve
the OWL expressions for statements about the structure of the conceptual
scheme itself, in case we decide to start migrating our thesauri towards
ontologies.


HOWEVER, I believe we can still use OWL to express the entailments of
mappings.  I believe a statement such as ...

	'Concept A has-broad-match Concept B and Concept C'

... could be defined to entail the following ...

 <owl:Restriction> 
    <owl:onProperty rdf:resource="&dc;subject"/> 
    <owl:hasValue rdf:resource="#A"/> 
    <rdfs:subClassOf rdf:parseType="Resource"> 
       <owl:intersectionOf rdf:parseType="Collection"> 
          <owl:Restriction> 
             <owl:onProperty rdf:resource="&dc;subject"/> 
             <owl:hasValue rdf:resource="#B"/> 
          </owl:Restriction> 
          <owl:Restriction> 
             <owl:onProperty rdf:resource="&dc;subject"/> 
             <owl:hasValue rdf:resource="#C"/> 
          </owl:Restriction> 
       </owl:intersectionOf> 
    </rdfs:subClassOf> 
 </owl:Restriction> 

... which is useful because an OWL reasoner could then infer that
something with <dc:subject> concept A is also something with <dc:subject>
concept B and <dc:subject> concept C.  I.e. given a complete mapping, an OWL
reasoner could infer a virtual subject index for a collection in terms of
another thesaurus (although perhaps at unreasonable computational
expense).  An OWL reasoner could also infer transformed queries.
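
To illustrate the 'virtual subject index' idea without an OWL reasoner, here
is a rough Python sketch.  It is only an assumption about how the entailment
might be applied by hand: the resource names, the dictionaries and the
function infer_virtual_index are all made up for illustration, and it applies
the mapping in a single step only (no chaining of mappings).

```python
# Hypothetical subject index: resource -> concepts it is indexed against
# (standing in for <dc:subject> statements).
subject_index = {
    "doc1": {"A"},
    "doc2": {"A", "D"},
    "doc3": {"D"},
}

# Hypothetical encoding of 'Concept A has-broad-match Concept B and Concept C'.
broad_match_and = {"A": {"B", "C"}}


def infer_virtual_index(index, mapping):
    """Return a new index with the subjects entailed by the mapping added."""
    inferred = {}
    for resource, concepts in index.items():
        entailed = set(concepts)
        for concept in concepts:
            # Anything about A is also about every concept in A's broad match.
            entailed |= mapping.get(concept, set())
        inferred[resource] = entailed
    return inferred


virtual_index = infer_virtual_index(subject_index, broad_match_and)
```

Here doc1 and doc2 pick up concepts B and C because they are indexed against
A, while doc3 is left unchanged.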

The problem with using this sort of expression as a base vocabulary for
expressing mappings is ...
	1.  It's a bit long-winded and ...
	2.  Some collections may use properties other than <dc:subject> to
express the subject of a resource (e.g. Open Directory uses the 'link'
property from a topic to a resource, which is a bit like a sort of inverse
of <dc:subject>).
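
As a rough illustration of point 2, the Python sketch below normalises
statements that use different subject properties into a single
resource-to-concept index.  The triples, the property labels and the
function build_subject_index are assumptions made up for this example; in
particular 'link' here is just a label standing in for the Open Directory
property, not its real URI.

```python
from collections import defaultdict

# Hypothetical statements as (subject, property, object) tuples.
triples = [
    ("doc1", "dc:subject", "A"),   # resource -> concept
    ("topicB", "link", "doc2"),    # concept (topic) -> resource
]

# True if the property points from resource to concept, False if it is
# the inverse direction.  These labels are illustrative only.
RESOURCE_TO_CONCEPT = {"dc:subject": True, "link": False}


def build_subject_index(statements, directions):
    """Build resource -> set of concepts, whatever property was used."""
    index = defaultdict(set)
    for s, p, o in statements:
        if p not in directions:
            continue  # not a subject-indexing property
        resource, concept = (s, o) if directions[p] else (o, s)
        index[resource].add(concept)
    return dict(index)


normalised = build_subject_index(triples, RESOURCE_TO_CONCEPT)
```

Once everything is in one index, the same mapping entailments can be applied
regardless of which property a collection happened to use.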

So I think a simpler base vocabulary is in order, something familiar to the
thesaurus community, hence SKOS-Mapping.  Then SKOS-Mapping statements could
be defined to have formal entailments expressible in OWL, or they could be
used to generate rules, which could in turn be used to transform queries or
subject indexes.
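
To make the 'generate rules to transform queries' idea a little more
concrete, here is a rough Python sketch.  It is not part of SKOS-Mapping;
the dictionaries and the functions expand_query and run_query are
hypothetical, and the only rule it implements is: if 'A has-broad-match B
and C', then a query for B (or C) should also return resources indexed
against A.

```python
# Hypothetical encoding of 'Concept A has-broad-match Concept B and Concept C'.
mapping = {"A": {"B", "C"}}

# Hypothetical collection, indexed against a mixture of concepts.
index = {
    "doc1": {"A"},
    "doc2": {"B"},
    "doc3": {"D"},
}


def expand_query(concept, broad_matches):
    """Concepts whose resources should be returned for a query on `concept`."""
    expanded = {concept}
    for narrower, broader_set in broad_matches.items():
        if concept in broader_set:
            # Resources about the narrower concept are also about `concept`.
            expanded.add(narrower)
    return expanded


def run_query(concept, broad_matches, subject_index):
    """Return the resources indexed against any concept in the expanded query."""
    wanted = expand_query(concept, broad_matches)
    return sorted(r for r, concepts in subject_index.items() if concepts & wanted)


results_for_b = run_query("B", mapping, index)
```

A query for B is rewritten to 'B or A', so doc1 is found even though it was
only ever indexed against A.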

Defining <soks-map:AND>, <soks-map:OR> and <soks-map:NOT> as sub-classes of
<rdf:Bag> may not be a good idea, however (see Dave R's mail
<http://lists.w3.org/Archives/Public/public-esw-thes/2003Nov/0065.html>), so
I'm totally open to suggestions on how best to express these kinds of
constructs in RDF ...

Al.

[Also you can see my other attempt to write up this issue
<http://lists.w3.org/Archives/Public/public-esw-thes/2003Nov/0067.html>]


   


> -----Original Message-----
> From: public-esw-thes-request@w3.org
> [mailto:public-esw-thes-request@w3.org]On Behalf Of Danny Ayers
> Sent: 02 December 2003 09:38
> To: public-esw-thes@w3.org
> Subject: RE: Interpretation of SKOS-Mapping properties ...
> 
> 
> 
> Hi Alistair, folks,
> 
> I think Steve's got a good point - using the exact figure 50% in the
> definitions could mislead people into thinking it had more significance than
> an expression of the mapper's belief. I'd suggest a minor change to the
> wording, from:
> 
> > > Instructions to mapper:
> > > Use <soks:majorMatch> to link concepts A and B if they overlap
> > in meaning,
> > > and if you believe that more than 50% of the documents 
> that are about
> > > concept A will also be about concept B.
> 
> to something like
> 
> "...if you believe that the majority of the documents..."
> 
> Generally developments are looking great to me - that was 
> quite a flurry of
> activity!
> 
> The combination constructs are interesting. I've not had a 
> chance to play
> yet, but I'd be a little concerned that they seem to be extending the
> semantics beyond RDF, and that there might be a 
> clash/incompatibility with
> OWL constructs. Any particular reason for not using OWL, btw?
> 
> Cheers,
> Danny
> 
> 
> 
> > -----Original Message-----
> > From: public-esw-thes-request@w3.org
> > [mailto:public-esw-thes-request@w3.org]On Behalf Of Steve Cayzer
> > Sent: 01 December 2003 15:13
> > To: public-esw-thes@w3.org
> > Cc: public-esw-thes@w3.org
> > Subject: Re: Interpretation of SKOS-Mapping properties ...
> >
> >
> >
> > Hmmm, I see what you're saying, but I don't see why ">50%" (with its
> > illusion of precision) is better than "good" or some such (with its
> > guarantee of vagueness).
> > What we are talking about here is the formal encoding of imprecision :)
> >
> > Cheers
> >
> > Steve
> > ----- Original Message -----
> > From: "Miles, AJ (Alistair) " <A.J.Miles@rl.ac.uk>
> > To: "'Steve Cayzer'" <steve.cayzer@hp.com>
> > Cc: <public-esw-thes@w3.org>
> > Sent: Monday, December 01, 2003 12:16 PM
> > Subject: Interpretation of SKOS-Mapping properties ...
> >
> >
> > > Hi Steve,
> > >
> > > > 3).
> > > > The major thing I wanted to post to the list is this (but you
> > > > may be able to
> > > > answer it directly?)
> > > > I notice that on
> > > > http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
> > > > has the following properties:
> > > >
> > > > <rdf:Property rdf:ID="majorMatch">
> > > > <rdfs:comment>If 'concept A has-major-match concept B' 
> then the set of
> > > > resources properly indexed against concept A shares more than
> > > > 50% of its
> > > > members with the set of resources properly indexed 
> against concept
> > > > B.</rdfs:comment>
> > > > </rdf:Property>
> > > >
> > > > <rdf:Property rdf:ID="minorMatch">
> > > >   <rdfs:comment>If 'concept A has-minor-match concept B' then
> > > > the set of
> > > > resources properly indexed against concept A shares 
> less than 50% but
> > > > greater than 0 of its members with the set of resources
> > > > properly indexed
> > > > against concept B.</rdfs:comment>
> > > >     </rdf:Property>
> > > >
> > > > The use of some number (50%) rings warning bells in my mind.
> > > > What about
> > > > 49.7% vs 50.1% ? How do we know anyway?
> > > > A more comfortable definition (in my mind) would be 
> something vaguer
> > > > major match -> This means that a resource properly indexed
> > > > against A has a
> > > > good chance of being properly indexed against B
> > > > minor match -> This means that a resource properly indexed
> > > > against A has
> > > > some chance of being (or 'may be') properly indexed against B
> > >
> > > Good point.  This brings up a duality of perspective that I've
> > been trying
> > > to understand for a while.  Let's have a crack at explaining it...
> > >
> > > I have defined these properties with formal entailments, 
> i.e. majorMatch
> > > entails >50% overlap of the document sets corresponding 
> to the concepts.
> > > However, a person creating the mapping must make a best 
> guess as to
> > whether
> > > this will be true, based on their interpretation of the
> > different meanings
> > > of the concepts.
> > >
> > > To make this point another way, consider the following two sets of
> > > instructions on how to use the <soks:majorMatch> property, one
> > to a person
> > > creating a mapping, and one to a programmer developing 
> applications that
> > use
> > > the <soks:majorMatch> property ...
> > >
> > > Instructions to mapper:
> > > Use <soks:majorMatch> to link concepts A and B if they overlap
> > in meaning,
> > > and if you believe that more than 50% of the documents 
> that are about
> > > concept A will also be about concept B.
> > >
> > > Instructions to programmer:
> > > The ( <ConceptA> <soks:majorMatch> <ConceptB> ) statement 
> entails that
> > >50%
> > > of the documents properly indexed against concept A are 
> also properly
> > > indexed against concept B.  Thus in a query the two 
> concepts may be
> > > interchanged, and a success rate of >50% may be expected.
> > >
> > > I.e. the mapper makes a best guess based on the meaning 
> of the concepts,
> > > with imperfect knowledge of the actual document sets, and 
> the programmer
> > > writes programs that process these statements as if they are true
> > statements
> > > about the world, made by someone with perfect knowledge 
> of the document
> > > sets.
> > >
> > > I think it's worth bearing in mind what actual impact 
> these different
> > > mapping statements will have to the user.  A good mapping will
> > mean that a
> > > query app processing transformed queries can guarantee 
> complete recall,
> > and
> > > order the result set to put better matches first.  A poor 
> mapping means
> > lots
> > > of bogus results, incomplete recall and no good ordering. 
>  In order to
> > > generate a good mapping, the mapper needs the right tools 
> (i.e. a well
> > > designed vocab) and must know how to use them (i.e. needs 
> a clear set of
> > > instructions).  So this is what we're working towards.
> > >
> > > How does that go down?
> > >
> > > Al.
> > >
> >
> 

Received on Tuesday, 2 December 2003 11:31:23 UTC