Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm

From: Reto Bachmann-Gmür <reto@gmuer.ch>
Date: Mon, 21 Nov 2005 12:10:25 +0100
Message-ID: <4381AB21.10700@gmuer.ch>
To: Andreas Andreakis <andreas.andreakis@gmx.de>
CC: semantic-web at W3C <semantic-web@w3c.org>

Hi Andreas.

> May I ask why you are facing this particular problem? Maybe many 
> people have access to a shared RDF ontology?

The project I'm working on is KnoBot (http://wymiwyg.org/knobot), an 
RDF-based aggregator and blog-publishing tool.

> I'm asking because your post is an example for the running thread: 
> "Why Literals should be unique and why this is a serious issue".

Of course IFPs and combined IFPs 
(http://eulersharp.sourceforge.net/2004/04test/rogier#) can help keep 
a model lean, and they allow detecting more identities of resources 
than simple RDF entailment does. So my question is: does anybody have a 
generic algorithm, reasonably fast on real-world RDF graphs, that 
produces for every graph g the smallest graph g1 such that g 
entails g1 and g1 entails g?
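
To make the question concrete, here is a minimal brute-force sketch in Python of what such an algorithm has to compute. It assumes triples are plain tuples of strings and that blank nodes are exactly the strings starting with "_:"; the names is_bnode, instantiate and lean are made up for illustration and do not come from any RDF library. It tries every mapping of blank nodes onto terms of the graph and keeps any instance that is a proper subgraph, so it is exponential in the number of blank nodes and is not the reasonably fast algorithm asked for.

```python
from itertools import product


def is_bnode(term):
    # Assumption: blank nodes are written "_:name", as in the examples.
    return term.startswith("_:")


def instantiate(graph, mapping):
    # Replace blank nodes according to mapping; other terms stay as they are.
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def lean(graph):
    """Return a lean subgraph that is equivalent to graph under simple entailment."""
    graph = set(graph)
    while True:
        # Candidate images: every subject or object occurring in the graph.
        terms = sorted({t for s, p, o in graph for t in (s, o)})
        bnodes = [t for t in terms if is_bnode(t)]
        for images in product(terms, repeat=len(bnodes)):
            candidate = instantiate(graph, dict(zip(bnodes, images)))
            # An instance that is a proper subgraph is entailed by the graph
            # (it is a subgraph) and entails the graph (it is an instance).
            if candidate < graph:
                graph = candidate
                break
        else:
            # No instance is a proper subgraph: the graph is lean.
            return graph


labels = {
    ("_:a", "rdfs:label", '"test"'),
    ("_:a", "rdfs:comment", '"..."'),
    ("_:b", "rdfs:label", '"test"'),
}
cycles = {
    ("_:a", "foaf:knows", "_:b"),
    ("_:b", "foaf:knows", "_:a"),
    ("_:c", "foaf:knows", "_:d"),
    ("_:d", "foaf:knows", "_:c"),
}
print(sorted(lean(labels)))
print(sorted(lean(cycles)))
```

On the three-triple label/comment example this drops the _:b statement. On the four-triple foaf:knows graph it keeps a single two-node cycle and reduces it no further.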

>
> In an algorithm for your problem you must annotate the properties that 
> should identify resources, and the algorithm must consider only those. 
> This matters because people sometimes also describe properties 
> that can occur with the same value on multiple resources. In your 
> first example, for instance, two independent 
> resources could have the same label "test"; this is a common case.

Could be; however, given my example graph with three triples, there is 
nothing that makes _:a and _:b different.

> Be also aware of your second example. The case you describe is 
> error-prone, since you can't know whether a,b = c,d; maybe "a", "b", "c" 
> and "d" are totally different people in your FOAF ontology.

There could be thousands of people in the world who make the model

g1:
_:a foaf:knows _:b.
_:b foaf:knows _:a.

true, but there is no possible world in which g1 has a different truth 
value than:

g2:
_:a foaf:knows _:b.
_:b foaf:knows _:a.
_:c foaf:knows _:d.
_:d foaf:knows _:c.

However, while in every world in which

g3:
_:a foaf:knows _:a.

is true, g1 and g2 are also true, there are possible worlds in which 
g1/g2 are true but g3 is not (a world with two persons who know each 
other but do not know themselves - know thyself, make g3 true ;-)
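
These claims can be checked mechanically using the interpolation lemma from RDF Semantics: g simply entails h exactly when some instance of h is a subgraph of g. The following Python snippet is a brute-force sketch only. It assumes triples are tuples of strings and blank nodes are written "_:x", and the function name entails is hypothetical rather than taken from any library.

```python
from itertools import product


def entails(g, h):
    """True if g simply entails h, i.e. some instance of h is a subgraph of g."""
    # Blank nodes of h may be mapped to any subject or object of g;
    # the blank nodes of g are treated as fixed terms.
    terms = sorted({t for s, p, o in g for t in (s, o)})
    bnodes = sorted({t for s, p, o in h for t in (s, o) if t.startswith("_:")})
    for images in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, images))
        if all((m.get(s, s), p, m.get(o, o)) in g for s, p, o in h):
            return True
    return False


g1 = {("_:a", "foaf:knows", "_:b"), ("_:b", "foaf:knows", "_:a")}
g2 = g1 | {("_:c", "foaf:knows", "_:d"), ("_:d", "foaf:knows", "_:c")}
g3 = {("_:a", "foaf:knows", "_:a")}

print(entails(g1, g2), entails(g2, g1))  # g1 and g2 are equivalent
print(entails(g3, g1), entails(g3, g2))  # g3 entails both
print(entails(g1, g3), entails(g2, g3))  # but neither entails g3
```

The search is exponential in the number of blank nodes of h, so it only serves to illustrate the semantics on graphs of this size.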

regards,
reto

>
>
> A solution to your problem, without the need to write an algorithm, 
> is to use OWL Full with InverseFunctionalProperties, see: 
> http://www.w3.org/TR/2004/REC-owl-guide-20040210/#InverseFunctionalProperty 
> . You can define properties as resource-identifying, and thus multiple 
> instances having the same value for an InverseFunctionalProperty will 
> not be possible.
>
> The drawbacks of this suggestion:
> 1) refactoring of your ontology
> 2) you also need an implementation that handles 
> InverseFunctionalProperties. If you are using Java, the Jena API is 
> a solution
> 3) Inferencing performance will be poor if you don't choose a serious 
> OWL reasoner; take Pellet if you use Jena
> 4) Performance will be unnecessarily slow if you don't deactivate the 
> OWL semantics you don't need (you should only use 
> InverseFunctionalProperties; look in the Jena docs or ask on the Jena 
> mailing list for this)
>
>
> best regards,
> Andreas
>
>
> Reto Bachmann-Gmür wrote:
>
>>
>> Hello
>>
>> I'm looking for an algorithm to remove duplicate anonymous resources 
>> from an RDF-Model.
>>
>> For example, given the graph:
>>
>> (1) _:a rdfs:label "test".
>> (2) _:a rdfs:comment "...".
>> (3) _:b rdfs:label "test".
>>
>> The algorithm should tell me that I can safely drop statement 3.
>>
>> Or in a more complex example:
>>
>> (1) _:a foaf:knows _:b.
>> (2) _:b foaf:knows _:a.
>> (3) _:c foaf:knows _:d.
>> (4) _:d foaf:knows _:c.
>>
>> Tell me that 3 and 4 can be removed, but that the reduced graph 
>> cannot further be reduced, even if _:a and _:b seem indistinguishable.
>>
>> Does anybody have an algorithm or even some code for doing this?
>>
>> Thanks,
>> reto
>>
>>
>>
>>
>
>
Received on Monday, 21 November 2005 11:10:33 UTC