Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm

Hi Andreas.

> May I ask why you are facing this particular problem? Maybe many
> people have access to a shared RDF ontology?

The project I'm working on is KnoBot, an
RDF-based aggregator and blog-publishing tool.

> I'm asking because your post is an example for the running thread:
> "Why Literals should be unique and why this is a serious issue".

Of course IFPs and combined IFPs can help keep a model lean, and they
allow detecting more identities between resources than simple
RDF entailment does. So my question is: does anybody have a generic
algorithm, reasonably fast on real-world RDF graphs, which produces
for every graph g the smallest graph g1 such that g entails g1 and
g1 entails g?
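
For illustration only, here is a brute-force sketch in Python of the reduction asked about. It is not the reasonably fast algorithm the question is looking for: it tries every mapping of blank nodes to terms, so it is exponential in the number of blank nodes. The triple representation (tuples of strings, blank nodes marked by a `_:` prefix) and the names `is_bnode`, `apply_mapping` and `lean` are assumptions made for this sketch.

```python
from itertools import product


def is_bnode(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def apply_mapping(graph, mapping):
    # Replace every mapped blank node; all other terms stay unchanged.
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def lean(graph):
    """Shrink a graph until no instance of it is a proper subgraph of it.

    Each step replaces the graph by an instance of itself that is also a
    proper subgraph, so the result entails and is entailed by the input.
    """
    graph = set(graph)
    while True:
        nodes = sorted({t for s, _, o in graph for t in (s, o)})
        bnodes = [t for t in nodes if is_bnode(t)]
        for images in product(nodes, repeat=len(bnodes)):
            candidate = apply_mapping(graph, dict(zip(bnodes, images)))
            if candidate < graph:  # proper subset: some triple was redundant
                graph = candidate
                break
        else:
            return graph  # no shrinking mapping exists


example = {
    ("_:a", "rdfs:label", '"test"'),
    ("_:a", "rdfs:comment", '"..."'),
    ("_:b", "rdfs:label", '"test"'),
}
for triple in sorted(lean(example)):
    print(triple)
```

On the three-triple example, mapping _:b to _:a makes the third statement collapse into the first, which is why it can be dropped.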

> In an algorithm for your problem you must annotate the properties that
> should identify resources, and the algorithm must consider only those.
> This is important, because sometimes people also describe properties
> that can occur with the same value on multiple resources. In your
> first example, for instance, two independent resources could have the
> same label "test"; this is a common case.

That could be; however, given my example graph with three triples, there is
nothing which makes _:a and _:b different.

> Also be aware of your second example. The case you describe is
> error-prone, since you can't know if a,b = c,d; maybe "a", "b", "c"
> and "d" are totally different people in your FOAF ontology.

There could be thousands of people in the world who make the model g1

_:a foaf:knows _:b.
_:b foaf:knows _:a.

true, but there is no possible world in which g1 has a different truth value than g2:

_:a foaf:knows _:b.
_:b foaf:knows _:a.
_:c foaf:knows _:d.
_:d foaf:knows _:c.

However, while in every world in which g3

_:a foaf:knows _:a

is true, g1 and g2 are also true, there are possible worlds in which
g1 and g2 are true but g3 is not (a world with two persons knowing each
other but not knowing themselves; know thyself, make g3 true ;-)
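
A minimal sketch in Python of how these entailment claims can be checked mechanically, assuming the labels are read as g1 for the two-triple graph, g2 for the four-triple graph and g3 for the single triple _:a foaf:knows _:a. It uses the fact that g simply entails h exactly when some mapping of h's blank nodes to terms of g turns h into a subgraph of g. The function name `entails` and the tuple representation are assumptions of this sketch, and the search is brute force.

```python
from itertools import product


def entails(g, h):
    """Return True if g simply entails h.

    Searches for a mapping from the blank nodes of h (strings starting
    with "_:") to terms of g under which every triple of h is in g.
    """
    bnodes = sorted({t for s, _, o in h for t in (s, o) if t.startswith("_:")})
    terms = sorted({t for s, _, o in g for t in (s, o)})
    for images in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, images))
        if all((m.get(s, s), p, m.get(o, o)) in g for s, p, o in h):
            return True
    return False


g1 = {("_:a", "foaf:knows", "_:b"), ("_:b", "foaf:knows", "_:a")}
g2 = g1 | {("_:c", "foaf:knows", "_:d"), ("_:d", "foaf:knows", "_:c")}
g3 = {("_:a", "foaf:knows", "_:a")}

print(entails(g1, g2), entails(g2, g1))  # g1 and g2 entail each other
print(entails(g3, g1), entails(g3, g2))  # g3 entails both
print(entails(g1, g3), entails(g2, g3))  # neither entails g3
```

The first two lines print `True True`; the last prints `False False`, because neither g1 nor g2 contains a node that knows itself.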


> A solution to your problem, without the need to write an algorithm,
> is to use OWL Full with InverseFunctionalProperties. You can define
> properties as resource-identifying, and thus multiple instances
> having the same value for an InverseFunctionalProperty will not be
> possible.
> The drawbacks of this suggestion:
> 1) refactoring of your ontology
> 2) you also need an implementation which handles
> InverseFunctionalProperties. If you are using Java, the Jena API is
> a solution
> 3) Inferencing performance will be poor if you don't choose a serious
> OWL reasoner; take Pellet if you use Jena
> 4) Performance will be unnecessarily slow if you don't deactivate OWL
> semantics you don't need (you should only use InverseFunctionalProperties;
> look in the Jena docs or ask on the Jena mailing list for this)
> best regards,
> Andreas
> Reto Bachmann-Gmür wrote:
>> Hello
>> I'm looking for an algorithm to remove duplicate anonymous resources 
>> from an RDF-Model.
>> For example, given the graph:
>> (1) _:a rdfs:label "test".
>> (2) _:a rdfs:comment "...".
>> (3) _:b rdfs:label "test".
>> The algorithm should tell me that I can safely drop statement 3.
>> Or in a more complex example:
>> (1) _:a foaf:knows _:b.
>> (2) _:b foaf:knows _:a.
>> (3) _:c foaf:knows _:d.
>> (4) _:d foaf:knows _:c.
>> Tell me that 3 and 4 can be removed, but that the reduced graph 
>> cannot further be reduced, even if _:a and _:b seem indistinguishable.
>> Does anybody have an algorithm or even some code for doing this?
>> Thanks,
>> reto

Received on Monday, 21 November 2005 11:10:33 UTC