# Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm

```
Hi Andreas.

> may I ask, why you are facing this particular problem ? Maybe many

The project I'm working on is KnoBot (http://wymiwyg.org/knobot), an
RDF-based aggregator and blog-publishing tool.

> "Why Literals should be unique and why this is a serious issue".

Of course IFPs and combined IFPs
(http://eulersharp.sourceforge.net/2004/04test/rogier#) can help keep
a model lean, and they allow detecting more identities of resources
than simple RDF entailment does. So my question is: does anybody have a
generic algorithm, reasonably fast on real-world RDF graphs, that
produces for every graph g the smallest graph g1 such that g entails g1
and g1 entails g?

>
> In an algorithm for your problem you must annotate the properties
> which identify resources, and the algorithm must only consider those.
> This is important, because sometimes people also describe properties
> which can occur with the same value on multiple resources. In your
> first example, for instance, it could be that two independent
> resources have the same label "test"; this is a common case.

Could be; however, given my example graph with three triples, there is
nothing which makes _:a and _:b different.

> Also be aware of your second example. The case you describe is
> error-prone, since you can't know whether a,b = c,d; maybe "a", "b", "c"
> and "d" are totally different people in your FOAF ontology.

There could be thousands of people in the world who make the model

g1:
_:a foaf:knows _:b.
_:b foaf:knows _:a.

true, but there is no possible world in which g1 has a different truth
value than:

g2:
_:a foaf:knows _:b.
_:b foaf:knows _:a.
_:c foaf:knows _:d.
_:d foaf:knows _:c.

However, while in every world in which

g3:
_:a foaf:knows _:a

is true, g1 and g2 are also true, there are possible worlds in which
g1/g2 are true but g3 is not (a world with two persons knowing each other
but not knowing themselves - know thyself, make g3 true ;-)

regards,
reto

>
>
> A solution to your problem, without the need to write an algorithm,
> is to use OWL Full with InverseFunctionalProperties, see:
> http://www.w3.org/TR/2004/REC-owl-guide-20040210/#InverseFunctionalProperty
> . You can define properties as resource-identifying, and thus multiple
> instances having the same value for an InverseFunctionalProperty will
> not be possible.
>
> The drawbacks of this suggestion:
> 1) refactoring of your ontology
> 2) you also need an implementation which handles
> InverseFunctionalProperties. If you are using Java, the Jena API is a
> solution
> 3) Inferencing performance will be poor if you don't choose a serious
> OWL reasoner; take Pellet if you use Jena
> 4) Performance will be unnecessarily slow if you don't deactivate the
> OWL semantics you don't need (you should only use
> InverseFunctionalProperties; look in the Jena docs or ask on the Jena
> mailing list for this)
>
>
> best regards,
> Andreas
>
>
> Reto Bachmann-Gmür schrieb:
>
>>
>> Hello
>>
>> I'm looking for an algorithm to remove duplicate anonymous resources
>> from an RDF-Model.
>>
>> For example, given the graph:
>>
>> (1) _:a rdfs:label "test".
>> (2) _:a rdfs:comment "...".
>> (3) _:b rdfs:label "test".
>>
>> The algorithm should tell me that I can safely drop statement 3.
>>
>> Or in a more complex example:
>>
>> (1) _:a foaf:knows _:b.
>> (2) _:b foaf:knows _:a.
>> (3) _:c foaf:knows _:d.
>> (4) _:d foaf:knows _:c.
>>
>> Tell me that 3 and 4 can be removed, but that the reduced graph
>> cannot further be reduced, even if _:a and _:b seem indistinguishable.
>>
>> Does anybody have an algorithm or even some code for doing this?
>>
>> Thanks,
>> reto
```
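The entailment claims about g1, g2 and g3 in the message above, and the two reduction examples from the original question, can be checked with a small brute-force sketch. This is not an algorithm from the thread and it is certainly not the "reasonably fast" one being asked for: it simply tries every mapping of blank nodes to terms, which is exponential in the number of blank nodes. The triple encoding (plain tuples of strings, blank nodes written `_:x`, literals written with their quotes) and all function names are assumptions made for illustration.

```python
from itertools import product


def is_blank(term):
    """Blank nodes are written "_:name" in this toy encoding (an assumption)."""
    return term.startswith("_:")


def blank_nodes(graph):
    return sorted({t for s, _, o in graph for t in (s, o) if is_blank(t)})


def terms(graph):
    return sorted({t for s, _, o in graph for t in (s, o)})


def substitute(mapping, graph):
    """Replace blank nodes in subject/object position according to mapping."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph}


def simple_entails(g, h):
    """True if some instance of h is a subgraph of g (brute force)."""
    g = set(g)
    blanks = blank_nodes(h)
    return any(
        substitute(dict(zip(blanks, image)), h) <= g
        for image in product(terms(g), repeat=len(blanks))
    )


def lean(graph):
    """Shrink graph until no blank-node mapping sends it to a proper subgraph."""
    graph = set(graph)
    while True:
        blanks = blank_nodes(graph)
        for image in product(terms(graph), repeat=len(blanks)):
            mapped = substitute(dict(zip(blanks, image)), graph)
            if mapped < graph:  # proper subgraph, still equivalent to graph
                graph = mapped
                break
        else:
            return graph


# The graphs from the thread.
LABELS = {
    ("_:a", "rdfs:label", '"test"'),
    ("_:a", "rdfs:comment", '"..."'),
    ("_:b", "rdfs:label", '"test"'),
}
G1 = {("_:a", "foaf:knows", "_:b"), ("_:b", "foaf:knows", "_:a")}
G2 = G1 | {("_:c", "foaf:knows", "_:d"), ("_:d", "foaf:knows", "_:c")}
G3 = {("_:a", "foaf:knows", "_:a")}

print(simple_entails(G1, G2), simple_entails(G2, G1))  # g1 and g2 are equivalent
print(simple_entails(G3, G1), simple_entails(G1, G3))  # g3 entails g1, not vice versa
print(lean(LABELS))
print(lean(G2))
```

Each reduction step replaces the graph by an instance of itself that is also a proper subgraph, so the result entails and is entailed by the input. The loop stops once no such mapping exists, which is the case for g1: mapping both blank nodes to one node would require the self-loop of g3.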

Received on Monday, 21 November 2005 11:10:33 UTC