Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm from Andreas Andreakis on 2005-11-21 (semantic-web@w3.org from November 2005)

From: Andreas Andreakis <andreas.andreakis@gmx.de>
Date: Mon, 21 Nov 2005 11:33:48 +0100
To: Reto Bachmann-Gmür <reto@gmuer.ch>
CC: semantic-web at W3C <semantic-web@w3c.org>
Message-ID: <4381A28C.40708@gmx.de>

Hi Reto,


may I ask why you are facing this particular problem? Do many people 
perhaps have access to a shared RDF ontology?
I'm asking because your post is an example for the running thread "Why 
Literals should be unique and why this is a serious issue".


Any algorithm for your problem needs the properties that identify a 
resource to be annotated as such, and it must consider only those 
properties. This is essential, because people also use properties whose 
values can legitimately occur on multiple resources. In your first 
example, for instance, two independent resources could both have the 
label "test"; this is a common case.
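
To make the idea concrete, here is a minimal sketch in Python. It is 
only an illustration under my own assumptions, not a finished 
algorithm: triples are plain (subject, predicate, object) tuples of 
strings, blank nodes are recognised by the "_:" prefix, and the 
function name merge_blank_nodes and the "identifying" set are 
hypothetical names I made up for this example.

```python
def is_blank(term):
    # Assumption: blank nodes are written with a "_:" prefix.
    return term.startswith("_:")


def merge_blank_nodes(triples, identifying):
    """Merge blank nodes that share a value for an identifying property.

    Only properties listed in `identifying` are considered; all other
    properties never cause a merge.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller label as representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    # First blank node seen for each (property, value) pair.
    seen = {}
    for s, p, o in triples:
        if p in identifying and is_blank(s):
            key = (p, o)
            if key in seen:
                union(seen[key], s)
            else:
                seen[key] = s

    def canon(term):
        return find(term) if is_blank(term) else term

    # Rewriting into a set drops statements that became duplicates.
    return {(canon(s), p, canon(o)) for s, p, o in triples}


graph = [
    ("_:a", "rdfs:label", "test"),
    ("_:a", "rdfs:comment", "..."),
    ("_:b", "rdfs:label", "test"),
]
merged = merge_blank_nodes(graph, identifying={"rdfs:label"})
untouched = merge_blank_nodes(graph, identifying=set())
```

With rdfs:label annotated as identifying, _:b is folded into _:a and 
statement 3 disappears; with no annotated property, nothing is merged. 
The sketch makes a single pass, so it does not handle identifying 
values that are themselves blank nodes merged later.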

Be careful with your second example, too. The case you describe is 
error-prone, since you can't know whether a,b = c,d; "a", "b", "c" and 
"d" may be four entirely different people in your FOAF ontology.


One solution that avoids writing an algorithm yourself is to use OWL 
Full with inverse functional properties 
(owl:InverseFunctionalProperty), see: 
http://www.w3.org/TR/2004/REC-owl-guide-20040210/#InverseFunctionalProperty 
. You can declare properties as resource-identifying; two instances 
that share a value for such a property are then inferred to be the 
same individual.
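
Roughly, this is the inference a reasoner draws from an inverse 
functional property. The Python below is only a sketch of that one 
rule, not how Jena or any other reasoner implements it; the function 
name infer_same_as and the example mailbox URIs are made up for 
illustration.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples, inverse_functional):
    """Derive owl:sameAs statements for subjects that share a value
    of an inverse functional property."""
    subjects_by_key = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_key[(p, o)].add(s)

    same_as = set()
    for subjects in subjects_by_key.values():
        # Every pair of subjects with the same value is the same individual.
        for a, b in combinations(sorted(subjects), 2):
            same_as.add((a, "owl:sameAs", b))
    return same_as


# Hypothetical FOAF data: only _:a and _:c share a mailbox.
people = [
    ("_:a", "foaf:mbox", "mailto:x@example.org"),
    ("_:b", "foaf:mbox", "mailto:y@example.org"),
    ("_:c", "foaf:mbox", "mailto:x@example.org"),
    ("_:a", "foaf:knows", "_:b"),
]
inferred = infer_same_as(people, inverse_functional={"foaf:mbox"})
```

Here only _:a and _:c are identified with each other; foaf:knows is not 
declared inverse functional, so it never contributes to the result.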

The drawbacks of this suggestion:
1) You have to refactor your ontology.
2) You also need an implementation that handles 
InverseFunctionalProperties. If you are using Java, the Jena API is one 
option.
3) Inferencing performance will be poor unless you choose a serious OWL 
reasoner; take Pellet if you use Jena.
4) Performance will be unnecessarily slow unless you deactivate the OWL 
semantics you don't need (you should only need 
InverseFunctionalProperty; look in the Jena docs or ask on the Jena 
mailing list for this).


best regards,
Andreas


Reto Bachmann-Gmür wrote:

>
> Hello
>
> I'm looking for an algorithm to remove duplicate anonymous resources 
> from an RDF-Model.
>
> For example, given the graph:
>
> (1) _:a rdfs:label "test".
> (2) _:a rdfs:comment "...".
> (3) _:b rdfs:label "test".
>
> The algorithm should tell me that I can safely drop statement 3.
>
> Or in a more complex example:
>
> (1) _:a foaf:knows _:b.
> (2) _:b foaf:knows _:a.
> (3) _:c foaf:knows _:d.
> (4) _:d foaf:knows _:c.
>
> Tell me that 3 and 4 can be removed, but that the reduced graph cannot 
> further be reduced, even if _:a and _:b seem indistinguishable.
>
> Does anybody have an algorithm or even some code for doing this?
>
> Thanks,
> reto

Received on Monday, 21 November 2005 10:35:10 UTC