Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm

I've implemented the algorithm I described yesterday for Jena (with 
a few small fixes).

The source is here: http://gmuer.ch/2005/11/24/leanifier
A jar containing source, classes and JUnit tests is here: 
http://gmuer.ch/2005/11/24/rdf-leanifier

I'm happy to add new tests, and I hope someone finds a way to make 
things faster.

Just to give an idea of how long it takes:

It took 27ms to find out that a 2-statement model is lean
It took 22831ms to find out that a 253-statement model is lean
It took 5ms to reduce a model from 4 to 2 statements
It took 1ms to reduce a model from 4 to 2 statements

Guess how long it will take to leanify a model with 150278 statements 
(that's the size of the model of the website at http://www.osar.ch/)?

reto


Reto Bachmann-Gmür wrote:

> Just saw the first (fixable) bug, see below
>
> I wrote:
>
>> ....
>>
>>boolean implies(Node n1, Node n2, List history1, List history2, Set knownNotImplying, Set conditionalImplications) {
>>.....
>> for every other statement (n2, p, o2) or (o2, p, n2) in p2 check if there is a statement (n1, p, o1) respectively (o1, p, n1) in p1 with the same predicate and for which implies(o1, o2, history1.clone(), history2.clone(), knownNotImplying, conditionalImplications) is true, otherwise add {n1, n2} to knownNotImplying and return false;
>> conditionalImplications.add({n1, n2}) and return true;
>>  
>>
> The recursive call to implies(...) may add elements to 
> conditionalImplications, and these must be removed when the method 
> returns false. So before every "return false" the 
> conditionalImplications set should be reset to the values it held 
> when the method was invoked.
>
> probably not the last bug....
>
>
> reto
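
The reset described in the quoted fix can be sketched in isolation. This is only an illustration of the snapshot-and-restore idea, not the code from the jar: the class RollbackSketch, the method allImply, the use of strings such as "c->d" to stand for node pairs, and the Predicate that stands in for the recursive implies(...) call are all hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class RollbackSketch {

    /**
     * Hypothetical stand-in for the body of implies(...): every pair that
     * passes the check is tentatively recorded in conditionalImplications.
     * If any pair fails, the set is restored to the state it had when the
     * method was invoked, and false is returned.
     */
    static boolean allImply(List<String> pairs,
                            Predicate<String> pairHolds,
                            Set<String> conditionalImplications) {
        // Snapshot the set as it was on entry.
        Set<String> snapshot = new HashSet<>(conditionalImplications);
        for (String pair : pairs) {
            if (!pairHolds.test(pair)) {
                // Roll back everything added during this invocation.
                conditionalImplications.clear();
                conditionalImplications.addAll(snapshot);
                return false;
            }
            conditionalImplications.add(pair);
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> conditional = new HashSet<>();
        conditional.add("a->b"); // recorded before the call

        // "c->d" passes and is added tentatively; "e->f" fails,
        // so the tentative entry has to be dropped again.
        boolean ok = allImply(Arrays.asList("c->d", "e->f"),
                              p -> !p.equals("e->f"),
                              conditional);
        System.out.println(ok + " " + conditional);
    }
}
```

Copying the whole set on every invocation is the simplest way to get the reset right, but it costs time on large models; recording only the entries added during the invocation and removing just those on failure would be a possible refinement.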

Received on Thursday, 24 November 2005 07:17:30 UTC