Re: RDF-Entailment: Remove duplicate anonymous resources - looking for an algorithm from Danny Ayers on 2005-11-21 (semantic-web@w3.org from November 2005)

From: Danny Ayers <danny.ayers@gmail.com>
Date: Mon, 21 Nov 2005 11:17:27 +0100
To: Reto Bachmann-Gmür <reto@gmuer.ch>
Cc: semantic-web at W3C <semantic-web@w3c.org>
Message-ID: <1f2ed5cd0511210217v5c88054du5282d002e8c3ed81@mail.gmail.com>

Hi Reto,

On 11/21/05, Reto Bachmann-Gmür <reto@gmuer.ch> wrote:
>
> Hello
>
> I'm looking for an algorithm to remove duplicate anonymous resources
> from an RDF-Model.
>
> For example, given the graph:
>
> (1) _:a rdfs:label "test".
> (2) _:a rdfs:comment "...".
> (3) _:b rdfs:label "test".
>
> The algorithm should tell me that I can safely drop statement 3.

Ok, so presumably you can't know that _:a and _:b are the same node
without looking at other parts of the graph...
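
For the narrow case in the example, a rough sketch is possible. It is not a general algorithm for making a graph lean. It assumes triples are plain (subject, predicate, object) tuples of strings, that blank nodes are the terms starting with "_:", and that a blank node is only a candidate if it is never used as an object and none of its own objects are blank nodes. The function name is made up for illustration. Such a node is redundant when every (predicate, object) pair it carries is also carried by some other node:

```python
from collections import defaultdict


def is_bnode(term):
    # Assumption: blank nodes are written as strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def drop_redundant_leaf_bnodes(triples):
    """Drop blank nodes whose whole description is contained in another node's."""
    triples = set(triples)
    described = defaultdict(set)  # subject -> set of (predicate, object)
    referenced = set()            # blank nodes that occur as an object
    for s, p, o in triples:
        described[s].add((p, o))
        if is_bnode(o):
            referenced.add(o)

    removed = set()
    for node in sorted(described):
        # Be conservative: only consider unreferenced blank nodes.
        if not is_bnode(node) or node in referenced:
            continue
        pairs = described[node]
        # Skip nodes that point at other blank nodes; those need a real
        # graph-matching step, which this sketch does not attempt.
        if any(is_bnode(o) for _, o in pairs):
            continue
        for other in sorted(described):
            if other == node or other in removed:
                continue
            if pairs <= described[other]:
                removed.add(node)
                break

    return {t for t in triples if t[0] not in removed}


graph = {
    ("_:a", "rdfs:label", "test"),
    ("_:a", "rdfs:comment", "..."),
    ("_:b", "rdfs:label", "test"),
}
lean = drop_redundant_leaf_bnodes(graph)
```

On the three statements from the example this keeps the two statements about _:a and drops statement 3. Anything involving chains of blank nodes is deliberately left alone.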

The nearest I've come to this is smushing (merging nodes that share a
value for an inverse functional property, or IFP), recently with
Morten's Redland smusher [1] - there may be something useful in
his notes/code.

At one point I realised I'd got a load of duplicate bnodes in my
graph, and that for the data set I had (derived from a bunch of feeds)
there were literal properties (rss:title, I think) I could temporarily
mark as IFPs to clean up. Unfortunately I'd overlooked the fact that a
lot of the literals had empty string values (_:a rss:title ""), so
every node with an empty title got merged into one, which made a total
dog's dinner of the data.
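
To make the pitfall concrete, here is a minimal sketch of smushing on a property treated as an IFP, with a guard that ignores empty-string values. It is not Morten's Redland code. The function name, the tuple representation of triples, and the skip_values parameter are all assumptions for illustration:

```python
def smush_on_ifp(triples, ifp, skip_values=frozenset({""})):
    """Merge nodes that share a value for `ifp`, ignoring values in skip_values."""
    triples = set(triples)
    parent = {}  # merged node -> node it was merged into

    def find(node):
        # Follow merge links until reaching a node that was not merged away.
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    first_with_value = {}  # ifp value -> first subject seen with that value
    for s, p, o in sorted(triples):
        if p != ifp or o in skip_values:
            continue
        keeper = first_with_value.setdefault(o, s)
        root_keep, root_dup = find(keeper), find(s)
        if root_keep != root_dup:
            parent[root_dup] = root_keep

    # Rewrite every triple so merged nodes are replaced by their keeper.
    return {(find(s), p, find(o)) for s, p, o in triples}


feed = {
    ("_:a", "rss:title", "News"),
    ("_:a", "rss:link", "http://example.org/1"),
    ("_:b", "rss:title", "News"),
    ("_:c", "rss:title", ""),
    ("_:d", "rss:title", ""),
}
guarded = smush_on_ifp(feed, "rss:title")
naive = smush_on_ifp(feed, "rss:title", skip_values=frozenset())
```

With the guard, _:b is folded into _:a while _:c and _:d stay separate. Without it, _:c and _:d collapse into a single node purely because both titles are empty, which is the kind of mess described above.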

Cheers,
Danny.

[1] http://www.wasab.dk/morten/blog/archives/2005/03/20/redland-smushing

--

http://dannyayers.com

Received on Monday, 21 November 2005 10:17:30 UTC