- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Mon, 21 Nov 2005 11:17:27 +0100
- To: Reto Bachmann-Gmür <reto@gmuer.ch>
- Cc: semantic-web at W3C <semantic-web@w3c.org>
Hi Reto,

On 11/21/05, Reto Bachmann-Gmür <reto@gmuer.ch> wrote:
> Hello
>
> I'm looking for an algorithm to remove duplicate anonymous resources
> from an RDF-Model.
>
> For example, given the graph:
>
> (1) _:a rdfs:label "test".
> (2) _:a rdfs:comment "...".
> (3) _:b rdfs:label "test".
>
> The algorithm should tell me that I can safely drop statement 3.

OK, so presumably you can't know that _:a and _:b are the same node without looking at other parts of the graph...

The nearest I've come to this is smushing on IFPs (inverse functional properties), most recently with Morten's Redland smusher [1]; there may be something useful in his notes/code.

At one point I realised I'd got a load of duplicate bnodes in my graph, and that for the data set I had (derived from a bunch of feeds) there were literal properties (rss:title, I think) that I could temporarily mark as IFPs to clean up. Unfortunately I'd overlooked the fact that a lot of the literals had empty string values (_:a rss:title ""), so every node with an empty title got merged into one, which made a total dog's dinner of the data.

Cheers,
Danny.

[1] http://www.wasab.dk/morten/blog/archives/2005/03/20/redland-smushing

--
http://dannyayers.com
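For Reto's example, statement 3 is redundant because renaming _:b to _:a turns it into statement 1, which is already in the graph; dropping it leaves a graph with the same meaning (the idea behind "lean" graphs in RDF Semantics). A minimal sketch of that check, assuming triples are Python tuples of strings with blank nodes prefixed "_:" and literals written with their quotes; the names `drop_redundant_bnodes` and `substitute` are hypothetical. It only tries renaming one blank node at a time, so it is a greedy approximation, not a complete leaning algorithm: redundancy that needs several bnodes renamed together (e.g. chains of bnodes) is not detected.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def substitute(t: Triple, old: str, new: str) -> Triple:
    # Replace `old` with `new` in subject/object position (predicates are IRIs).
    s, p, o = t
    return (new if s == old else s, p, new if o == old else o)


def drop_redundant_bnodes(graph: Iterable[Triple]) -> Set[Triple]:
    """Greedily remove blank nodes whose triples all reappear under a renaming."""
    g: Set[Triple] = set(graph)
    changed = True
    while changed:
        changed = False
        for b in sorted({x for t in g for x in (t[0], t[2]) if is_bnode(x)}):
            b_triples: List[Triple] = [t for t in g if b in (t[0], t[2])]
            # Candidate replacements: every other subject/object term in the graph.
            candidates = sorted({x for t in g for x in (t[0], t[2])} - {b})
            for a in candidates:
                if all(substitute(t, b, a) in g for t in b_triples):
                    # Mapping b -> a sends b's triples onto existing ones,
                    # so they add no information and can be dropped.
                    g.difference_update(b_triples)
                    changed = True
                    break
            if changed:
                break  # the graph changed; recompute the bnode list
    return g


example = [
    ("_:a", "rdfs:label", '"test"'),
    ("_:a", "rdfs:comment", '"..."'),
    ("_:b", "rdfs:label", '"test"'),
]
print(sorted(drop_redundant_bnodes(example)))
# prints [('_:a', 'rdfs:comment', '"..."'), ('_:a', 'rdfs:label', '"test"')]
```

Note that _:a itself is kept: renaming it to _:b would lose the rdfs:comment statement, so only statement 3 goes, as Reto wanted.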
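Smushing on IFPs, with a guard against the empty-string problem described above, might look roughly like the following. This is a sketch under stated assumptions, not Morten's Redland code: triples are tuples of strings, blank nodes start with "_:", plain literals are written as '"..."' (language tags and datatypes are ignored), and the caller supplies the set of properties to treat as inverse functional (for example, temporarily treating rss:title as one). Nodes sharing a value for such a property are merged with a union-find; empty literals are skipped because they identify nothing. The function name `smush`, the `skip_empty` flag and the example.org data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def is_empty_literal(term: str) -> bool:
    # Assumption: plain literals keep their surrounding double quotes.
    return term.startswith('"') and term.endswith('"') and term[1:-1].strip() == ""


def smush(graph: Iterable[Triple], ifps: Set[str], skip_empty: bool = True) -> Set[Triple]:
    """Merge nodes that share a value for an inverse functional property."""
    triples = set(graph)
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Prefer a named (non-blank) node as representative, then the smaller name.
        if (is_bnode(rb), rb) < (is_bnode(ra), ra):
            ra, rb = rb, ra
        parent[rb] = ra

    seen: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p not in ifps:
            continue
        if skip_empty and is_empty_literal(o):
            continue  # an empty value must not identify anything
        key = (p, o)
        if key in seen:
            union(seen[key], s)
        else:
            seen[key] = s

    # Rewrite every subject and object to its representative.
    return {(find(s), p, find(o)) for s, p, o in triples}


feed = [
    ("_:a", "rss:title", '""'),
    ("_:b", "rss:title", '""'),
    ("_:c", "rss:title", '"Post"'),
    ("_:d", "rss:title", '"Post"'),
    ("_:a", "rss:link", "http://example.org/1"),
    ("_:b", "rss:link", "http://example.org/2"),
]
print(len(smush(feed, {"rss:title"})))                    # 5: only _:c and _:d merge
print(len(smush(feed, {"rss:title"}, skip_empty=False)))  # 4: _:a and _:b wrongly merge too
```

With `skip_empty=False` the two items with empty titles collapse into one node carrying both links, which is the dog's dinner described above. Rewriting two named URIs into one is also lossy; a more conservative smusher might record owl:sameAs between them instead.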
Received on Monday, 21 November 2005 10:17:30 UTC