Deleting subgraphs via SPARQL

[I was going to post this to my blog, so apologies for the markup and
thinking out loud, but I would quite like to hear about solutions, and
there are probably more available over here ;-)]

IANAL, but I think formally speaking, the notion of deleting
statements is outside of the open world model. A statement is
forever, not just for assertionmas. But in practice it's not an uncommon
thing to want to do, dropping a particular bunch of statements from a
graph (hmm, actually it's just looking at a different graph, ah
whatever).

Doing this programmatically against an RDF API is relatively
straightforward - locate/identify the triples in question (which
generally means looking at them as variables/objects), then delete
them using something like <code>del model[statement]</code>
(Redland/Python) on each.
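
To make the locate-then-delete step concrete, here is a minimal sketch in plain Python. It is not the Redland API: the model is just a set of (subject, predicate, object) tuples, and <code>delete_matching</code> and the example.org URI are names made up for illustration.

```python
# Hypothetical stand-in for an RDF model: a set of (s, p, o) tuples.
# This is NOT the Redland API, only an illustration of the idea.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def delete_matching(model, predicate=None, obj=None):
    """Remove every triple matching predicate/object; None is a wildcard."""
    # Collect first, then delete, so the set isn't mutated while iterating.
    doomed = [
        t for t in model
        if (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
    for statement in doomed:
        model.discard(statement)
    return doomed


model = {
    ("_:a", FOAF_NAME, "Fred"),
    ("http://example.org/danny", FOAF_NAME, "Danny"),
}
removed = delete_matching(model, predicate=FOAF_NAME, obj="Fred")
```

With a real API the inner loop would be the <code>del model[statement]</code> call; the point is only that selection and deletion happen in the same process, against the same in-memory identifiers.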

The tricky part of that is choosing the triples you want. Now we have
a wonderful query language, <a
href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a>, that makes
the selection pretty easy. Do a <code>SELECT</code> with the pattern
of interest and you have the subgraphs. I may be missing something,
but there appears to be a snag when it comes to deletion. SPARQL
doesn't have anything like a <code>DELETE</code> operation (or for
that matter <code>INSERT</code>, but that's not difficult
programmatically, perhaps with the aid of <code>CONSTRUCT</code>).  So
say you've got a rogue triple:
<code>_:xxx foaf:name "Fred"</code> 
You could grab that triple (and any like it) using a
<code>SELECT</code>. The problem is that, in the graph you get back, it
might appear as:
<code>_:r1119383015r648  foaf:name "Fred"</code> 
The bnode identifier doesn't carry over: it's arbitrary, and only has to
be consistent throughout the local representation of the graph. This node
isn't anchored anywhere by a URI. I suspect (sub)graphs built from
bnodes and literals alone are probably best avoided, but recent
experience says they're very easy to create by mistake.
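
A small sketch of the snag, again with plain tuples rather than any real store. The assumption (mine, for illustration) is that a bnode is any term starting with <code>_:</code>, which is a simplification since a literal could in principle look like that too.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def is_bnode(term):
    # Simplifying assumption: bnodes are the terms written with a "_:" prefix.
    return term.startswith("_:")


# The rogue triple as written in the source document...
wanted = ("_:xxx", FOAF_NAME, "Fred")
# ...and the same triple as a store might hand it back after relabelling.
stored = {("_:r1119383015r648", FOAF_NAME, "Fred")}

# Comparing labels directly finds nothing, because the labels differ.
exact_hit = wanted in stored


def matches_ignoring_bnode_labels(pattern, triple):
    """Terms match if equal, or if both are bnodes (labels disregarded)."""
    return all(
        p == t or (is_bnode(p) and is_bnode(t))
        for p, t in zip(pattern, triple)
    )


loose_hits = [t for t in stored if matches_ignoring_bnode_labels(wanted, t)]
```

Treating each bnode as a wildcard like this is only safe for a single triple. As soon as the same bnode appears in two triples, the match has to keep the mapping consistent across them.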

There's a fair chance the bnode identifiers are kept the same
throughout a particular triplestore implementation, but that seems a
very risky, non-portable solution. Which, as far as I can hazard,
leaves two options. The first is graph matching. If you do have
URI-named resources then this could be fairly easy, especially if some
of the properties are (inverse) functional. But if not, a combinatorial
kind of check would be needed to make sure the subgraph you've found in
the model is equivalent to the one you want to delete.
Aargh, there could be problems with what's connected to the subgraph
if it appears more than once. But presumably you'd only be attempting
this if you knew exactly what you were trying to achieve... Moving
swiftly on, I think another option would be to use reification. Use
<code>CONSTRUCT</code> and in it reify the statements you want to
delete, give them all a property of <code>deleteMe</code> or whatever.
Might work.
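
For the graph matching option, the combinations check could look something like the brute-force sketch below. It's only a sketch: the triples are plain tuples, bnodes are assumed to be the terms starting with <code>_:</code>, and <code>find_bnode_mappings</code> is a made-up name. It tries every assignment of pattern bnodes to model bnodes, so it is only sensible for small subgraphs.

```python
from itertools import permutations

NAME = "http://xmlns.com/foaf/0.1/name"
KNOWS = "http://xmlns.com/foaf/0.1/knows"


def is_bnode(term):
    # Simplifying assumption: bnodes are the terms written with a "_:" prefix.
    return term.startswith("_:")


def bnodes_of(triples):
    """Sorted list of bnode labels in subject or object position."""
    return sorted({t for s, _, o in triples for t in (s, o) if is_bnode(t)})


def find_bnode_mappings(pattern, model):
    """Yield (mapping, relabelled) for each way the pattern fits the model."""
    pattern_bnodes = bnodes_of(pattern)
    # Try every one-to-one assignment of pattern bnodes to model bnodes.
    for candidate in permutations(bnodes_of(model), len(pattern_bnodes)):
        mapping = dict(zip(pattern_bnodes, candidate))
        relabelled = {
            (mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in pattern
        }
        if relabelled <= model:  # every relabelled triple is in the model
            yield mapping, relabelled


pattern = {
    ("_:x", NAME, "Fred"),
    ("_:x", KNOWS, "_:y"),
    ("_:y", NAME, "Wilma"),
}
model = {
    ("_:b1", NAME, "Fred"),
    ("_:b1", KNOWS, "_:b2"),
    ("_:b2", NAME, "Wilma"),
    ("_:b3", NAME, "Fred"),  # another Fred, not part of the subgraph
    ("http://example.org/danny", NAME, "Danny"),
}

matches = list(find_bnode_mappings(pattern, model))
# Only delete when the match is unambiguous; more than one match is the
# "appears more than once" problem, and needs a human decision.
if len(matches) == 1:
    model -= matches[0][1]
```

Refusing to delete when there is more than one match is a design choice on my part, not something the approach forces; it just avoids guessing which copy of the subgraph was meant.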

Cheers,
Danny (wearing chirpy blogger hat).

-- 

http://dannyayers.com

Received on Wednesday, 22 June 2005 18:39:26 UTC