Re: Deleting subgraphs via SPARQL (illegal?)

>> In my experience developing applications, it is often a resource (like
>> an Article) which has to be removed rather than distinct statements.
>> What should be deleted with the "article": just its properties? Its
>> CBD? The CBD without statements with a subject that is the object of
>> another statement in the model? I don't think that any of these
>> approaches is a good solution in all cases.
>
> Yep, I'm now running into those kinds of issues (with RSS aggregation)
> that presumably you've already been through with Knobot. It's not
> obvious what the best solution is.

In my own playing around, removing something like a CBD (Concise
Bounded Description) seemed to work, but I can see problems with it.
E.g. descriptions of bnodes are returned in a CBD because they're hard
to reference on their own, but they might be referenced by another
resource, so we shouldn't necessarily delete them (example: a
foaf:Person without a URI).

(In fact, removing an "article" might cause problems if it's  
referenced by another resource.)

We _probably_ want to get rid of bnodes if they're only referenced by  
the resource we're trying to remove. How do we tell? Pass.
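
That said, the reference-counting half can at least be sketched. This
is only a rough illustration in plain Python, not tied to any real
store: triples are assumed to be (subject, predicate, object) tuples of
strings, bnodes are assumed to be labelled with a "_:" prefix, and the
function name is made up for the example. It collects the resource's
own statements, then pulls in a bnode's description only if nothing
that survives the deletion still points at that bnode.

```python
def is_bnode(term):
    # Assumption for this sketch: bnodes are strings labelled "_:something".
    return isinstance(term, str) and term.startswith("_:")


def removable_description(triples, resource):
    """Return the set of triples to delete when removing `resource`.

    Starts from the statements whose subject is `resource`, then adds the
    description of any bnode that is referenced only by triples already
    marked for deletion.
    """
    triples = set(triples)
    doomed = {t for t in triples if t[0] == resource}

    changed = True
    while changed:
        changed = False
        surviving = triples - doomed
        # bnodes that appear as objects of statements we are deleting
        candidates = {o for (_, _, o) in doomed if is_bnode(o)}
        for bnode in candidates:
            # If any surviving statement still points at the bnode, keep it.
            if any(o == bnode for (_, _, o) in surviving):
                continue
            extra = {t for t in surviving if t[0] == bnode}
            if extra:
                doomed |= extra
                changed = True
                surviving = triples - doomed
    return doomed


example = {
    ("ex:article1", "dc:title", "Hello"),
    ("ex:article1", "dc:creator", "_:a"),
    ("_:a", "foaf:name", "Alice"),
    ("ex:article1", "dc:contributor", "_:b"),
    ("_:b", "foaf:name", "Bob"),
    ("ex:article2", "dc:creator", "_:b"),
}
to_delete = removable_description(example, "ex:article1")
```

Here _:a goes away with ex:article1, while _:b stays because
ex:article2 still refers to it. Like any reference-counting scheme,
this leaves cycles of bnodes in place, and it says nothing about
statements elsewhere that point at the removed resource itself. It
doesn't touch the _intent_ half at all.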

It all comes down to _intent_ and reference-counting. Mental garbage  
collection!

>> Forbidding deletions is probably too radical an approach (after all, I
>> want some institutions to delete my address completely); however, we
>> can try to collect triples that we will never have to delete. The key
>> is to look for "brute facts" (not to find them, just to get closer).
>> The RDFization of "xy will have a speech from 2005-07-02 10:00 till
>> 12:00" is much less robust than "xy has accepted the invitation to a
>> speech from 2005-07-02 10:00 till 12:00 on 2005-06-28"; the second
>> assertion allows the first to be a reasonable guess till we add the
>> triples to say "xy has canceled his speech scheduled for 2005-07-02
>> 10:00 on 2005-06-29".

Or the "asserted on"/"retracted on" annotation idea could be used. In
the first instance, in a sense xy _will_ have a speech -- it's just
that he turns out not to. The second statement is weaker, and therefore
doesn't need retraction, but it doesn't actually state that he'll be
there! He could have accepted without ever intending to attend.
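
To make the annotation idea a bit more concrete, here is a minimal
sketch in plain Python. Everything in it (the class, the method names,
the record layout) is hypothetical and only for illustration: each
statement is kept with an "asserted on" date and an optional "retracted
on" date, and nothing is ever physically deleted.

```python
from datetime import date


class AnnotatedStore:
    """Keeps every statement, annotated with when it was asserted/retracted."""

    def __init__(self):
        self._records = []

    def assert_(self, triple, on):
        self._records.append(
            {"triple": triple, "asserted_on": on, "retracted_on": None}
        )

    def retract(self, triple, on):
        # Mark live assertions of this triple as retracted; nothing is removed.
        for record in self._records:
            if record["triple"] == triple and record["retracted_on"] is None:
                record["retracted_on"] = on

    def holds(self, triple, as_of):
        # True if the triple was asserted on or before `as_of`
        # and had not yet been retracted at that date.
        return any(
            record["triple"] == triple
            and record["asserted_on"] <= as_of
            and (record["retracted_on"] is None
                 or as_of < record["retracted_on"])
            for record in self._records
        )


store = AnnotatedStore()
speech = ("ex:xy", "ex:givesSpeechAt", "2005-07-02T10:00")
store.assert_(speech, on=date(2005, 6, 28))
store.retract(speech, on=date(2005, 6, 29))

held_on_28th = store.holds(speech, as_of=date(2005, 6, 28))
held_on_30th = store.holds(speech, as_of=date(2005, 6, 30))
```

Asking what held on the 28th still gives the speech; asking on the 30th
doesn't. The history stays queryable, which is the point.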

Of course, the second statement still might actually need retraction  
-- say his secretary returned the acceptance without consultation, so  
xy never accepted. (Hmm, need digital signatures ;))

> That makes sense. I suppose the more stores there are (globally) then
> the less need there will be to keep lower-relevance statements.
> Storage is cheap, but keeping them in the foreground will presumably
> cost in access time.

It's finding them again that's the problem...

> Giovanni, I don't suppose your RDFGrowth algorithm might be tweakable
> to allow you to efficiently maintain accessible archives? So you'd
> maybe have immediate stuff in-memory, longer term in a triplestore as
> usual, but then have linkage to a massive/very long term store in the
> background. Then don't delete, archive.

Paging out RDF onto disk, then burning it to CD-R, so to speak...
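
The "don't delete, archive" part is easy enough to sketch, again in
plain Python with made-up names and in-memory sets standing in for the
real stores. This has nothing to do with how RDFGrowth actually works;
it only shows removal as a move into a background tier that normal
lookups skip.

```python
class TieredStore:
    """Foreground set for live triples, archive set for 'deleted' ones."""

    def __init__(self):
        self.foreground = set()
        self.archive = set()

    def add(self, triple):
        self.foreground.add(triple)
        # Re-adding an archived triple brings it back to the foreground.
        self.archive.discard(triple)

    def remove(self, triple):
        # Nothing is thrown away: the triple is paged out to the archive.
        if triple in self.foreground:
            self.foreground.remove(triple)
            self.archive.add(triple)

    def find(self, subject, include_archive=False):
        hits = {t for t in self.foreground if t[0] == subject}
        if include_archive:
            hits |= {t for t in self.archive if t[0] == subject}
        return hits


tiers = TieredStore()
item = ("ex:item1", "dc:title", "Old news")
tiers.add(item)
tiers.remove(item)

live_hits = tiers.find("ex:item1")
all_hits = tiers.find("ex:item1", include_archive=True)
```

Normal queries only pay for the foreground; the archive is there when
someone explicitly asks for it.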

-R

Received on Wednesday, 29 June 2005 08:07:26 UTC