Re[2]: Deleting subgraphs via SPARQL (illegal?) from David Powell on 2005-06-29 (semantic-web@w3.org from June 2005)

From: David Powell <djpowell@djpowell.net>
Date: Wed, 29 Jun 2005 20:42:34 +0100
To: Reto Bachmann-Gmür <reto@gmuer.ch>
CC: Giovanni Tummarello <giovanni@wup.it>, semantic-web@w3.org
Message-ID: <1857442403.20050629204234@djpowell.net>

Tuesday, June 28, 2005, 8:39:21 PM, Reto Bachmann-Gmür wrote:

>> One question I do have is: is it really illegal to delete triples in
>> RDF? Shouldn't an RDF model reflect a real-world interpretation? If the
>> world changes, shouldn't the model change as well? I believe I was
>> talking to Seaborne about this. I am puzzled :-) I too made the
>> assumption that monotonicity meant you shouldn't delete stuff, but they
>> seemed to disagree on this issue. Clarifications, anyone? :-)

> [...]
> Forbidding deletions is probably too radical an approach (after all, I
> want some institutions to delete my address completely), however we can
> try to collect triples that we will never have to delete. The key is to
> look for "brute facts" (not for finding them, just to get closer). The
> RDFization of "xy will have a speech from 2005-07-02 10:00 till 12:00"
> is much less robust than "xy has accepted the invitation to a speech
> from 2005-07-02 10:00 till 12:00 on 2005-06-28", the second assertion
> allows the first to be a reasonable guess till we add the triples to say
> "xy has canceled his speech scheduled for 2005-07-02 10:00 on
> 2005-06-29".

The difficulties in modelling temporally changing data aren't specific
to RDF. It is worth looking at some of the work that has been done to
represent temporal data in RDBMSs. I found this thesis [1] very useful.

I did some experiments in modelling bitemporal data in RDF a while
ago, and produced a kind of blogging server that was backed by an RDF
store. I designed the vocabulary so that it never required any
statements to be deleted, but was capable of representing both the
"valid time" of entities (i.e. "xy will have a speech from 2005-07-02
10:00 till 12:00") and the "transaction time" of records (i.e. "this
information was recorded/deleted on 2005-06-29 20:00").

So not only could you implement delayed publishing and expiry by
setting the validStart and validEnd of an article instance, you could
also make corrections to the data and those dates as often as you
liked afterwards, and have an audit trail of every change made to the
data.
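
As a rough illustration of the delayed publishing and expiry idea, here
is a minimal Python sketch. The function name is_published, the
half-open interval [validStart, validEnd), and the use of None for "no
expiry" are all assumptions made for this example; they are not taken
from the schemas.

```python
from datetime import datetime
from typing import Optional


def is_published(valid_start: datetime,
                 valid_end: Optional[datetime],
                 now: datetime) -> bool:
    """Return True when `now` lies inside the article's valid time.

    Assumption: the interval is half-open, [valid_start, valid_end),
    and valid_end=None means the article never expires.
    """
    if now < valid_start:
        # Delayed publishing: the article is not visible yet.
        return False
    # Expiry: the article disappears once valid_end is reached.
    return valid_end is None or now < valid_end


# Hypothetical article that goes live on 2005-07-02 10:00 and expires at 12:00.
start = datetime(2005, 7, 2, 10, 0)
end = datetime(2005, 7, 2, 12, 0)
visible_at_eleven = is_published(start, end, datetime(2005, 7, 2, 11, 0))
```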

I don't have a demo that I can show easily, as it is still a work in
progress, but here are some links to the experimental schemas [2] [3].

Basically, it works by having a time:TemporalData resource which can
be used to represent invariant properties of a thing, and a number of
time:TemporalSnapshot resources, which represent the varying data.
These are related via time:snapshotOf properties, and the time ranges
of the snapshots are represented by time:validStart, time:validEnd,
time:transactionStart, and time:transactionEnd.

So each TemporalSnapshot has a collection of properties including the
date ranges, the reference to the invariant TemporalData resource, and
the actual properties that you want to represent, e.g. dc:title,
rss:content, etc.
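
The snapshot structure described above can be sketched in plain Python
(no RDF library) to show how corrections work without deleting anything.
This is only an illustration under stated assumptions: the class and
method names (Snapshot, SnapshotStore, record, lookup) are hypothetical,
intervals are treated as half-open, and the store assumes at most one
current snapshot per entity. Setting transaction_end stands in for
adding a time:transactionEnd statement; no existing record is removed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Snapshot:
    """Stand-in for a time:TemporalSnapshot resource."""
    snapshot_of: str                  # time:snapshotOf, names the TemporalData resource
    properties: dict                  # varying data, e.g. {"dc:title": "..."}
    valid_start: datetime             # time:validStart
    valid_end: datetime               # time:validEnd
    transaction_start: datetime       # time:transactionStart
    transaction_end: Optional[datetime] = None  # time:transactionEnd, None = still current


class SnapshotStore:
    """Append-only store: snapshots are added or closed, never deleted."""

    def __init__(self) -> None:
        self.snapshots: list = []

    def record(self, entity, properties, valid_start, valid_end, now):
        # Close whatever is currently believed about this entity
        # (assumption: one current snapshot per entity).
        for snap in self.snapshots:
            if snap.snapshot_of == entity and snap.transaction_end is None:
                snap.transaction_end = now
        new = Snapshot(entity, dict(properties), valid_start, valid_end, now)
        self.snapshots.append(new)
        return new

    def lookup(self, entity, valid_at, known_at):
        """What did the store believe at `known_at` about the entity at `valid_at`?"""
        for snap in self.snapshots:
            recorded = snap.transaction_start <= known_at and (
                snap.transaction_end is None or known_at < snap.transaction_end)
            valid = snap.valid_start <= valid_at < snap.valid_end
            if snap.snapshot_of == entity and recorded and valid:
                return snap
        return None


# Usage: record an article, then correct its title a day later.
store = SnapshotStore()
live, expires = datetime(2005, 7, 2), datetime(2005, 7, 10)
store.record("post1", {"dc:title": "Draft"}, live, expires, datetime(2005, 6, 28, 9, 0))
store.record("post1", {"dc:title": "Final"}, live, expires, datetime(2005, 6, 29, 20, 0))

before = store.lookup("post1", datetime(2005, 7, 3), known_at=datetime(2005, 6, 28, 12, 0))
after = store.lookup("post1", datetime(2005, 7, 3), known_at=datetime(2005, 6, 30))
```

Both snapshots stay in the store, so the list itself is the audit
trail: querying with an earlier known_at recovers what was believed
before the correction.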

[1] http://www.cs.auc.dk/~csj/Thesis/
[2] http://djpowell.net/tmp/temporal-rdfs.xml
[3] http://djpowell.net/tmp/pyblog-rdfs.xml

-- 
Dave

Received on Wednesday, 29 June 2005 19:43:09 UTC