Re: RDF triple assertions live forever?

Phil Archer writes:
> Renato Golin wrote:
> > Phillip Rhodes wrote:
> >> In a discussion that has arisen recently on the foaf-dev list, somebody
> >> pointed out that they've been told that RDF triples live forever.  
> >> That is, once something is asserted it is considered asserted until, 
> >> as it
> >> was put, "the entropic heat death of the universe."

I think you'll find that view is not widely held, except as explained in
#2 below.   In fact, outside that context, I can't think of why someone
would claim that.

> > Also, with RDF it's easy to say that site A has "the same triple as" 
> > another site B but with different content; who will you trust? Let's say 
> > you have a timestamp annotating the triples; would you still believe the 
> > "newest" one?
> > 
> > Site A:
> >   renato is bad (today)
> > 
> > Site B:
> >   renato is good (10 years ago)
> Being the one who kicked this off by making the original assertion 
> (which I actually got from someone else but almost certainly 
> misinterpreted along the way), I feel I should give a little further input.
> 
> Actually, it's _good news_ (as well as common sense) that triples don't 
> get stored in perpetuity. I came to this from the standpoint of wanting 
> to make the statement (in a semantic way) that
> 
> foaf:Agent "will stand by the following assertions until" $date
> 
> Which is a little different from a cache header...

There are a couple of different issues here.   For the first issue, yes,
of course RDF triples on the web can and should change.  For the second,
"only sort of".

   1.  There is an imperfect connection between agents (people,
       organizations, pseudo-human software systems) which have
       knowledge they intend to publish and the data their readers
       actually receive.  This is a common-sense problem in human
       experience: if you receive a written message, do you know if the
       author still holds what they said when they wrote the message?
       We're used to this in books, letters, and web pages, and we have
       a whole bunch of ways to try to decide whether the author would
       have updated the message if the situation had changed.  Most
       obviously, if the page contains some statements we know to be
       outdated, our confidence in the rest of the text goes down.  If
       we have a history of seeing the page updated regularly, our
       confidence goes up.  This speaks to a kind of trust in the agent,
       but it's quite different from whether the agent would have lied.

       I think:

       > foaf:Agent "will stand by the following assertions until" $date

       is a reasonable approach here; a rough sketch of one possible
       encoding appears at the end of this item.  It may also be good
       to have some way for the reader to notice/diagnose possible
       failure modes.
       For instance, having periodic updates to the data (every minute,
       every week, every year, depending on the application) gives the
       reader some indication of how much attention is being paid to
       maintaining the resource.  Every-second trivial updates
       (timestamps) let people know the computer is still working;
       every-day or every-week human-generated updates (news entries)
       let people know that a human still cares (about something, at
       least).  If the content there speaks to the subject we care
       about, then our confidence goes way up.

       It's also important to understand how negative statements
       (perhaps via the closed-world assumption) fit in. Retracting a
       claim S is not the same as claiming not-S, unless a CWA is in
       effect.  See The Frame Problem [1].
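
       Here is a rough sketch of one possible encoding, using Python
       with the rdflib library (assuming a recent rdflib, where
       serialize() returns a string).  The ex:standsByUntil property
       and the URIs are made up for illustration, and the expiry
       annotates the published document rather than each individual
       triple:

          from rdflib import Graph, Literal, Namespace, URIRef
          from rdflib.namespace import FOAF, XSD

          EX  = Namespace("http://example.org/terms#")   # hypothetical vocabulary
          doc = URIRef("http://example.org/renato.rdf")  # the published document
          me  = URIRef("http://example.org/renato#me")

          g = Graph()
          g.add((me, FOAF.name, Literal("Renato")))
          # "The publisher stands by the assertions in this document
          # until the given date" -- one statement about the document.
          g.add((doc, EX.standsByUntil,
                 Literal("2009-03-28", datatype=XSD.date)))

          print(g.serialize(format="turtle"))

       A reader who trusts the publisher can then treat the document's
       triples as current until that date, and as merely historical
       afterwards.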

   2.  It's nice to have immutable snapshots and transaction logs.  For
       some kinds of decentralized processing, it may be essential.

       I wrote a web harvester once that gathered up triples from around
       the web, downloading and storing RDF triples from RDF/XML sources
       and downloading some metadata from HTML sources.  It stored the
       harvested triples, along with information about how they were
       retrieved -- what URL was used, what the time was, etc. -- in
       immutable Retrieval Log records.  Such records could be added to
       the database or removed to save space, but would never be
       changed.  Each record was a journal entry -- this is what the
       harvester did at some point in time.  The purpose was to allow people
       to browse the semantic web as it was at some point in the past,
       like archive.org does for the web in general.  One interesting
       aspect of the approach was that I republished the snapshots of
       other people's RDF, along with the harvester metadata.  An
       archiving harvester repeatedly checking *those* pages would be
       wasting its time (unless something went wrong).
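
       A minimal sketch of such a record, under the same assumptions
       (Python with a recent rdflib); the function name and record
       layout are mine, not the actual harvester's:

          import datetime, hashlib
          from rdflib import Graph

          def log_retrieval(url, log):
              """Fetch RDF from url and append an immutable record to log."""
              g = Graph()
              g.parse(url)                    # rdflib fetches and parses
              nt = g.serialize(format="nt")   # snapshot as N-Triples
              log.append({                    # append-only: never edited
                  "url": url,
                  "retrieved_at": datetime.datetime.utcnow().isoformat(),
                  "sha1": hashlib.sha1(nt.encode()).hexdigest(),
                  "triples": nt,
              })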

       I imagined that it might be very nice if people would publish
       snapshots of their own data as it changed, rather than me having
       to do it for them.  Of course, instead of snapshots, it might be
       nice to publish just a description of the changes since the
       previous version; see the sketch at the end of this item.  (I
       don't remember if I implemented that or not.)

       So, there is (obviously?) a kind of duality between the current
       state of some resource and the sequence of changes that have been
       made to it since some null starting state.  I have written some
       about this [2], as has TimBL [3], and this feels important to the
       Semantic Web, but I haven't seen it really manifest yet.  I'm
       sure there are some applications where it will be important to
       keep history for at least some time, and/or where distributing
       diffs will be important.  But now I'm rambling.
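
       Here is the sketch of the diff idea, again assuming Python with
       rdflib, whose Graph type supports set arithmetic on triples (the
       snapshot filenames are made up):

          from rdflib import Graph

          old = Graph().parse("snapshot-old.nt", format="nt")
          new = Graph().parse("snapshot-new.nt", format="nt")

          added   = new - old   # triples asserted since the old snapshot
          removed = old - new   # triples no longer asserted

          # Publishing just these two graphs describes the change:
          # anyone holding the old snapshot can rebuild the new one
          # as (old - removed) + added.
          print(len(added), "added;", len(removed), "removed")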

In short, triples live forever in the same sense that these words I'm
writing (which will be archived in various places) live forever.

    -- Sandro


[1] http://en.wikipedia.org/wiki/Frame_problem#The_answer_set_programming_solution
[2] http://esw.w3.org/topic/DeltaView
[3] http://www.w3.org/DesignIssues/Diff
