- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 28 Mar 2008 09:27:25 -0400
- To: Phil Archer <parcher@icra.org>
- Cc: Renato Golin <renato@ebi.ac.uk>, Phillip Rhodes <mindcrime@cpphacker.co.uk>, semantic-web@w3.org, foaf-dev@lists.foaf-project.org
Phil Archer writes:
> Renato Golin wrote:
> > Phillip Rhodes wrote:
> >> In a discussion that has arisen recently on the foaf-dev list, somebody
> >> pointed out that they've been told that RDF triples live forever.
> >> That is, once something is asserted it is considered asserted until,
> >> as it was put, "the entropic heat death of the universe."

I think you'll find that view is not widely held, except as explained in #2 below. In fact, outside that context, I can't think of why someone would claim that.

> > Also, with RDF is easier to say that site A has "the same triple as"
> > another site B but with different content, who will you trust? Let's say
> > you have a timestamp annotating the triples, would you still believe the
> > "newest" one?
> >
> > Site A:
> > renato is bad (today)
> >
> > Site B:
> > renato is good (10 years ago)

> Being the one who kicked this off by making the original assertion
> (which I actually got from someone else but almost certainly
> mis-interpreted along the way) I feel I should give a little further input.
>
> Actually, it's _good news_ (as well as common sense) that triples don't
> get stored in perpetuity. I came to this from the standpoint of wanting
> to make the statement (in a semantic way) that
>
> foaf:Agent "will stand by the following assertions until" $date
>
> Which is a little different from a cache header...

There are a couple of different issues here. For the first issue, yes, of course RDF triples on the web can and should change. For the second, "only sort of".

1. There is an imperfect connection between agents (people, organizations, pseudo-human software systems) which have knowledge they intend to publish and the data their readers actually receive. This is a common-sense problem in human experience: if you receive a written message, do you know if the author still holds what they said when they wrote the message?
We're used to this in books, letters, and web pages, and we have a whole bunch of ways to try to decide whether the author would have updated the message if the situation had changed. Most obviously, if the page contains some statements we know to be outdated, our confidence in the rest of the text goes down. If we have a history of seeing the page updated regularly, our confidence goes up. This speaks to a kind of trust in the agent, but it's quite different from whether the agent would have lied.

I think:

> foaf:Agent "will stand by the following assertions until" $date

is a reasonable approach here. It may also be good to have some way for the reader to notice/diagnose possible failure modes. For instance, having periodic updates to the data (every minute, every week, every year, depending on the application) gives the reader some indication of how much attention is being paid to maintaining the resource. Every-second trivial updates (timestamps) let people know the computer is still working; every-day/week human-generated updates (news entries) let people know that a human still cares (about something, at least). If the content there speaks to the subject we care about, then our confidence goes way up.

It's also important to understand how negative statements (perhaps via the closed-world assumption) fit in. Retracting a claim S is not the same as claiming not-S, unless a CWA is in effect. See The Frame Problem [1].

2. It's nice to have immutable snapshots and transaction logs. For some kinds of decentralized processing, it may be essential.

I wrote a web harvester once that gathered up triples from around the web, downloading and storing RDF triples from RDF/XML sources and downloading some metadata from HTML sources. It stored the harvested triples along with information about how they were retrieved -- what URL was used, what the time was, etc -- in immutable Retrieval Log records.
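
To make the idea of a Retrieval Log record concrete, here is a minimal sketch in Python. It is not the original harvester's code: the names `RetrievalRecord` and `snapshot_as_of`, the choice of plain string triples, and the example.org URL are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional, Sequence, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class RetrievalRecord:
    """One immutable journal entry: what was fetched, from where, and when."""
    url: str
    retrieved_at: datetime
    triples: FrozenSet[Triple]


def snapshot_as_of(log: Sequence[RetrievalRecord], url: str,
                   when: datetime) -> Optional[RetrievalRecord]:
    """Return the most recent record for `url` retrieved at or before `when`."""
    candidates = [r for r in log if r.url == url and r.retrieved_at <= when]
    return max(candidates, key=lambda r: r.retrieved_at, default=None)


# Example data (hypothetical URL and statements).
URL = "http://example.org/renato.rdf"
older = RetrievalRecord(
    url=URL,
    retrieved_at=datetime(2008, 3, 1, tzinfo=timezone.utc),
    triples=frozenset({("ex:renato", "ex:is", "good")}),
)
newer = RetrievalRecord(
    url=URL,
    retrieved_at=datetime(2008, 3, 20, tzinfo=timezone.utc),
    triples=frozenset({("ex:renato", "ex:is", "bad")}),
)
log = [older, newer]

# What did the harvester see for this URL as of 10 March 2008?
seen = snapshot_as_of(log, URL, datetime(2008, 3, 10, tzinfo=timezone.utc))
```

Because the dataclass is frozen and the triples are held in a `frozenset`, a record cannot be edited after it is created; the only operations on the log are adding or dropping whole records.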
Such records could be added to the database or removed to save space, but would never be changed. Each was a journal entry -- this is what the harvester did at some point in time. The purpose here was to allow people to browse the semantic web as it was at some point in the past, like archive.org does for the web in general.

One interesting aspect of the approach was that I republished the snapshots of other people's RDF, along with the harvester metadata. An archiving harvester repeatedly checking *those* pages would be wasting its time (unless something went wrong). I imagined that it might be very nice if people would publish snapshots of their own data as it changed, rather than me having to do it for them.

Of course instead of snapshots, it might be nice to have just a description of the changes since the previous version. (I don't remember if I implemented that or not.)

So, there is (obviously?) a kind of duality between the current state of some resource and the sequence of changes that have been made to it since some null starting state. I have written some about this [2], as has TimBL [3], and this feels important to the Semantic Web, but I haven't seen it really manifest yet. I'm sure there are some applications where it will be important to keep history for at least some time, and/or where distributing diffs will be important.

But now I'm rambling. In short, triples live forever in the same sense these words I'm writing (which will be archived in various places) live forever.

    -- Sandro

[1] http://en.wikipedia.org/wiki/Frame_problem#The_answer_set_programming_solution
[2] http://esw.w3.org/topic/DeltaView
[3] http://www.w3.org/DesignIssues/Diff
Received on Friday, 28 March 2008 13:28:21 UTC