Re: The "Rolling Snapshots" Pattern and Vocabulary

Just musing early in the morning in London:-)

- I am a bit worried that we would make it fairly difficult for an average user to publish his/her, say, FOAF data and
update it regularly if this is somehow a required approach. For example, I doubt most people know how the HTTP
headers should be set for a specific resource, and even if they do, and their server runs Apache, which can handle
.htaccess, they may not have the right to set them.

Of course, FOAF may not be a good example, and this pattern should be reserved for really time-critical data.

- There is Van de Sompel's Memento project:

http://www.mementoweb.org/

which advocates handling time at the HTTP level. This is not (only) a Semantic Web issue. Should we
let that run its course and try to, maybe, help them solve it?
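
To make "time at the HTTP level" a bit more concrete, here is a rough sketch of what a client would send, assuming
Memento's Accept-Datetime request header. The function name is made up for illustration, and this is only my
reading of their approach, not a definitive client.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when):
    """Build request headers asking for a resource's state at time `when`.

    `when` must be a timezone-aware datetime (hypothetical helper, for illustration only).
    """
    # Accept-Datetime carries an HTTP-date, which is always expressed in GMT.
    as_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}
```

The point being that the client asks the ordinary URL for "the state at time T", so the publisher's data does not
need to carry any versioning vocabulary.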

Ivan



On Tue, October 11, 2011 4:11 pm, Sandro Hawke wrote:
> [ I wrote this this morning, before Andy sent "Time-varying g-boxes";
> I'm going to leave it in this form, instead of re-writing it as a
> response to his. They are related but different ideas.  I actually
> avoided talking about datasets in this, sticking just to Web resources.]
>
> We've talked several times about fixed or static g-boxes, how one
> might implement them, and how they might be used.  Here's one
> straw proposal.  It's possible the vocabulary here could/should be
> standardized, but I don't think it's in scope for a Recommendation
> from this WG, at least.  We could put it in a Note, or suggest someone
> else do it.  I'm writing this now as a proof of concept, to show that
> it's not unreasonable to have fixed g-boxes.
>
> 1.  There is a non-fixed g-box, a normal g-box, whose contents change
>     over time in discrete steps.  It is called the "latest version"
>     g-box and its URL is the "latest version" URL.  [[Alternatively,
>     something I hadn't thought of until Andy's email: it could change
>     continuously and be treated as if changed only when sampled.]]
>
>     For example:
>
>         http://alice.example.org/foaf
>
>     might be where Alice keeps the latest version of her FOAF file,
>     providing some information about herself and her social network.
>
> 2.  For each latest version g-box, there is a set of fixed g-boxes,
>     whose contents never change, called the "snapshot" g-boxes.  Each
>     has a URL, a "snapshot" URL.  There is one snapshot g-box and
>     one snapshot URL per period of time between changes of the latest
>     version g-box.
>
>     For example:
>
>         http://alice.example.org/foaf.history/7
>
>     might be the URL of the seventh snapshot, the seventh version of
>     her foaf file.
>
>     Alternatively, for ease of administration, instead of numbering
>     the versions, she might use the modification time:
>
>         http://alice.example.org/foaf.history/2011-10-11T06:28:11.103
>
>     or a hash of the contents (which could get more complicated if
>     she's doing content-negotiation):
>
>         http://alice.example.org/foaf.history/272fbd2b67b0bb9c5135d71d1d1a848b
>
> 3.  There is a subset of the snapshot g-boxes, the "retained g-boxes",
>     which the system hosting the latest version g-box makes available.  An
>     attempt to dereference a non-retained g-box URL should usually
>     result in a "410 Gone" HTTP response.
>
>     For example, Alice might retain up to 100 snapshots, going back up
>     to 10 days.
>
>     Of course, others might keep copies as well, but those are not
>     "retained g-boxes" in this sense, because they will not be
>     properly linked, as below.
>
> 4.  Each snapshot contains metadata which may enable a consumer to
>     find the latest version and other snapshots: a link to the latest
>     version, to the snapshot itself, and, if there is a previous
>     version, to the previous version.  This pattern of linking is also
>     used by W3C Technical Reports.
>
>     The pattern looks like:
>
>        <> snap:latestVersion <...latest version URL...>;
>           snap:thisVersion <...this snapshot URL...>;
>           snap:previousVersion <...snapshot URL of previous version...>.
>
>     One exception is the first version, for which the pattern looks like this:
>
>        <> snap:latestVersion <...latest version URL...>;
>           snap:thisVersion <...this snapshot URL...>;
>           rdf:type snap:FirstVersion.
>
>     snap:thisVersion has a range of snap:FixedResource; one MAY include
>     the type triple that implies.
>
> 5.  Optionally, this metadata may instead be provided in RFC 5988 Link
>     headers.  This allows the snapshots to not be touched by metadata.
>     (Issue.  This makes life harder for the clients, since they need
>     to look in two places.  But some people want to be able to publish
>     graphs unmodified.  And perhaps this can/should be spec'd for all
>     resources, not just ones which can carry RDF metadata.)
>
> 6.  Optionally, snap:nextVersion arcs may be present.  If the metadata
>     is embedded (instead of being provided with Link headers), this is
>     only possible if the next version URL is known when the current
>     version becomes fixed.  In the general case, this is possible if
>     the versions are simply numbered ("7", "8", ...) or if times are
>     used and versions change at known intervals.
>
> 7.  The time at which a given snapshot became the latest snapshot
>     SHOULD be provided as the HTTP Last-Modified time at both the latest
>     version URL and the snapshot URL.  Optionally, it may be included
>     inside the data:
>
>         <> snap:versionDate "...."^^xs:dateTime.
>
>     The two values SHOULD be the same, but in practice on some Web
>     servers this may not be practical.  (Spell out a procedure for when
>     they disagree?)
>
> 8.  Optionally, an index page (or service) may be available, on which
>     all the above metadata is available.  If so, it SHOULD be linked
>     from each snapshot (or at least the snapshots since the index
>     became available) as:
>
>         <> snap:allVersions <...URL of version index page...>.
>
> I think that's it.    Obviously, one could provide a SPARQL front end
> in front of any/all of these URLs if desired.
>
>     -- Sandro
>
>
>
>
>
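
For what it is worth, the retention rule in item 3 above could be sketched roughly as follows. The function names,
the dictionary layout, and the 404 for URLs that never were snapshots are my assumptions; only the "100 snapshots,
10 days" limits and the 410 response come from the message. The sketch does not special-case the current snapshot,
which a real server would probably always keep.

```python
from datetime import datetime, timedelta, timezone


def retained_snapshots(snapshots, now, max_count=100, max_age=timedelta(days=10)):
    """Return the snapshot URLs that are still served, newest first.

    `snapshots` maps each snapshot URL to the time it became the latest version.
    """
    cutoff = now - max_age
    recent = [(when, url) for url, when in snapshots.items() if when >= cutoff]
    recent.sort(reverse=True)  # newest first
    return [url for when, url in recent[:max_count]]


def status_for(url, snapshots, now):
    """Pick an HTTP status code for a request to a snapshot URL."""
    if url not in snapshots:
        return 404  # assumption: this URL never was a snapshot
    if url in retained_snapshots(snapshots, now):
        return 200
    return 410  # a real snapshot that is no longer retained
```

With this, a snapshot taken yesterday is served normally, while one taken twenty days ago answers "410 Gone".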

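Similarly, a minimal sketch of the metadata from items 4, 5 and 7 of the quoted proposal, produced once as embedded
Turtle and once as HTTP headers. The namespace URI for snap: is hypothetical (the message never gives one), the
function names are mine, and using the property URIs as extension relation types in the Link header is only one
possible reading of item 5.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical namespace: the proposal does not say what snap: expands to.
SNAP = "http://example.org/ns/snap#"


def snapshot_turtle(latest_url, this_url, previous_url=None, version_date=None):
    """Return the embedded metadata of items 4 and 7 as Turtle text."""
    lines = [
        f"@prefix snap: <{SNAP}> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix xs: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<> snap:latestVersion <{latest_url}> ;",
        f"   snap:thisVersion <{this_url}> ;",
    ]
    if previous_url is not None:
        lines.append(f"   snap:previousVersion <{previous_url}>")
    else:
        # The first version has no predecessor and is typed instead.
        lines.append("   rdf:type snap:FirstVersion")
    if version_date is not None:
        lines[-1] += " ;"
        lines.append(f'   snap:versionDate "{version_date.isoformat()}"^^xs:dateTime')
    lines[-1] += " ."
    return "\n".join(lines)


def snapshot_headers(latest_url, this_url, previous_url=None, version_date=None):
    """Return the same links as HTTP headers (items 5 and 7).

    `version_date` must be a datetime whose tzinfo is timezone.utc.
    """
    links = [
        f'<{latest_url}>; rel="{SNAP}latestVersion"',
        f'<{this_url}>; rel="{SNAP}thisVersion"',
    ]
    if previous_url is not None:
        links.append(f'<{previous_url}>; rel="{SNAP}previousVersion"')
    headers = {"Link": ", ".join(links)}
    if version_date is not None:
        # Last-Modified has one-second resolution, so fractions are dropped.
        headers["Last-Modified"] = format_datetime(version_date, usegmt=True)
    return headers
```

The Turtle variant is the one an average publisher can manage by just editing the file; the header variant is the
one that needs control over the server configuration.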

-- 
Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 12 October 2011 06:16:59 UTC