The "Rolling Snapshots" Pattern and Vocabulary

[ I wrote this this morning, before Andy sent "Time-varying g-boxes";
I'm going to leave it in this form, instead of re-writing it as a
response to his. They are related but different ideas.  I actually
avoided talking about datasets in this, sticking just to Web resources.]

We've talked several times about fixed or static g-boxes, how one
might implement them, and how they might be used.  Here's one
straw proposal.  It's possible the vocabulary here could/should be
standardized, but I don't think it's in scope for a Recommendation
from this WG, at least.  We could put it in a Note, or suggest someone
else do it.  I'm writing this now as a proof of concept, to show that
it's not unreasonable to have fixed g-boxes.

1.  There is a non-fixed g-box, a normal g-box, whose contents change
    over time in discrete steps.  It is called the "latest version"
    g-box and its URL is the "latest version" URL.  [[Alternatively,
    something I hadn't thought of until Andy's email: it could change
    continuously and be treated as if changed only when sampled.]]

    For example:

        http://alice.example.org/foaf

    might be where Alice keeps the latest version of her FOAF file,
    providing some information about herself and her social network.

2.  For each latest version g-box, there is a set of fixed g-boxes,
    whose contents never change, called the "snapshot" g-boxes.  Each
    has a URL, a "snapshot" URL.  There is one snapshot g-box and
    one snapshot URL per period of time between changes of the latest
    version g-box.

    For example:

        http://alice.example.org/foaf.history/7

    might be the URL of the seventh snapshot, the seventh version of
    her foaf file.

    Alternatively, for ease of administration, instead of numbering
    the versions, she might use the modification time:

        http://alice.example.org/foaf.history/2011-10-11T06:28:11.103

    or a hash of the contents (which could get more complicated if
    she's doing content-negotiation):

        http://alice.example.org/foaf.history/272fbd2b67b0bb9c5135d71d1d1a848b
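
As a rough illustration of the three snapshot URL schemes in item 2,
here is a minimal Python sketch.  The function names are hypothetical,
the ".history/" suffix is simply copied from the example URLs, and MD5
is an assumption: the example hash is 32 hex digits long, but no hash
function is actually named above.

```python
import hashlib
from datetime import datetime

LATEST = "http://alice.example.org/foaf"


def history_base(latest_url):
    # Assumption: snapshots live under "<latest version URL>.history/",
    # as in the example URLs.
    return latest_url + ".history/"


def numbered_url(latest_url, version):
    # Scheme 1: simple version numbers ("7", "8", ...).
    return history_base(latest_url) + str(version)


def timestamp_url(latest_url, modified):
    # Scheme 2: modification time, to millisecond precision.
    return history_base(latest_url) + modified.isoformat(timespec="milliseconds")


def hash_url(latest_url, content):
    # Scheme 3: hash of the content bytes.
    # Assumption: MD5 hex digest; the hash function is not specified.
    return history_base(latest_url) + hashlib.md5(content).hexdigest()
```

Note that hash_url() works on one concrete byte sequence, so with
content negotiation each representation would get a different URL.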

3.  There is a subset of the snapshot g-boxes, the "retained g-boxes",
    which the system hosting the latest version g-box makes available.  An
    attempt to dereference a non-retained g-box URL should usually
    result in a "410 Gone" HTTP response.

    For example, Alice might retain up to 100 snapshots, going back up
    to 10 days.   

    Of course, others might keep copies as well, but those are not
    "retained g-boxes" in this sense, because they will not be
    properly linked, as below.
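
One way to read the retention rule in item 3, sketched in Python.
Everything here is an assumption for illustration: the function names,
the in-memory list of snapshot creation times (oldest first), measuring
age from when a snapshot was created, and answering 404 for a version
number that never existed (only 410 is specified above).

```python
from datetime import datetime, timedelta

MAX_SNAPSHOTS = 100            # "up to 100 snapshots"
MAX_AGE = timedelta(days=10)   # "going back up to 10 days"


def retained(snapshot_times, now):
    """Return the set of 1-based version numbers still retained.

    snapshot_times must be ordered oldest first.
    """
    recent = [
        (number, created)
        for number, created in enumerate(snapshot_times, start=1)
        if now - created <= MAX_AGE
    ]
    # Keep only the newest MAX_SNAPSHOTS of those young enough.
    return {number for number, _ in recent[-MAX_SNAPSHOTS:]}


def status_for(version, snapshot_times, now):
    """HTTP status for a request to the snapshot URL of `version`."""
    if version < 1 or version > len(snapshot_times):
        return 404  # assumption: never-issued versions are Not Found
    if version in retained(snapshot_times, now):
        return 200
    return 410      # snapshot existed but is no longer retained
```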

4.  Each snapshot contains metadata which may enable a consumer to
    find the latest version and other snapshots: a link to the latest
    version, to the snapshot itself, and, if there is a previous
    version, to the previous version.  This pattern of linking is also
    used by W3C Technical Reports.

    The pattern looks like:

       <> snap:latestVersion <...latest version URL...>;
          snap:thisVersion <...this snapshot URL...>;
          snap:previousVersion <...snapshot URL of previous version...>.
 
    One exception is the first version, for which the pattern looks like this:

       <> snap:latestVersion <...latest version URL...>;
          snap:thisVersion <...this snapshot URL...>;
          rdf:type snap:FirstVersion.

    snap:thisVersion has a range of snap:FixedResource; one MAY include
    the type triple that this implies.
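
The linking pattern in item 4 is mechanical enough to generate.  A
minimal Python sketch, with a hypothetical function name; it emits only
the triples shown above and leaves out prefix declarations, since no
namespace URI for snap: has been given.

```python
def version_links(latest_url, this_url, previous_url=None):
    """Build the Turtle fragment linking a snapshot to its neighbours.

    With no previous_url, the snapshot is marked as the first version.
    """
    lines = [
        f"<> snap:latestVersion <{latest_url}>;",
        f"   snap:thisVersion <{this_url}>;",
    ]
    if previous_url is None:
        lines.append("   rdf:type snap:FirstVersion.")
    else:
        lines.append(f"   snap:previousVersion <{previous_url}>.")
    return "\n".join(lines)
```

For Alice's seventh snapshot, the call would pass her latest version
URL, ".../foaf.history/7" and ".../foaf.history/6".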

5.  Optionally, this metadata may instead be provided in RFC 5988 Link
    headers.  This keeps the metadata out of the snapshots themselves.
    (Issue.  This makes life harder for the clients, since they need
    to look in two places.  But some people want to be able to publish
    graphs unmodified.  And perhaps this can/should be spec'd for all
    resources, not just ones which can carry RDF metadata.)
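
For item 5, a sketch of what such a Link header value might look like.
The relation names are an assumption: nothing above says which to use,
and this borrows the "latest-version" and "predecessor-version"
relation types registered by RFC 5829 rather than anything from the
snap: vocabulary.  The function name is hypothetical.

```python
def link_header(latest_url, previous_url=None):
    """Build a Link header value for a snapshot response.

    Assumption: RFC 5829 relation types stand in for
    snap:latestVersion and snap:previousVersion.
    """
    links = [f'<{latest_url}>; rel="latest-version"']
    if previous_url is not None:
        links.append(f'<{previous_url}>; rel="predecessor-version"')
    # Multiple links may share one header, separated by commas.
    return ", ".join(links)
```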

6.  Optionally, snap:nextVersion arcs may be present.  If the metadata
    is embedded (instead of being provided with Link headers), this is
    only possible if the next version URL is known when the current
    version becomes fixed.  In the general case, this is possible if
    the versions are simply numbered ("7", "8", ...) or if times are
    used and versions change at known intervals.

7.  The time at which a given snapshot became the latest snapshot
    SHOULD be provided as the HTTP Last-Modified time at both the latest
    version URL and the snapshot URL.  Optionally, it may be included
    inside the data:

        <> snap:versionDate "...."^^xs:dateTime.

    The two values SHOULD be the same, but on some Web servers this
    may not be practical.  (Spell out a procedure for when they
    disagree?)
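
A small Python sketch of item 7, deriving both values from one instant
so that they agree.  Function names are hypothetical.  Note that the
HTTP date format only has one-second resolution, so the comparison
ignores fractions of a second; that is an assumption about what "the
same" should mean here, not something settled above.  The XML Schema
datatype name is spelled dateTime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(instant):
    # instant must be timezone-aware and in UTC; HTTP dates carry
    # whole seconds only.
    return format_datetime(instant.replace(microsecond=0), usegmt=True)


def version_date_triple(instant):
    # ISO 8601 form with a "Z" suffix for UTC.
    stamp = instant.isoformat().replace("+00:00", "Z")
    return f'<> snap:versionDate "{stamp}"^^xs:dateTime.'


def agree(header_value, instant):
    # Compare at one-second resolution (assumption, see above).
    return parsedate_to_datetime(header_value) == instant.replace(microsecond=0)
```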

8.  Optionally, an index page (or service) may be available, on which
    all the above metadata is available.  If so, it SHOULD be linked
    from each snapshot (or at least the snapshots since the index
    became available) as:

        <> snap:allVersions <...URL of version index page...>.

I think that's it.    Obviously, one could provide a SPARQL front end
in front of any/all of these URLs if desired.

    -- Sandro

Received on Tuesday, 11 October 2011 16:11:48 UTC