- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 11 Oct 2011 12:11:36 -0400
- To: public-rdf-wg <public-rdf-wg@w3.org>
[ I wrote this this morning, before Andy sent "Time-varying g-boxes"; I'm going to leave it in this form instead of rewriting it as a response to his. They are related but different ideas. I actually avoided talking about datasets in this, sticking just to Web resources. ]

We've talked several times about fixed or static g-boxes, how one might implement them, and how they might be used. Here's one straw proposal. It's possible the vocabulary here could/should be standardized, but I don't think it's in scope for a Recommendation from this WG, at least. We could put it in a Note, or suggest someone else do it. I'm writing this now as a proof of concept, to show that it's not unreasonable to have fixed g-boxes.

1. There is a non-fixed g-box, a normal g-box, whose contents change over time in discrete steps. It is called the "latest version" g-box, and its URL is the "latest version" URL. [[Alternatively, something I hadn't thought of until Andy's email: it could change continuously and be treated as if it changed only when sampled.]]

   For example, http://alice.example.org/foaf might be where Alice keeps the latest version of her FOAF file, providing some information about herself and her social network.

2. For each latest version g-box, there is a set of fixed g-boxes whose contents never change, called the "snapshot" g-boxes. Each has a URL, a "snapshot" URL. There is one snapshot g-box and one snapshot URL per period of time between changes of the latest version g-box.

   For example, http://alice.example.org/foaf.history/7 might be the URL of the seventh snapshot, the seventh version of her FOAF file. Alternatively, for ease of administration, instead of numbering the versions, she might use the modification time:

     http://alice.example.org/foaf.history/2011-10-11T06:28:11.103

   or a hash of the contents (which could get more complicated if she's doing content negotiation):

     http://alice.example.org/foaf.history/272fbd2b67b0bb9c5135d71d1d1a848b

3. There is a subset of the snapshot g-boxes, the "retained" g-boxes, which the system hosting the latest version g-box makes available. An attempt to dereference a non-retained snapshot URL should usually result in a "410 Gone" HTTP response.

   For example, Alice might retain up to 100 snapshots, going back up to 10 days. Of course, others might keep copies as well, but those are not "retained g-boxes" in this sense, because they will not be properly linked, as described below.

4. Each snapshot contains metadata which may enable a consumer to find the latest version and other snapshots: a link to the latest version, to the snapshot itself, and, if there is a previous version, to the previous version. This pattern of linking is also used by W3C Technical Reports. The pattern looks like:

     <> snap:latestVersion <...latest version URL...>;
        snap:thisVersion <...this snapshot URL...>;
        snap:previousVersion <...snapshot URL of previous version...>.

   One exception is the first version, for which the pattern looks like this:

     <> snap:latestVersion <...latest version URL...>;
        snap:thisVersion <...this snapshot URL...>;
        rdf:type snap:FirstVersion.

   Since snap:thisVersion has a range of snap:FixedResource, one MAY also include the type triple this implies (<...this snapshot URL...> rdf:type snap:FixedResource).

5. Optionally, this metadata may instead be provided in RFC 5988 Link headers. This allows the snapshots to be published without embedded metadata. (Issue: this makes life harder for clients, since they need to look in two places. But some people want to be able to publish graphs unmodified. And perhaps this can/should be spec'd for all resources, not just ones which can carry RDF metadata.)

6. Optionally, snap:nextVersion arcs may be present. If the metadata is embedded (instead of being provided with Link headers), this is only possible if the next version URL is known when the current version becomes fixed. In practice, the next URL is known in advance if the versions are simply numbered ("7", "8", ...) or if times are used and versions change at known intervals.

7. The time at which a given snapshot became the latest snapshot SHOULD be provided as the HTTP Last-Modified time at both the latest version URL and the snapshot URL. Optionally, it may be included inside the data:

     <> snap:versionDate "...."^^xs:dateTime.

   The two values SHOULD be the same, but in practice this may not be achievable on some Web servers. (Should we spell out a procedure for when they disagree?)

8. Optionally, an index page (or service) may be available, on which all the above metadata is available. If so, it SHOULD be linked from each snapshot (or at least from the snapshots published since the index became available) as:

     <> snap:allVersions <...URL of version index page...>.

I think that's it. Obviously, one could provide a SPARQL front end for any/all of these URLs if desired.

-- Sandro
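The three snapshot-naming schemes from point 2 (sequence number, modification time, content hash) are simple enough to sketch. These helpers are hypothetical, not part of the proposal: the ".history/" layout is copied from Alice's example, and MD5 is only an assumption, chosen because her example hash has 32 hex digits. Any stable digest would serve.

```python
import hashlib
from datetime import datetime

# Hypothetical helpers: build snapshot URLs for a latest-version URL,
# following the ".history/" layout used in Alice's example.


def snapshot_url_by_number(latest_url: str, n: int) -> str:
    """Snapshot named by sequence number, e.g. .../foaf.history/7."""
    return f"{latest_url}.history/{n}"


def snapshot_url_by_time(latest_url: str, modified: datetime) -> str:
    """Snapshot named by modification time, to millisecond precision.

    A naive datetime yields no UTC offset, matching Alice's example;
    an aware one would append an offset such as +00:00.
    """
    stamp = modified.isoformat(timespec="milliseconds")
    return f"{latest_url}.history/{stamp}"


def snapshot_url_by_hash(latest_url: str, content: bytes) -> str:
    """Snapshot named by a digest of its bytes (MD5 is an assumption)."""
    digest = hashlib.md5(content).hexdigest()
    return f"{latest_url}.history/{digest}"
```

The content-hash scheme makes each URL self-verifying, but, as noted above, it gets awkward with content negotiation: each representation has different bytes and therefore a different digest.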
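Point 3's retention rule can be made concrete as a status-code decision on the server. This is a minimal sketch assuming one reading of "up to 100 snapshots, going back up to 10 days": a snapshot is retained if it is among the 100 most recent and is either the current one or was superseded no more than 10 days ago. The function name, its parameters, and the use of 404 for snapshots that never existed are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import List

MAX_SNAPSHOTS = 100            # Alice's example limit on count
MAX_AGE = timedelta(days=10)   # Alice's example limit on age


def snapshot_status(change_times: List[datetime], index: int, now: datetime) -> int:
    """Return the HTTP status for dereferencing snapshot `index`.

    change_times[i] is when snapshot i became the latest version, in
    ascending order; the last entry is the current snapshot.
    """
    if not 0 <= index < len(change_times):
        return 404  # assumption: a snapshot that never existed is simply Not Found
    if index < len(change_times) - MAX_SNAPSHOTS:
        return 410  # too many newer snapshots: Gone
    if index == len(change_times) - 1:
        return 200  # the current snapshot is always retained
    superseded_at = change_times[index + 1]
    if now - superseded_at > MAX_AGE:
        return 410  # superseded too long ago: Gone
    return 200
```

Measuring age from when a snapshot was superseded, rather than from when it was created, keeps the current snapshot available even if the latest version has not changed for more than 10 days.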
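The embedded metadata of points 4, 6 and 7 can be generated mechanically as Turtle. The proposal gives no namespace URI for the snap: vocabulary, so SNAP_NS below is a made-up placeholder, and the function name and parameters are likewise hypothetical. The xs: prefix is bound to the standard XML Schema datatype namespace. The subject <> resolves against the URL the document is fetched from, which for a snapshot is its own snapshot URL.

```python
from datetime import datetime
from typing import Optional

SNAP_NS = "http://example.org/snap#"  # placeholder: the proposal defines no namespace
XS_NS = "http://www.w3.org/2001/XMLSchema#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def snapshot_metadata(latest: str, this: str,
                      previous: Optional[str] = None,
                      next_: Optional[str] = None,
                      version_date: Optional[datetime] = None) -> str:
    """Return Turtle describing one snapshot (hypothetical helper)."""
    props = [f"snap:latestVersion <{latest}>",
             f"snap:thisVersion <{this}>"]
    if previous is None:
        # The first version has no predecessor, so it is typed instead.
        props.append("rdf:type snap:FirstVersion")
    else:
        props.append(f"snap:previousVersion <{previous}>")
    if next_ is not None:
        # Only possible when the next URL is known in advance (point 6).
        props.append(f"snap:nextVersion <{next_}>")
    if version_date is not None:
        # Optional in-data copy of the Last-Modified time (point 7).
        props.append(f'snap:versionDate "{version_date.isoformat()}"^^xs:dateTime')
    prefixes = (f"@prefix snap: <{SNAP_NS}> .\n"
                f"@prefix xs: <{XS_NS}> .\n"
                f"@prefix rdf: <{RDF_NS}> .\n")
    return prefixes + "<> " + " ;\n   ".join(props) + " .\n"
```

For example, snapshot_metadata("http://alice.example.org/foaf", "http://alice.example.org/foaf.history/1") yields the first-version pattern, while passing previous= yields the ordinary pattern.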
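For snapshots published unmodified (point 5), the same links could travel as RFC 5988 Link headers, alongside the Last-Modified time from point 7. RFC 5988 allows extension relation types written as absolute URIs, so this sketch reuses the snap: property URIs as rel values. That choice, the placeholder namespace, and the helper name are assumptions, not part of the proposal.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional

SNAP_NS = "http://example.org/snap#"  # placeholder namespace (assumption)


def snapshot_headers(latest: str, this: str, previous: Optional[str],
                     became_latest: datetime) -> Dict[str, str]:
    """Build response headers for a snapshot URL (hypothetical helper).

    `became_latest` should be timezone-aware; astimezone() would treat a
    naive datetime as local time.
    """
    links = [f'<{latest}>; rel="{SNAP_NS}latestVersion"',
             f'<{this}>; rel="{SNAP_NS}thisVersion"']
    if previous is not None:
        links.append(f'<{previous}>; rel="{SNAP_NS}previousVersion"')
    return {
        "Link": ", ".join(links),
        # HTTP-date in GMT; format_datetime(usegmt=True) requires a UTC datetime.
        "Last-Modified": format_datetime(became_latest.astimezone(timezone.utc),
                                         usegmt=True),
    }
```

Per point 7, the latest version URL would send the same Last-Modified value as its current snapshot.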
Received on Tuesday, 11 October 2011 16:11:48 UTC