- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Wed, 25 May 2011 13:17:11 +0100
- To: Graham Klyne <GK@ninebynine.org>
- CC: public-prov-wg@w3.org
Nothing in the example is restricted to RDF or triple stores. It also
applies to a table in a relational database (and its XML serialization),
or an Excel spreadsheet (and a CSV representation). The relational
database/table and the spreadsheet can be seen as containers, since they
can be updated. This matters because we need to consider stateful
resources (well, I think so, don't you?).

An alternative way of looking at it, borrowing some old programming
language terminology, is this:

- a resource is like an l-value
- a snapshot is like an r-value
- an r-text is like a representation of an r-value

Luc

On 05/25/2011 09:53 AM, Graham Klyne wrote:
> I have a problem with resource-as-container. I think it's too
> constraining. My zebra example wouldn't comply.
>
> As for the distinction between f1 and r1 per your example, I think
> this is rather broadening the discussion - which I'm not sure is
> necessary or helpful.
>
> I would say that in this case, r1 is a service resource. And as such,
> I don't think it makes sense to download a service. E.g. what do you
> receive if you do a simple HTTP GET on a SPARQL endpoint URI? I think
> it's typically some kind of intro page that explains how to use the
> service (e.g. http://data.clarosnet.org/sparql/). The URIs that may
> be used to download *content* from the triple store are different
> (e.g. URI-encoded SPARQL queries, or constructed LDAPI URIs).
>
> So, for the purposes of this example, we need to be clearer about what
> we mean when saying "analyst (alice) downloads a turtle serialization
> (lcp1) of the resource (r1) from government portal" - in this context,
> I don't think it makes sense as it stands.
>
> I also note that once you introduce a triple store into the mix, while
> we can expect it to contain information that has been loaded into it,
> when retrieving information, we have no a priori way to claim that the
> information subsequently retrieved has to do with the original
> resource.
> The best we can say is that if the entire *content* of the
> resource "r1" is downloaded, then that content should contain as a
> subset the RDF that was loaded. But even this isn't clear-cut - if
> the triple store supports named graphs (which most do), then there's
> no way to represent its entire content in a single Turtle download.
>
> In summary, I think the introduction of containers and triple stores
> is mixing mechanism with essential provenance concepts here, and I
> think we need to get the former straight before we can explain what
> happens when more complex mechanisms are introduced. The scenario as
> described could play perfectly well without mention of a triple store.
>
> #g
> --
>
> Luc Moreau wrote:
>> Hi Paul,
>> Yesterday, I also began drafting some definitions. We need
>> representations in here too. I am not sure about
>> your illustrations. Here is my take on it:
>>
>> From a provenance viewpoint, we seem to discuss several concepts
>> related to resources. Some terminology is required to disambiguate
>> concepts. It is inspired by terminology developed by the RDF working
>> group (thanks to Sandro for drafting the original email!)
>>
>> 1. A "resource" is a container, whose contents may vary over time.
>> Its content may be structured in many different ways (hierarchical
>> XML tree, RDF arcs, etc).
>>
>> 2. An "r-snapshot" is a state of a resource, or a snapshot of that
>> resource at a specific instant. An r-snapshot is immutable. From a
>> resource that changes over time, one can obtain multiple
>> r-snapshots.
>>
>> 3. An "r-text" is a particular sequence of characters or bytes which
>> conveys a particular r-snapshot in some language. If you can parse
>> an r-text, you know what is in the r-snapshot it conveys. You can
>> tell someone exactly what is in a particular resource at some
>> instant by sending them an r-text. (You send them the r-text which
>> conveys the r-snapshot which is the current state of that resource.)
>>
>> In some cases, some resources do not vary over time, which means that
>> there is a single r-snapshot for them, and some may even have a
>> single r-text (no content negotiation). In such a specific case
>> (static resources on the web), the three concepts collapse into a
>> single one.
>>
>> The challenge is to deal with dynamic contents.
>>
>> Illustration inspired by the example:
>>
>> - government (gov) converts data (d1) to RDF file (f1) at time (t1)
>>   using an XSLT transform
>> - government (gov) uploads RDF data (f1) into a triple store, exposed
>>   as Web resource (r1)
>> - analyst (alice) downloads a turtle serialization (lcp1) of the
>>   resource (r1) from government portal
>>
>> Illustrations:
>> - r1 is a resource: it's the triple store, a container whose
>>   content can vary over time
>> - lcp1 is an r-text (Turtle serialization) of a given snapshot
>>   (created by, or available at the time of, download)
>> - f1 is a local file: it can be seen as a stateless anonymous
>>   resource, with a single r-text.
>>
>> If in addition:
>> - analyst (alice) downloads an RDF/XML serialization (lcp2) of the
>>   resource (r1)
>>
>> then, if the content of r1 has not changed, lcp2 and lcp1 are both
>> r-texts of the same r-snapshot.
>>
>> Note that this is not limited to RDF (as Graham mentioned):
>>
>> - newspaper (news) uses a CMS to publish the incidence map (map1),
>>   chart (c1) and the image (img1) within a document (art1) written
>>   by (joe) using license (li2)
>> - newspaper (news) updates art1, adding a correction following a
>>   complaint from a reader
>>
>> Illustrations:
>> - art1 is also a resource, with two r-snapshots (before and after
>>   correction)
>> - with language negotiation, an HTTP client can download HTML and
>>   XHTML representations (i.e., r-texts) of the article
>>
>> What do you think?
>> Cheers,
>> Luc
>>
>> On 05/25/2011 06:49 AM, Paul Groth wrote:
>>> Hi,
>>>
>>> To throw out some, perhaps simpler, definitions into the mix that I
>>> think follow along the lines of what's being discussed.
>>>
>>> Resource - something that can be identified
>>>
>>> Snapshot - the state of a resource at a particular point in time
>>>
>>> In the Data Journalism Scenario: a 'resource' would be the web page;
>>> a 'snapshot' would be the web page before publication.
>>>
>>> cheers,
>>> Paul
>>>
>>> Note: Similar concepts are found within many provenance models that
>>> I know of... if it's helpful I can list those out.

--
Professor Luc Moreau
Electronics and Computer Science     tel:   +44 23 8059 4487
University of Southampton            fax:   +44 23 8059 2865
Southampton SO17 1BJ                 email: l.moreau@ecs.soton.ac.uk
United Kingdom                       http://www.ecs.soton.ac.uk/~lavm
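To make the resource / r-snapshot / r-text terminology in this thread a little more concrete, here is a minimal Python sketch. It rests on assumptions that are not in the thread: a resource's content is modeled as a set of triples whose terms are all IRIs, and the names `Resource`, `RSnapshot`, `to_ntriples`, `to_json`, `parse_ntriples`, `parse_json` and the example.org URIs are hypothetical. The mutable `Resource` plays the l-value role, a frozen `RSnapshot` the r-value role, and each serialized string is an r-text. N-Triples stands in for Turtle, and a toy JSON encoding stands in for a second syntax such as RDF/XML; the point is only that two different r-texts of one snapshot parse back to the same content, as with lcp1 and lcp2 above.

```python
import json
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical toy model (not from the thread): a resource's content is a
# set of (subject, predicate, object) triples whose terms are all IRIs.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class RSnapshot:
    """Immutable state of a resource at one instant (like an r-value)."""
    time: int
    triples: FrozenSet[Triple]


class Resource:
    """Mutable container whose content may vary over time (like an l-value)."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self._triples: Set[Triple] = set()

    def load(self, triples: Iterable[Triple]) -> None:
        """Add triples in place; snapshots taken earlier are unaffected."""
        self._triples.update(triples)

    def snapshot(self, time: int) -> RSnapshot:
        """Freeze the current content as an immutable r-snapshot."""
        return RSnapshot(time, frozenset(self._triples))


def to_ntriples(snap: RSnapshot) -> str:
    """One r-text: N-Triples lines (valid only because all terms are IRIs)."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(snap.triples))


def to_json(snap: RSnapshot) -> str:
    """A second r-text of the same snapshot: a toy JSON encoding."""
    return json.dumps(sorted(snap.triples))


def parse_ntriples(text: str) -> FrozenSet[Triple]:
    """Recover snapshot content from an r-text produced by to_ntriples."""
    triples = set()
    for line in text.splitlines():
        s, p, o = (term.strip("<>") for term in line.rstrip(" .").split(" "))
        triples.add((s, p, o))
    return frozenset(triples)


def parse_json(text: str) -> FrozenSet[Triple]:
    """Recover snapshot content from an r-text produced by to_json."""
    return frozenset(tuple(t) for t in json.loads(text))


# Usage, loosely following the gov/alice illustration (URIs hypothetical).
EX = "http://example.org/"
r1 = Resource(EX + "r1")
r1.load([(EX + "d1", EX + "convertedBy", EX + "gov")])
snap_t1 = r1.snapshot(time=1)
lcp1 = to_ntriples(snap_t1)  # r-text in one syntax
lcp2 = to_json(snap_t1)      # r-text of the same snapshot in another syntax

r1.load([(EX + "d1", EX + "correctedBy", EX + "gov")])  # the container changes
snap_t2 = r1.snapshot(time=2)  # a second, distinct r-snapshot
```

The sketch follows Luc's container reading of "resource"; it does not settle Graham's objection that this reading is too constraining. It also ignores his named-graph point: a store holding several named graphs is a dataset, which a single Turtle or N-Triples document (one graph) cannot fully express.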
Received on Wednesday, 25 May 2011 12:17:57 UTC