- From: Graham Klyne <GK@ninebynine.org>
- Date: Wed, 25 May 2011 09:53:32 +0100
- To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- CC: public-prov-wg@w3.org
I have a problem with resource-as-container. I think it's too constraining. My zebra example wouldn't comply.

As for the distinction between f1 and r1 per your example, I think this is rather broadening the discussion, which I'm not sure is necessary or helpful. I would say that in this case, r1 is a service resource. And as such, I don't think it makes sense to download a service. E.g. what do you receive if you do a simple HTTP GET on a SPARQL endpoint URI? I think it's typically some kind of intro page that explains how to use the service (e.g. http://data.clarosnet.org/sparql/). The URIs that may be used to download *content* from the triple store are different (e.g. URI-encoded SPARQL queries, or constructed LDAPI URIs).

So, for the purposes of this example, we need to be clearer about what we mean when saying "analyst (alice) downloads a turtle serialization (lcp1) of the resource (r1) from government portal". In this context, I don't think it makes sense as it stands.

I also note that once you introduce a triple store into the mix, while we can expect it to contain information that has been loaded into it, we have no a priori way to claim that information subsequently retrieved from it has anything to do with the original resource. The best we can say is that if the entire *content* of the resource "r1" is downloaded, then that content should contain as a subset the RDF that was loaded. But even this isn't clear-cut: if the triple store supports named graphs (which most do), then there's no way to represent its entire content in a single Turtle download.

In summary, I think the introduction of containers and triple stores is mixing mechanism with essential provenance concepts here, and I think we need to get the latter straight before we can explain what happens when more complex mechanisms are introduced. The scenario as described could play out perfectly well without mention of a triple store.
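To make the endpoint-versus-content distinction concrete, here is a minimal Python sketch. It assumes the standard SPARQL protocol convention of passing the query text in a `query` parameter on a GET request; the helper name `query_uri` and the CONSTRUCT query are illustrative only, and nothing here contacts the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Service URI taken from the message above; a plain GET on it is
# expected to return an intro page, not the store's content.
ENDPOINT = "http://data.clarosnet.org/sparql/"

# Hypothetical query: ask for every triple in the default graph.
DUMP_DEFAULT_GRAPH = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"


def query_uri(endpoint: str, sparql: str) -> str:
    """Build a URI-encoded SPARQL query URI (illustrative helper)."""
    return endpoint + "?" + urlencode({"query": sparql})


content_uri = query_uri(ENDPOINT, DUMP_DEFAULT_GRAPH)

# The URI that yields content is a different URI from the service itself.
print(ENDPOINT)
print(content_uri)
```

Note that a query of this shape normally addresses only the default graph, so even this "download everything" URI would not cover named graphs unless the store is configured to merge them.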
#g
--

Luc Moreau wrote:
> Hi Paul,
>
> Yesterday, I also began drafting some definition. We need
> representations in here too. I am not sure about
> your illustrations. Here is my take on it:
>
> From a provenance viewpoint, we seem to discuss several concepts
> related to resources. Some terminology is required to disambiguate
> concepts. It is inspired by terminology developed by the rdf working
> group (thanks to Sandro for drafting the original email!)
>
> 1. A "resource" is a container, whose contents may vary over time.
>    Its content may be structured in many different ways (hierarchical
>    XML tree, RDF arcs, etc).
>
> 2. A "r-snapshot" is a state of a resource, or a snapshot of that
>    resource at a specific instant. A r-snapshot is immutable. From a
>    resource that changes over time, one can obtain multiple
>    r-snapshots.
>
> 3. A "r-text" is a particular sequence of characters or bytes which
>    conveys a particular r-snapshot in some language. If you can parse
>    a r-text, you know what is in the r-snapshot it conveys. You can
>    tell someone exactly what is in a particular resource at some
>    instant by sending them a r-text. (You send them the r-text which
>    conveys the r-snapshot which is the current state of that resource.)
>
> In some cases, some resources do not vary over time, which means that
> there is a single r-snapshot for them, and some may even have a single
> r-text
> (no content negotiation). In such a specific case (static resources on
> the web),
> the three concepts conflate into a single one.
>
> The challenge is to deal with dynamic contents.
>
> Illustration inspired by the example.
>
> - government (gov) converts data (d1) to RDF file (f1) at time (t1)
>   using xlst transform
> - government (gov) uploads RDF data (f1) into a triple store, exposed
>   as Web resource (r1)
> - analyst (alice) downloads a turtle serialization (lcp1) of the
>   resource (r1) from government portal
>
> Illustrations:
> - r1: is a resource: it's the triple store, its a container, its content
>   can vary over time
> - lcp1: is a r-text (turtle serialization) of a given snapshot (created
>   by, or available at the time of, download)
> - f1 is a local file: it can be seen as a stateless anonymous resource,
>   with a single r-text.
>
> If in addition:
> - analyst (alice) downloads a rdf/xml serialization (lcp2) of the
>   resource (r1)
>
> If the content of r1 has not changed, then lcp2 and lcp1 are both
> r-texts of a same r-snapshot.
>
> Note that this is not limited to RDF (as Graham mentioned)
>
> - newspaper (news), uses a CMS to publish the incidence map (map1),
>   chart (c1) and
>   the image (img1) within a document (art1) written by (joe) using
>   license (li2)
> - newspaper (news), updates art1, adding a correction following a
>   complaint from a reader
>
> Illustrations:
> - art1 is a also resource, with two r-snapshots (before and after
>   correction)
> - with language negotiation, an http client can download html and xhtml
>   representations (i.e., r-texts) of the article
>
> What do you think?
> Cheers,
> Luc
>
>
> On 05/25/2011 06:49 AM, Paul Groth wrote:
>> Hi,
>>
>> To throw out some, perhaps simpler, definitions into the mix that I
>> think follow along the lines of what's being discussed.
>>
>> Resource - something that can be identified
>>
>> Snapshot - the state of a resource at particular point in time
>>
>> In the Data Journalism Scenario: a 'resource' would be the web page. a
>> 'snapshot' would be the web page before publication.
>>
>> cheers,
>> Paul
>>
>> Note: Similar concepts are found within many provenance models that I
>> know of....if it's helpful I can list those out
>>
>
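The three quoted definitions (resource, r-snapshot, r-text) can be sketched as a small Python model. This is only an illustration under stated assumptions: the class names `Resource` and `RSnapshot`, the `ex:` triples, and the two toy output "languages" are all made up here, and the toy formats are not real Turtle or RDF/XML.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RSnapshot:
    """Immutable state of a resource at one instant (a set of triples here)."""
    triples: frozenset

    def as_text(self, language: str) -> str:
        """Render an r-text of this snapshot in a toy 'language'."""
        rows = sorted(self.triples)
        if language == "lines":
            return "\n".join(" ".join(t) + " ." for t in rows)
        if language == "csv":
            return "\n".join(",".join(t) for t in rows)
        raise ValueError(f"unknown language: {language}")


@dataclass
class Resource:
    """Container whose contents may vary over time."""
    contents: set = field(default_factory=set)

    def snapshot(self) -> RSnapshot:
        # Copy the current contents so later changes do not affect the snapshot.
        return RSnapshot(frozenset(self.contents))


r1 = Resource()
r1.contents.add(("ex:d1", "ex:convertedBy", "ex:gov"))

s1 = r1.snapshot()
lcp1 = s1.as_text("lines")   # first download
s1b = r1.snapshot()
lcp2 = s1b.as_text("csv")    # second download, other language, content unchanged

r1.contents.add(("ex:f1", "ex:uploadedBy", "ex:gov"))
s2 = r1.snapshot()           # the resource changed, so this is a new r-snapshot

print(s1 == s1b, lcp1 == lcp2, s1 == s2)
```

In this model lcp1 and lcp2 are different r-texts of the same r-snapshot, while s2 is a distinct r-snapshot of the same resource. The model deliberately treats the resource as a container, which is exactly the assumption questioned in the reply above.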
Received on Wednesday, 25 May 2011 10:54:49 UTC