Re: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology]

Hi Paul,
Yesterday, I also began drafting some definitions. We need
representations in here too. I am not sure about
your illustrations.  Here is my take on it:

 From a provenance viewpoint, we seem to be discussing several concepts
related to resources.  Some terminology is required to disambiguate
them.  What follows is inspired by terminology developed by the RDF
Working Group (thanks to Sandro for drafting the original email!)

1. A "resource" is a container, whose contents may vary over time.
    Its content may be structured in many different ways (hierarchical
    XML tree, RDF arcs, etc).

2. An "r-snapshot" is a state of a resource, or a snapshot of that
    resource at a specific instant.  An r-snapshot is immutable. From a
    resource that changes over time, one can obtain multiple r-snapshots.

3. An "r-text" is a particular sequence of characters or bytes which
    conveys a particular r-snapshot in some language.  If you can parse
    an r-text, you know what is in the r-snapshot it conveys.  You can
    tell someone exactly what is in a particular resource at some
    instant by sending them an r-text.  (You send them the r-text which
    conveys the r-snapshot which is the current state of that resource.)

In some cases, resources do not vary over time, which means that
there is a single r-snapshot for them, and some may even have a single
r-text (no content negotiation).  In such a specific case (static
resources on the web), the three concepts collapse into a single one.
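
To make the three definitions concrete, here is a minimal sketch in Python. It is only an illustration of the terminology, not a proposed model: the class names, the integer clock standing in for "instant", and the line-based encoding in to_rtext are all assumptions of mine.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # assumed content shape, for illustration only


@dataclass(frozen=True)
class RSnapshot:
    """Immutable state of a resource at one instant."""
    instant: int
    content: FrozenSet[Triple]


@dataclass(frozen=True)
class RText:
    """A byte sequence conveying one r-snapshot in some language."""
    language: str
    data: bytes


@dataclass
class Resource:
    """A container whose content may vary over time."""
    name: str
    content: Set[Triple] = field(default_factory=set)
    clock: int = 0  # hypothetical logical clock standing in for time

    def update(self, triple: Triple) -> None:
        self.content.add(triple)
        self.clock += 1

    def snapshot(self) -> RSnapshot:
        # Copy the content so later updates cannot alter the snapshot.
        return RSnapshot(self.clock, frozenset(self.content))


def to_rtext(snap: RSnapshot) -> RText:
    """Encode a snapshot as sorted, space-separated lines (made-up format)."""
    lines = sorted(" ".join(t) + " ." for t in snap.content)
    return RText("lines", "\n".join(lines).encode("utf-8"))


r = Resource("r1")
r.update(("a", "b", "c"))
s1 = r.snapshot()
r.update(("d", "e", "f"))
s2 = r.snapshot()  # a second, distinct r-snapshot of the same resource
```

A resource that is never updated has exactly one r-snapshot, and with a single encoding exactly one r-text, which is the static case described above.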

The challenge is to deal with dynamic contents.

Illustration inspired by the example.

- government (gov) converts data (d1) to RDF file (f1) at time (t1) 
using an XSLT transform
- government (gov) uploads RDF data (f1) into a triple store, exposed 
as  Web resource (r1)
- analyst (alice) downloads a turtle serialization (lcp1) of the 
resource (r1) from government portal

- r1 is a resource: it's the triple store, it's a container, its content 
can vary over time
- lcp1 is an r-text (turtle serialization) of a given r-snapshot (created 
by, or available at the time of, download)
- f1 is a local file: it can be seen as a stateless anonymous resource, 
with a single r-text.

If in addition:
- analyst (alice) downloads a rdf/xml serialization (lcp2) of the 
resource (r1)

If the content of r1 has not changed, then lcp2 and lcp1 are both 
r-texts of the same r-snapshot.
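
The relationship between lcp1 and lcp2 can be sketched as follows. This is a rough illustration only: the two encodings below (space-separated lines and JSON) are stand-ins I chose so the example needs no RDF library; they are not Turtle or RDF/XML, and the triple names are hypothetical.

```python
import json


def to_lines(snapshot):
    """Encode a snapshot as sorted, space-separated lines (stand-in format)."""
    return "\n".join(sorted(" ".join(t) for t in snapshot)).encode("utf-8")


def from_lines(data):
    """Parse the line format back into a snapshot."""
    lines = data.decode("utf-8").splitlines()
    return frozenset(tuple(line.split(" ")) for line in lines if line)


def to_json(snapshot):
    """Encode a snapshot as a JSON array of triples (second stand-in format)."""
    return json.dumps(sorted(list(t) for t in snapshot)).encode("utf-8")


def from_json(data):
    """Parse the JSON format back into a snapshot."""
    return frozenset(tuple(t) for t in json.loads(data.decode("utf-8")))


# r1: the mutable resource (here just a set of hypothetical triples).
r1 = {("ex:d1", "ex:publishedBy", "ex:gov")}

# Alice downloads two serializations while r1 is unchanged.
snapshot = frozenset(r1)
lcp1 = to_lines(snapshot)
lcp2 = to_json(snapshot)

# Different byte sequences, same r-snapshot once parsed.
same_snapshot = from_lines(lcp1) == from_json(lcp2)

# After r1 changes, a new download conveys a different r-snapshot.
r1.add(("ex:d1", "ex:format", "ex:rdf"))
lcp3 = to_lines(frozenset(r1))
changed = from_lines(lcp3) != from_lines(lcp1)
```

Comparing the parsed content rather than the bytes is the point: two r-texts are tied together by the r-snapshot they convey, not by their encoding.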

Note that this is not limited to RDF (as Graham mentioned)

- newspaper (news) uses a CMS to publish the incidence map (map1), 
chart (c1) and the image (img1) within a document (art1) written by 
(joe) using license (li2)
- newspaper (news) updates art1, adding a correction following a 
complaint from a reader

- art1 is also a resource, with two r-snapshots (before and after the 
correction)
- with content negotiation, an HTTP client can download HTML and XHTML 
representations (i.e., r-texts) of the article

What do you think?

On 05/25/2011 06:49 AM, Paul Groth wrote:
> Hi,
> To throw out some, perhaps simpler, definitions into the mix that I 
> think follow along the lines of what's being discussed.
> Resource - something that can be identified
> Snapshot - the state of a resource at particular point in time
> In the Data Journalism Scenario: a 'resource' would be the web page. a 
> 'snapshot' would be the web page before publication.
> cheers,
> Paul
> Note: Similar concepts are found within many provenance models that I 
> know of....if it's helpful I can list those out

Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email:
United Kingdom           

Received on Wednesday, 25 May 2011 07:43:20 UTC