- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Wed, 25 May 2011 08:42:43 +0100
- To: public-prov-wg@w3.org
Hi Paul,
Yesterday, I also began drafting some definitions. We need
representations in here too. I am not sure about
your illustrations. Here is my take on it:
From a provenance viewpoint, we seem to discuss several concepts
related to resources. Some terminology is required to disambiguate
concepts. It is inspired by terminology developed by the RDF Working
Group (thanks to Sandro for drafting the original email!).
1. A "resource" is a container, whose contents may vary over time.
Its content may be structured in many different ways (hierarchical
XML tree, RDF arcs, etc).
2. An "r-snapshot" is a state of a resource, or a snapshot of that
resource at a specific instant. An r-snapshot is immutable. From a
resource that changes over time, one can obtain multiple
r-snapshots.
3. An "r-text" is a particular sequence of characters or bytes which
conveys a particular r-snapshot in some language. If you can parse
an r-text, you know what is in the r-snapshot it conveys. You can
tell someone exactly what is in a particular resource at some
instant by sending them an r-text. (You send them the r-text which
conveys the r-snapshot which is the current state of that resource.)
In some cases, some resources do not vary over time, which means that
there is a single r-snapshot for them, and some may even have a single
r-text (no content negotiation). In such a specific case (static
resources on the web), the three concepts conflate into a single one.
The challenge is to deal with dynamic contents.
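To make the three definitions concrete, here is a minimal sketch in
Python. It is only an illustration under assumptions: the class names
(Resource, RSnapshot, RText), the field names and the "ex:" triples are
hypothetical and not part of any agreed model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RSnapshot:
    """Immutable state of a resource at a specific instant."""
    instant: str
    content: frozenset  # here: a set of (subject, predicate, object) triples


@dataclass(frozen=True)
class RText:
    """A character sequence conveying one r-snapshot in some language."""
    snapshot: RSnapshot
    language: str  # e.g. "text/turtle"
    chars: str


@dataclass
class Resource:
    """A container whose contents may vary over time."""
    name: str
    content: set = field(default_factory=set)

    def snapshot(self, instant: str) -> RSnapshot:
        # Copy the current content into an immutable value.
        return RSnapshot(instant, frozenset(self.content))


# Hypothetical usage: the resource changes, the snapshots do not.
r1 = Resource("r1")
r1.content.add(("ex:d1", "rdf:type", "ex:Dataset"))
s1 = r1.snapshot("t1")
r1.content.add(("ex:d1", "ex:publishedBy", "ex:gov"))
s2 = r1.snapshot("t2")

# One possible r-text for s1 (Turtle-like, prefixes assumed).
text1 = RText(s1, "text/turtle", "ex:d1 a ex:Dataset .")
```

In this sketch a resource that never changes would yield equal
content in every snapshot, which matches the static case above.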
Illustration inspired by the example.
- government (gov) converts data (d1) to RDF file (f1) at time (t1)
using an XSLT transform
- government (gov) uploads RDF data (f1) into a triple store, exposed
as Web resource (r1)
- analyst (alice) downloads a Turtle serialization (lcp1) of the
resource (r1) from government portal
Illustrations:
- r1 is a resource: it's the triple store, it's a container, and its
content can vary over time
- lcp1 is an r-text (Turtle serialization) of a given r-snapshot (created
by, or available at the time of, the download)
- f1 is a local file: it can be seen as a stateless anonymous resource,
with a single r-text.
If in addition:
- analyst (alice) downloads an RDF/XML serialization (lcp2) of the
resource (r1)
If the content of r1 has not changed, then lcp2 and lcp1 are both
r-texts of the same r-snapshot.
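A small sketch of this point, again only as an illustration: two
different character sequences that parse back to the same r-snapshot.
The two toy formats below are stand-ins for Turtle and RDF/XML (they
are not real serializers), and the triples and function names are
hypothetical.

```python
import json

# Hypothetical r-snapshot of r1: an immutable set of triples.
snapshot = frozenset({
    ("ex:d1", "rdf:type", "ex:Dataset"),
    ("ex:d1", "ex:publishedBy", "ex:gov"),
})


def to_lines(snap):
    """Toy line-based serialization (stand-in for Turtle)."""
    return "\n".join(" ".join(t) + " ." for t in sorted(snap))


def from_lines(text):
    """Parse the toy line format back into a set of triples."""
    return frozenset(tuple(line[:-2].split(" ")) for line in text.splitlines())


def to_json(snap):
    """Toy JSON serialization (stand-in for RDF/XML)."""
    return json.dumps(sorted(snap))


def from_json(text):
    """Parse the toy JSON format back into a set of triples."""
    return frozenset(tuple(t) for t in json.loads(text))


lcp1 = to_lines(snapshot)  # first download
lcp2 = to_json(snapshot)   # second download, different language

# Different r-texts, same r-snapshot.
assert lcp1 != lcp2
assert from_lines(lcp1) == from_json(lcp2) == snapshot
```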
Note that this is not limited to RDF (as Graham mentioned)
- newspaper (news) uses a CMS to publish the incidence map (map1),
chart (c1) and the image (img1) within a document (art1) written by
(joe) using license (li2)
- newspaper (news) updates art1, adding a correction following a
complaint from a reader
Illustrations:
- art1 is also a resource, with two r-snapshots (before and after
correction)
- with content negotiation, an HTTP client can download HTML and XHTML
representations (i.e., r-texts) of the article
What do you think?
Cheers,
Luc
On 05/25/2011 06:49 AM, Paul Groth wrote:
> Hi,
>
> To throw out some, perhaps simpler, definitions into the mix that I
> think follow along the lines of what's being discussed.
>
> Resource - something that can be identified
>
> Snapshot - the state of a resource at particular point in time
>
> In the Data Journalism Scenario: a 'resource' would be the web page. a
> 'snapshot' would be the web page before publication.
>
> cheers,
> Paul
>
> Note: Similar concepts are found within many provenance models that I
> know of....if it's helpful I can list those out
>
--
Professor Luc Moreau
Electronics and Computer Science tel: +44 23 8059 4487
University of Southampton fax: +44 23 8059 2865
Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk
United Kingdom http://www.ecs.soton.ac.uk/~lavm
Received on Wednesday, 25 May 2011 07:43:20 UTC