RE: actions related to collections

From: Miles, Simon <simon.miles@kcl.ac.uk>
Date: Thu, 19 Apr 2012 15:56:35 +0100
To: Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <FE37361E55FDC343A27E119DFB7785BB40B542CD2D@KCL-MAIL04.kclad.ds.kcl.ac.uk>
Hello Curt,

I don't have a particular problem with collections being described in a separate document from the DM.

However, I would still argue that they are special for provenance, and not merely because of the prevalence or importance of collections across the domains where provenance is used. In particular:

1. Almost everything has parts, and the provenance of something is partly the provenance of its parts. If I ask for the provenance of a webpage, I don't just want to know who designed the page as a whole, but also where its images came from, under what license they were published, etc. Those latter provenance descriptions concern the images, but the question is about the page, so the relation between the page and its images is directly relevant to provenance.

2. Because of the first point, knowing when parts are inserted or removed is also important to understanding something's provenance. What happened to an image after it was removed is not relevant to the provenance of the page that had contained it. What happened to the image before it was inserted did not affect the page until the insertion, whereas every change to the image after insertion affects the page immediately.
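
The two points above can be illustrated with a minimal Python sketch. All names here (`Event`, `Membership`, `effect_time`, `page_provenance`) and the use of integer timestamps are hypothetical and not part of any PROV specification or library. The sketch treats a page's provenance as its own events plus those events of its parts that have taken effect: an event before insertion counts only from the insertion onward, an event during membership counts immediately, and an event at or after removal never counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Something that happened to a thing at an (abstract, integer) time."""
    time: int
    description: str


@dataclass
class Membership:
    """Hypothetical record: `part` belonged to the whole from `inserted`
    until `removed` (None means it is still a member)."""
    part: str
    inserted: int
    removed: Optional[int] = None


def effect_time(event_time: int, m: Membership) -> Optional[int]:
    """Return when an event on a part starts to affect the whole, or None.

    Before insertion the event only matters once the part is inserted;
    while the part is a member it matters immediately; at or after
    removal it never affects the whole."""
    if m.removed is not None and event_time >= m.removed:
        return None
    return max(event_time, m.inserted)


def page_provenance(page_events: list[Event],
                    memberships: list[Membership],
                    part_events: dict[str, list[Event]],
                    as_of: int) -> list[tuple[str, Event]]:
    """Collect (source, event) pairs relevant to the whole as of `as_of`:
    its own events plus the events of its parts that have taken effect."""
    relevant = [("page", e) for e in page_events if e.time <= as_of]
    for m in memberships:
        for e in part_events.get(m.part, []):
            t = effect_time(e.time, m)
            if t is not None and t <= as_of:
                relevant.append((m.part, e))
    return relevant
```

For example, with a logo inserted at time 5 and removed at time 20, a logo edit at time 1 becomes part of the page's provenance from time 5 onward, an edit at time 10 counts immediately, and a relicensing at time 25 is never part of the page's provenance.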

I'd therefore argue that collections, including membership, insertion and deletion, are vital to understanding provenance, not just coincidentally prevalent.


Dr Simon Miles
Senior Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166

Automatically Adapting Source Code to Document Provenance:
From: Curt Tilmes [Curt.Tilmes@nasa.gov]
Sent: 19 April 2012 15:35
To: public-prov-wg@w3.org
Subject: Re: actions related to collections

On 04/12/2012 05:06 PM, Luc Moreau wrote:
> Hi Jun and Satya,
> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised
> against you, as we agreed.
> [1] https://www.w3.org/2011/prov/track/actions/76
> [2] https://www.w3.org/2011/prov/track/actions/77

I've been going over the "collections" traffic.

I mentioned this briefly on the call last week, but I'll state it
once more for the record, then keep my peace.

The bulk of PROV-DM describes what I'll call the core, or fundamental,
concepts of provenance.

You have a general 'entity'; it gets 'used' by an 'activity', which
'generates' a new 'entity'.  Those concepts are all necessary to the
data model, and it doesn't hold together without them.
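
The core pattern (entity used by an activity, which generates a new entity) can be sketched in Python. The class and method names are hypothetical, chosen only to echo PROV-DM's terms (`used`, `wasGeneratedBy`); this is not the API of any PROV library. The `lineage` method walks generation and usage edges backwards to find everything upstream of an entity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str


@dataclass(frozen=True)
class Activity:
    id: str


@dataclass
class ProvGraph:
    """Hypothetical minimal store for the two core relations."""
    used: list[tuple[Activity, Entity]] = field(default_factory=list)
    # In this sketch each entity is generated by at most one activity.
    generated_by: dict[Entity, Activity] = field(default_factory=dict)

    def record_used(self, activity: Activity, entity: Entity) -> None:
        self.used.append((activity, entity))

    def record_generation(self, entity: Entity, activity: Activity) -> None:
        self.generated_by[entity] = activity

    def lineage(self, entity: Entity) -> list[str]:
        """Ids of all entities upstream of `entity`, found by following
        wasGeneratedBy to an activity, then used to that activity's inputs."""
        upstream: list[Entity] = []
        stack = [entity]
        while stack:
            current = stack.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # not generated by any recorded activity
            for act, inp in self.used:
                if act == activity and inp not in upstream:
                    upstream.append(inp)
                    stack.append(inp)
        return [e.id for e in upstream]
```

Recording that a "cleaning" activity used `raw-data` and generated `clean-data`, and that a "plotting" activity used `clean-data` and generated `chart`, makes `lineage` of `chart` return `clean-data` followed by `raw-data`.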

Collections, IMHO, don't fall into that category.

They should be a layer on top of the DM, not a set of fundamental
concepts beside the others or integrated with them.

A collection is simply another type of entity. It changes in several
ways: the previous instance of it is used by various activities,
resulting in the generation of a new entity.

We should model that just like any other entity that gets changed in
any number of ways.  Insertion/Removal are just like any other
activities.  They use one entity (the previous collection), make some
changes, and generate a new entity (the next version of the
collection).  They aren't 'special' enough to include in PROV-DM.
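
The proposal above can be illustrated with a small Python sketch. All names (`CollectionVersion`, `apply_change`, the id scheme) are hypothetical; only the relation names `used` and `wasGeneratedBy` are taken from PROV-DM. Insertion and removal receive no special relations: each is an activity that uses one collection entity and generates the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionVersion:
    """Each state of the collection is its own immutable entity."""
    id: str
    members: frozenset


@dataclass(frozen=True)
class Activity:
    id: str
    kind: str  # just a label, e.g. "insertion" or "removal"


def apply_change(prev: CollectionVersion, kind: str, member: str,
                 new_id: str, log: list[tuple[str, str, str]]) -> CollectionVersion:
    """Model an insertion or removal as an ordinary activity: it uses the
    previous collection entity and generates the next one. Relation
    triples are appended to `log` using PROV-DM's relation names."""
    if kind == "insertion":
        members = prev.members | {member}
    elif kind == "removal":
        members = prev.members - {member}
    else:
        raise ValueError(f"unknown change kind: {kind}")
    activity = Activity(f"{kind}-{member}-{new_id}", kind)
    new = CollectionVersion(new_id, frozenset(members))
    log.append(("used", activity.id, prev.id))
    log.append(("wasGeneratedBy", new.id, activity.id))
    return new
```

Starting from an empty `album.v0`, inserting two images and then removing one yields four collection entities linked by six relation records. In this sketch, which members each version has is visible only through its `members` attribute, not through any relation.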

One could argue (several of you have) that collections are very
important, since they cross so many domains. I could buy that, but
there are also many different types of collections (touched on in the
discussion), and how each type is represented, what changes happen
to it, and which aspects of the provenance of those changes matter
are different for each of them.

Take what we have here, make it a Collection Provenance Model or
something like that, and propose it separately as a middle layer on
top of PROV, below all the "Provenance of XXX"s that will be needed
for various domains, but leave it out of PROV-DM.

My 2 cents,

Received on Thursday, 19 April 2012 14:57:47 UTC
