RE: Internal Review Prov Dictionary from Miles, Simon on 2013-04-10 (public-prov-wg@w3.org from April 2013)

From: Miles, Simon <simon.miles@kcl.ac.uk>
Date: Wed, 10 Apr 2013 09:27:04 +0000
To: Tom De Nies <tom.denies@ugent.be>
CC: Sam Coppens Ugent <sam.coppens@ugent.be>, Provenance Working Group <public-prov-wg@w3.org>
Message-ID: <AA3FA22D967B5C4E8948AADF719DA7C41090E5ED@AM2PRD0311MB409.eurprd03.prod.outlook.>
Hi Tom,

Thanks. Yes, that all looks good, and it is clearer now that you have made the point about insertion being a difference.

thanks,
Simon

Dr Simon Miles
Senior Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166

Modelling the Provenance of Data in Autonomous Systems:
http://eprints.dcs.kcl.ac.uk/1264/
________________________________
From: Tom De Nies [tom.denies@ugent.be]
Sent: 10 April 2013 10:20
To: Miles, Simon
Cc: Sam Coppens Ugent; Provenance Working Group
Subject: Re: Internal Review Prov Dictionary

Hi Simon,

thanks a lot for your review. I've included your suggested changes in the document and responded below.

One matter that confused me was whether insertion and removal are operations, i.e. activities that happen to one dictionary to create another, or differences, i.e. a comparison between two dictionaries. In the end, given the constraints you define, I decided they must be differences, e.g. d2 wasDerivedByInsertionFrom d1 means that the difference between d2 and d1 is that d2 is a superset of d1 with some explicit new entries. However, the text (especially Section 3) talks as if they are operations, referring to "following insertion" or "after a removal", e.g. "An Insertion relation... states that d2 is the dictionary following insertion of pairs... into dictionary d1."

This distinction has practical consequences. If I wanted to describe, in the provenance data, the complete membership of my office on 2013-04-02, I could say:

  entity (simon-office-oncreation, [prov:type='prov:EmptyDictionary'])
  entity (simon-office-20130402, [prov:type='prov:Dictionary'])
  derivedByInsertionFrom (simon-office-20130402, simon-office-oncreation,
      {("on-black-chair", "simon")})

From this, I would know that Simon was the only member of simon-office-20130402. However, the last line is only true information if insertion is merely the difference between the two dictionaries, and other insertions and removals could have occurred in the meantime. If insertion is an operation, then it suggests no-one has entered or left my office since its creation, which is untrue. In summary, I think it would help to clarify in the text whether insertion and removal should be read as diffs or operations. If they can be interpreted as diffs, then I think that makes the model more flexible.


We agree with your comments, and this is indeed how a dictionary should be used in the context of provenance. This way, you could specify for example the members of a baseball team from season to season,  without having to use a different insertion/removal every time someone leaves during the season.
I've modified the explanatory text for insertion/removal to clarify this:
An Insertion relation prov:derivedByInsertionFrom(id; d2, d1, {(key_1, e_1), ..., (key_n, e_n)}) states that d2 is the dictionary following the insertion of key-entity pairs (key_1, e_1), ..., (key_n, e_n) into dictionary d1. In other words, the set of key-entity pairs (key_1, e_1), ..., (key_n, e_n) is to be seen as the difference between d1 and d2. Note that this key-entity-set is considered to be complete. This means that we assume that no unknown keys were inserted in or removed from a dictionary derived by an insertion relation. This is formalized in Inference D8.
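To make the "difference" reading concrete, here is a minimal Python sketch. It is not part of the PROV-Dictionary document: the function name derived_by_insertion and the modelling of a dictionary's membership as a plain key-to-entity mapping are assumptions made purely for illustration. It shows how, when the membership of d1 is known and the inserted key-entity set is taken to be complete, the full membership of d2 follows.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: a dictionary's membership is a mapping key -> entity id.
Membership = Dict[str, str]


def derived_by_insertion(d1: Membership,
                         pairs: Iterable[Tuple[str, str]]) -> Membership:
    """Return the membership of d2, given d1 and the complete inserted pair set."""
    d2 = dict(d1)      # copy, so d1 itself is left unchanged
    d2.update(pairs)   # the pairs are treated as the whole difference d1 -> d2
    return d2


# Simon's office example: the starting dictionary is known to be empty.
office_on_creation: Membership = {}
office_20130402 = derived_by_insertion(
    office_on_creation, [("on-black-chair", "simon")]
)

# Because the pair set is assumed complete, d2's full membership is known,
# regardless of how many people came and went in between.
print(office_20130402)
```

The sketch says nothing about what happened between the two states; it only relates them, which is the point of reading insertion as a difference rather than as an operation.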

Ditto for removal, and I've made sure to revise statements like "after removal". Does this address your concerns sufficiently?

Other comments:

Section 3: Just above Example 1: "to explicitly state that a dictionary is empty, it is recommended that the prov:type prov:EmptyCollection is used". Shouldn't that be prov:EmptyDictionary?

Yes indeed, well spotted!

It seems a shame that "keys cannot repeated in the same dictionary", as it is somewhat of a restriction, but I understand it makes the update and removal semantics a lot cleaner and seems justified for that reason.

Yes, that is the reason. Otherwise we would have to leave the constructs very unconstrained, and we gathered from the comments of the group that the constraints are exactly what makes it interesting to use a dictionary instead of a collection.
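As a rough illustration of why unique keys keep these semantics clean, consider the Python sketch below. The helper names insert_pairs and remove_keys, the team data, and the choice that inserting an existing key replaces its entity are all assumptions for the example, not text from the document.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: membership is a mapping key -> entity id, keys are unique.
Membership = Dict[str, str]


def insert_pairs(d1: Membership,
                 pairs: Iterable[Tuple[str, str]]) -> Membership:
    """Insertion: a pair whose key already exists replaces the old entity (assumed)."""
    d2 = dict(d1)
    d2.update(pairs)
    return d2


def remove_keys(d1: Membership, keys: Iterable[str]) -> Membership:
    """Removal: each key identifies at most one entry, so the result is unambiguous."""
    removed = set(keys)
    return {k: e for k, e in d1.items() if k not in removed}


# Illustrative season-to-season team membership (names are made up).
team_2012 = {"pitcher": "alice", "catcher": "bob"}
team_2013 = insert_pairs(remove_keys(team_2012, ["catcher"]),
                         [("pitcher", "carol")])
print(team_2013)
```

If keys could repeat, remove_keys would have to say which of several entries under the same key disappears, and insert_pairs could no longer double as an update.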

Example 2: "// d1 is a dictionary" -> "d1" should be "d"

Section 4.1: "PROV-Dictionary provides no dedicated syntax for Collection and EmptyCollection." - I think you mean Dictionary and EmptyDictionary?

Example 9: "d1 is the identifier" -> should be "d3"

Section 5: "(Note that this file is unfinished at the time of this working draft" -> this will no longer be a working draft on next release.

Section 5.2: "Class: prov:Dictionary back to overview" (and similarly for other definitions) - I assume that the "back to overview" should not be part of the title but a separate link? Otherwise, I don't understand the title.

Section 5.2, prov:Dictionary definition: "are said to be member" -> should be "members"

All updated. Thanks for spotting these typos!

Inference D6: "and K1 is a set of keys" - K1 does not appear in the inference rule.

Indeed, this was supposed to go with D7.

Constraint D11 is called D10 (so there are two D10s).
Done.

Could you give the updated document a quick glance and tell us whether your comments are resolved?
https://dvcs.w3.org/hg/prov/raw-file/default/dictionary/Overview.html
Thanks!

- Tom
Received on Wednesday, 10 April 2013 09:28:03 UTC