- From: Yolanda Gil <gil@isi.edu>
- Date: Thu, 8 Dec 2011 07:44:21 -0800
- To: Provenance Working Group WG <public-prov-wg@w3.org>
All,

I went over the PROV-DM and the PROV-O documents, and have some comments.

My first comment is that overall the documents read reasonably well, orders of magnitude better than the version that was released a couple of months ago. I have some suggestions: a few that could be done easily and immediately, and others that should probably wait for the next round of revisions. Some are easy edits that I think would improve the readability of the document. Others may seem like easy edits to me but may, in your view, deserve further discussion; I could not easily discern this for some of the items.

My comments:

1) Section 2.1.1: The sentence "In the world, activities involve entities in multiple ways: they consume them, they process them, they transform them, they modify them, they change them, they relocate them, they use them, they generate them, they are controlled by them, etc." could be improved by stating it as: "In the world, activities involve entities in multiple ways: consuming them, processing them, transforming them, modifying them, changing them, relocating them, using them, generating them, being controlled by them, etc."

2) Section 2.1.1: I'd add a sentence at the end of the description of agent to say why it is considered a subclass of entity, something like "PROV-DM considers agents as a type of entity so that the model can be used to represent the provenance of the agents themselves. For example, a spellchecker software may be an agent of a document preparation activity, but itself can have a provenance record that states who its vendor is."

3) At the beginning of section 3, the notion of a "record" is introduced. I get an idea of what is meant by record, but I don't think it is well motivated. OWL does not have "records" but it can be used to state assertions about classes and objects, so why do we need this notion of record?
Also, what is there raises several questions that may or may not have the following answers: "A provenance record is composed of a set of entity records, a set of activity records, a set of agent records, a set of generation records, (and so on). An entity record is a type of provenance record (and so are the others). A provenance record can have in turn its own provenance record, where it would be considered an entity."

4) Section 4.1: It took me some back and forth to realize that e0 is a type while e1...e6 are instances. I'd suggest renaming e0 to crime-file, or cf, or something like that.

5) Section 4.2: The examples of Activity Records would, I think, be clearer if they had "edit" instead of "add-crime-in-London" and "edit" instead of "edit-London-New-York".

6) Section 4.2: In the examples of Generation Records, I did not understand g1 and g2 at all.

7) Section 5.1: The terms "account", "production", and "record container" pop up out of nowhere. They should be introduced and motivated a bit. They should also be related to the notion of "record" better than they are now; this is not very clear. I suspect there might be plans to discuss this aspect of the model further in the WG.

8) Section 5.2.1: I would change the sentence "If an asserter wishes to characterize an entity with the same attribute-value pairs over several intervals, then they are required to assert multiple entity records, each with its own identifier (so as to allow potential dependencies between the various entity records to be expressed)." to clarify the asserting, so that it says something like: "If an asserter wishes to characterize an entity with the same attribute-value pairs over several intervals, then they are required to directly assert or create axioms to infer assertions for multiple entity records, each with its own identifier (so as to allow potential dependencies between the various entity records to be expressed)."
9) Section 5.2.3: The examples of agents could include a spellchecker agent, just to show a bit of diversity in what we consider to be agents.

10) Section 5.2.4: The example of the note record should show a link to some provenance record, ideally one that would have been shown as an example in section 5.1 (maybe the g1 and g2 that I mentioned in point 6 above).

11) Section 5.3.3: We argue that we don't want to get into "responsibility". But we introduce the terms "responsible" and "subordinate". I suggest we refer to them as "represented-agent" and "representing-agent" instead. Also, the section is titled "Responsibility Record", which will be confusing; maybe "Delegation Record" would be better.

12) Section 5.3.3: The example uses the terms "delegation" and "contract". It may be useful to mention that these are domain terms, to make clear that they are not part of the model.

13) Section 5.3.3.1: The sentence "To promote take-up, PROV-DM offers a mild version of responsibility in the form of a relation to represent when an agent acted on another agent's behalf." is a bit of an awkward way to introduce this, so I'd replace it by "The definition of agent mentions that an agent is a type of entity that can be assigned some degree of responsibility for an activity. In many situations, the creators of a provenance record may not have the authority to ascribe responsibility to the various agents that they know are involved in the activity. For example, the developer of a provenance service using PROV-DM could say that a student and his advisor were both involved in creating a dataset, but might not be in a position to know who has actual responsibility for the dataset. Responsibility often has legal connotations that could deter developers and users of PROV-DM from stating responsibility assertions in provenance records. To address this, PROV-DM offers a mild version of responsibility in the form of a relation to represent when an agent acted on another agent's behalf."
14) Section 5.3.3.2: The phrase "we introduce a PROV-DM reserved attribute STEPS" appears here for the first time, and I have no idea what it means. Maybe just say "we introduce a PROV-DM attribute STEPS". More importantly, I did not understand what steps means.

15) Section 5.3.3.3: Complementarity is very confusing; even its description in the primer was confusing to me. And I am a planning person used to thinking about entities changing, states, fluents, etc. I even wrote a survey on "Planning and Description Logics" a while back. But this is actually a very complex area that I don't think is well understood at all. For my money, this is worth a side chat with Pat Hayes about this particular aspect of the model, to get his guidance. He originally worked with McCarthy on the frame problem and understands very well all the different issues involved in this type of logic to reason about actions and change.

16) Section 5.4.1: This could use a bit of introduction explaining why separate provenance records may be created. For example, in emailing a file there could be a provenance record kept by the mail client, another by the SMTP server, etc. It would also be useful to give motivating scenarios/examples for how accounts can be nested.

17) Section 5.4.1: The example that introduces account acc2 says at the end that the result of the merge violates generation-unicity. But if I am following this correctly, if a1 and a0 are asserted to be the same then there is no violation. It is perhaps worth clarifying this, or finding a more real-world example that really creates a violation. Otherwise people are going to be scared of merging provenance records, which I think is the opposite of what we want.

18) Section 5.4.2: It states "A record container is not a record." I am puzzled. This is related to the confusion I raised in point 7.
19) Section 6.2: I did not understand the notion of "traceability record": why it is introduced, and how it is different from a "derivation record".

20) Section 6.5: The sentence "Attribution models the notion of an activity generating an entity identified by e being controlled by an agent ag, which takes responsibility for generating e." could perhaps be replaced by "Attribution models the notion of an activity generating an entity identified by e being controlled by an agent ag."

21) Section 6.7: I did not understand what a "summary record" is. I am guessing we want a notion that someone can excerpt some subset of assertions from a provenance record in order to create a summary record. Is this right? If so, why would this apply only to entities and not to other parts of the model? Also: why wouldn't we use PROV-DM terms to express this meta-derivation? It would make the model easier if we did not need to add an extra notion of "summary record" as here. Or perhaps I did not understand.

One more comment that I am pretty certain is more appropriate for future discussions of the model:

22) Proposal for hadRoleIn: This proposal is motivated by agent being a subclass of entity. Should there be a relation between entity and activity that subsumes (generalizes) used, generatedby, and wasAssociatedWith? I think such a relation would allow us to state that an entity had to do with an activity even though we don't yet know how exactly it was involved in the activity (e.g., whether it was an agent, or it was used by it, or generated by it, or...). I would propose to call this something like <entity hadARoleIn activity>. We should think about how this aligns with what we now call "roles" (my choice of name for this new general relation is not a coincidence), so in the examples in PROV-DM document section 5.2.3, instead of "[prov:role="sponsor"]", perhaps we could see sponsorOf as a specialization of hadRoleIn and of wasAssociatedWith.
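As a rough illustration of point 22, here is a minimal Python sketch of what such a general relation could buy us. Everything in it is an assumption made for illustration and not part of PROV-DM or PROV-O: the relation names are simply the ones used above, every assertion is normalized to an (entity, relation, activity) triple regardless of the direction the model actually gives each relation, and the identifiers "spellchecker", "draft", "funder", "report", and "edit" are made up.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical hierarchy (child relation -> parent relation), following
# the proposal in point 22. Not taken from PROV-DM or PROV-O.
SUPER_RELATION: Dict[str, str] = {
    "used": "hadRoleIn",
    "generatedBy": "hadRoleIn",
    "wasAssociatedWith": "hadRoleIn",
    "sponsorOf": "wasAssociatedWith",
}

# Assumption: every assertion is normalized to (entity, relation, activity).
Triple = Tuple[str, str, str]


def generalizations(relation: str) -> List[str]:
    """Return the relation followed by all of its ancestors, nearest first."""
    chain = [relation]
    while chain[-1] in SUPER_RELATION:
        chain.append(SUPER_RELATION[chain[-1]])
    return chain


def infer(asserted: Set[Triple]) -> Set[Triple]:
    """Return the asserted triples plus those implied by the hierarchy."""
    inferred = set(asserted)
    for entity, relation, activity in asserted:
        for parent in generalizations(relation)[1:]:
            inferred.add((entity, parent, activity))
    return inferred


def involved_in(asserted: Set[Triple], activity: str) -> Set[str]:
    """Entities known to have had some role in the activity."""
    return {
        entity
        for entity, relation, act in infer(asserted)
        if relation == "hadRoleIn" and act == activity
    }


# Made-up assertions: the last one states involvement without saying how.
asserted: Set[Triple] = {
    ("spellchecker", "wasAssociatedWith", "edit"),
    ("draft", "used", "edit"),
    ("funder", "sponsorOf", "edit"),
    ("report", "hadRoleIn", "edit"),
}
```

With these assertions, involved_in(asserted, "edit") returns all four entities, including "report", whose exact involvement is not yet known. The same kind of query could presumably be expressed with sub-property axioms in OWL; the sketch only shows the intended inference.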
On a rather pedantic note, maybe "New-York" should be "New York", and perhaps "half-hexagonal shape" should be "pentagon shape".

Sorry for the long email...

Best,
Yolanda

Yolanda Gil, USC/ISI
+1-310-448-8794
Received on Thursday, 8 December 2011 15:45:09 UTC