PROV-ISSUE-229 (Refactor-and-sub-edit): Document would benefit from refactoring and editing [prov-dm]

http://www.w3.org/2011/prov/track/issues/229

Raised by: Graham Klyne
On product: prov-dm

I am finding some of the text to be repetitive, confusing and in some cases strangely phrased.  I think a main goal of this document needs to be to offer an approachable description of the underlying data model and ASN notation that can be used by developers and information designers.  I think the document could benefit from a serious round of sub-editing (without intending to change the substantive content).

I also think that a refactoring of the DM concepts (without fundamentally changing the underlying intended semantics) could help to eliminate a lot of repetitive text.  These comments relate to the recent "domain of discourse" vote, but I'm coming at this from a more holistic perspective.

It seems to me that the domain of discourse contains the following concepts:
  Entity
  Activity
  Agent
  Event
  Plan
  Account
in that these are the various things about which the provenance language aims to make assertions, and that all of these could be considered types of Entity (with the possible exception of Event).  I think we've already established that most if not all of these are kinds of entity.

If the descriptions were refactored around such a structure, I believe much of the repetitive description of attributes could be focused in one place.  I would be inclined to separate attributes from the other type declarations, so we'd end up with primitive ASN expressions like these:

  Entity(id)
  Activity(id, start?, end?)
  Agent(id)
  Plan(id)
  Event(id, time?)
  Account(id)
  Attributes(id, [attr1=val1, attr2=val2, ...])

Where the Attributes expression could be applied to any of the preceding concepts, and the description of attributes would consequently be needed only once.  The main objection I see to this is that it would mean that, say, the ASN expression:

  Entity(id, [attr1=val1, attr2=val2, ...])

would be replaced by two expressions:

  Entity(id)
  Attributes(id, [attr1=val1, attr2=val2, ...])

I would counter this by having the ASN (but not the underlying model) allow the first form as a syntactic sugar for the second.
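
To illustrate, here is a minimal sketch of how such sugar might be expanded.  It assumes a hypothetical tuple representation of ASN expressions; the names Expr and desugar are purely illustrative and are not part of PROV-DM or the ASN.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical representation of an ASN expression:
# (kind, id, attributes), where attributes is None for the bare form Kind(id).
Expr = Tuple[str, str, Optional[Dict[str, Any]]]


def desugar(expr: Expr) -> List[Expr]:
    """Expand Kind(id, [attrs]) into Kind(id) followed by Attributes(id, [attrs])."""
    kind, ident, attrs = expr
    # Attributes expressions and bare declarations are already primitive.
    if kind == "Attributes" or attrs is None:
        return [expr]
    return [(kind, ident, None), ("Attributes", ident, dict(attrs))]


sugared = ("Entity", "e1", {"attr1": "val1", "attr2": "val2"})
primitive = desugar(sugared)
```

With this shape, only the Attributes form would need to appear in the underlying model; the combined form would exist in the concrete syntax alone.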

...

I also felt that the handling of Activity start and end was not consistent:  according to the text, the times given correspond to Events.  So why not have them *be* Events - that would mean we have a total of 6 event types rather than just 4, but the description of the "Lamport clock" timelines could be focused on the description of Event alone.
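
A rough sketch of what I mean, under the assumption that an activity's start and end are modelled as Event values rather than bare times.  The class names, the "activityStart"/"activityEnd" labels and the generated event ids are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    id: str
    kind: str                        # hypothetical label, e.g. "activityStart"
    time: Optional[datetime] = None  # time? remains optional, as in Event(id, time?)


@dataclass(frozen=True)
class Activity:
    id: str
    start: Optional[Event] = None    # an Event, not a bare time
    end: Optional[Event] = None


def activity_from_times(ident: str,
                        start: Optional[datetime] = None,
                        end: Optional[datetime] = None) -> Activity:
    """Wrap the bare times of Activity(id, start?, end?) in Event values."""
    start_event = Event(f"{ident}-start", "activityStart", start) if start is not None else None
    end_event = Event(f"{ident}-end", "activityEnd", end) if end is not None else None
    return Activity(ident, start_event, end_event)


a1 = activity_from_times("a1", start=datetime(2012, 1, 30, 10, 0))
```

Any ordering rules would then be stated once, over Event, instead of separately for activity times.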

...

I think all of this could be done with minimal change to the underlying semantics, and that coupled with a significant round of sub-editing and reorganization of some of the text could lead to a document that is much easier to follow.

#g
--

Received on Monday, 30 January 2012 10:48:34 UTC