Re: prov-wg: Telecon Agenda March 8, 2012 from Curt Tilmes on 2012-03-08 (public-prov-wg@w3.org from March 2012)

From: Curt Tilmes <Curt.Tilmes@nasa.gov>
Date: Thu, 8 Mar 2012 09:35:02 -0500
To: <public-prov-wg@w3.org>
Message-ID: <4F58C396.5040108@nasa.gov>
On 03/08/2012 06:26 AM, Graham Klyne wrote:
> Luc, thanks ... comments below:
>
> Re:
> http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/wd5-prov-dm-derivation.html

[...]

> In section 1, I'm still seeing a description that doesn't match the
> sample ASN (e.g. which of the listed values does "e2" correspond to,
> etc.)

Yes, especially when the description in the examples is simply:

   "The first and second lines are about derivations between e2 and
   e1."

We need to make it very clear which is the "generatedEntity" and which
is the "usedEntity".  Using those terms instead of simply e1 and e2
would make that clearer.  Alternatively, word the example description
more like the third ("using the entity e1 ... derived the entity e2")
to make the direction of the derivation explicit.

I think Graham is suggesting that the terms in the ASN template should
match the terms in the term description list.  It currently requires
the reader to match up "e2" with "generatedEntity" and "e1" with
"usedEntity" by their order in the list.  It also seems awkward to use
the '2' in 'g2' and the '1' in 'u1' to associate the usage with e1 and
the generation with e2.  Using common terms rather than relying on
matching order might make it easier (i.e. clearer) for the reader.
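
To illustrate the difference between matching by position and matching
by name, here is a rough Python sketch.  The Derivation class is
hypothetical; only the two field names are borrowed from the term
description list, and nothing here is meant as the draft's syntax.

```python
from typing import NamedTuple


class Derivation(NamedTuple):
    # Hypothetical record; the field names mirror the terms in the description list.
    generatedEntity: str
    usedEntity: str


# Positional form: the reader has to remember that the first slot is the
# generated entity and the second is the used entity.
positional = Derivation("e2", "e1")

# Named form: the direction of the derivation is explicit where it is written,
# so the order of the arguments no longer matters.
named = Derivation(usedEntity="e1", generatedEntity="e2")

assert positional == named
print(f"{named.generatedEntity} was derived from {named.usedEntity}")
```

Both forms carry the same information; the named one just does not ask
the reader to line up positions with a separate list of terms.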


> The first two look reasonable to me, but I still don't see why
> wasDerivedFrom(e2,e1,a,g2,u1) is needed.  Once we have expressions
> that explicitly name activities, how much real value is there in
> having the "short cut" form?  Can't this be expressed by having an
> explicit activity record, etc.?
>
> (I'm not suggesting the model should not be capable of expressing
> this information, just arguing against this overloading of the
> wasDerivedFrom record which AIUI is primarily an entity-entity
> relation.)

It does seem like bundling everything into one
wasDerivedFrom(id,e2,e1,a,g2,u1,attrs) is more complicated than simply
requiring three distinct statements:

  wasDerivedFrom(id1,e2,e1,dattrs)
  wasGeneratedBy(id2,e2,a,t2,gattrs)
  used(id3,a,e1,t1,uattrs)

You state "The reason for optional information such as activity,
generation, and usage to be linked to derivations is to aid analysis
of provenance and to facilitate provenance-based reproducibility."

Is that linking really required?  Within a single account couldn't you
always infer that association?  I understand that different accounts
might describe the activities at a different level of granularity and
indicate, for example, that a single entity was generated by two
different activities, but within a single account shouldn't it always
be clear?  Even if it is possible for a DM user to express it
unclearly, couldn't we just ask them not to, in an explanatory
section, rather than simply giving them more rope?
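
As a rough sketch of the inference I have in mind (Python, with
hypothetical record classes and a hypothetical linking_activities
helper; timestamps, identifiers and attributes are left out), the
activity behind a derivation can be looked up from the separate
generation and usage statements in the same account:

```python
from typing import Iterable, NamedTuple, Set


class WasDerivedFrom(NamedTuple):
    generated_entity: str
    used_entity: str


class WasGeneratedBy(NamedTuple):
    entity: str
    activity: str


class Used(NamedTuple):
    activity: str
    entity: str


def linking_activities(
    derivation: WasDerivedFrom,
    generations: Iterable[WasGeneratedBy],
    usages: Iterable[Used],
) -> Set[str]:
    """Return the activities that generated the derived entity and also used the source entity."""
    generators = {g.activity for g in generations if g.entity == derivation.generated_entity}
    users = {u.activity for u in usages if u.entity == derivation.used_entity}
    return generators & users


derivation = WasDerivedFrom("e2", "e1")

# A single account at one level of granularity: exactly one candidate activity.
clear = linking_activities(derivation, [WasGeneratedBy("e2", "a")], [Used("a", "e1")])
print(sorted(clear))  # ['a']

# Statements merged from a second, differently grained description: two candidates,
# so the association can no longer be inferred unambiguously.
mixed = linking_activities(
    derivation,
    [WasGeneratedBy("e2", "a"), WasGeneratedBy("e2", "b")],
    [Used("a", "e1"), Used("b", "e1")],
)
print(sorted(mixed))  # ['a', 'b']
```

This only shows the shape of the lookup.  It assumes the derivation has
already been asserted, and it treats more than one result as the
ambiguous case that the explicit linking is meant to address.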

Especially in the name of simplification, I wouldn't introduce this
added complexity unless we can describe a clear case where it is
needed.

Curt
Received on Thursday, 8 March 2012 14:35:45 UTC