Re: prov-wg: Telecon Agenda March 8, 2012

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Thu, 08 Mar 2012 15:44:47 +0000
To: public-prov-wg@w3.org

Hi Curt,

Responses below,

On 03/08/2012 02:35 PM, Curt Tilmes wrote:
> On 03/08/2012 06:26 AM, Graham Klyne wrote:
>> Luc, thanks ... comments below:
>> Re:
>> http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/wd5-prov-dm-derivation.html 
> [...]
>> In section 1, I'm still seeing a description that doesn't match the
>> sample ASN (e.g. which of the listed values does "e2" correspond to,
>> etc.)
> Yes, especially when the description in the examples is simply:
>   "The first and second lines are about derivations between e2 and
>   e1."
> We need to make it very clear which is the "generatedEntity" and which
> is the "usedEntity".  Using those terms instead of simply e1, e2 would
> make that clearer.  Alternatively, word the example description more
> like the third, "using the entity e1 ... derived the entity e2", to
> make the direction of the derivation explicit.

Good point. I have updated the text in the example accordingly.

> I think Graham is suggesting that the terms in the ASN template match
> the terms in the term description list.  It currently requires the
> reader to match up "e2" with "generatedEntity" and "e1" with
> "usedEntity" by their order in the list.  It also seems awkward to use
> the '2' in 'g2' and the '1' in 'u1' to associate the usage with e1 and
> the generation with e2.  Using common terms rather than relying on
> matching order might make it easier (i.e. clearer) for the reader.

I have now made it clear which parameter is which in the definition.

>> The first two look reasonable to me, but I still don't see why
>> wasDerivedFrom(e2,e1,a,g2,u1) is needed.  Once we have expressions
>> that explicitly name activities, how much real value is there in
>> having the "short cut" form?  Can't this be expressed by having an
>> explicit activity record, etc.?
>> (I'm not suggesting the model should not be capable of expressing
>> this information, just arguing against this overloading of the
>> wasDerivedFrom record which AIUI is primarily an entity-entity
>> relation.)
> It does seem like bundling everything into one
> wasDerivedFrom(id,e2,e1,a,g2,u1,attrs) is more complicated than simply
> requiring three distinct statements
>  wasDerivedFrom(id1,e2,e1,dattrs)
>  wasGeneratedBy(id2,e2,a,t2,gattrs)
>  used(id3,a,e1,t1,uattrs)

The problem is that you could have another usage of e1 by a, at a
different time t1', that does not cause the derivation.

Also, a2 could use e2 and generate e1 at the same time as a.


So, it's essential to list the activity/usage/generation *in* the 
derivation expression.
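
The ambiguity can be illustrated with a minimal Python sketch. It assumes
the three-statement encoding quoted above plus one extra usage of e1 by a
at time t1'. The class names, the candidate_links function, and the
identifier u1_prime are hypothetical and not part of PROV-DM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WasGeneratedBy:
    id: str
    entity: str
    activity: str
    time: str


@dataclass(frozen=True)
class Used:
    id: str
    activity: str
    entity: str
    time: str


def candidate_links(generations, usages, generated_entity, used_entity):
    """Return every (activity, generation id, usage id) triple that could
    underpin a derivation of generated_entity from used_entity."""
    return [
        (g.activity, g.id, u.id)
        for g in generations
        if g.entity == generated_entity
        for u in usages
        if u.entity == used_entity and u.activity == g.activity
    ]


# wasGeneratedBy(g2,e2,a,t2)
generations = [WasGeneratedBy("g2", "e2", "a", "t2")]
# used(u1,a,e1,t1) and a second usage of e1 by a at a different time t1'
usages = [
    Used("u1", "a", "e1", "t1"),
    Used("u1_prime", "a", "e1", "t1_prime"),
]

candidates = candidate_links(generations, usages, "e2", "e1")
print(candidates)
```

With two usages of e1 by a, the separate statements leave two candidate
(activity, generation, usage) triples, and nothing in them says which one
underpins the derivation.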

> You state "The reason for optional information such as activity,
> generation, and usage to be linked to derivations is to aid analysis
> of provenance and to facilitate provenance-based reproducibility."
> Is that linking really required?  Within a single account couldn't you
> always infer that association?  I understand that different accounts
> might describe the activities at a different level of granularity and
> indicate, for example, that a single entity was generated by two
> different activities, but within a single account shouldn't it always
> be clear?  Even if it is possible for a DM user to express it
> unclearly, couldn't we just ask them not to with an explanatory
> section rather than simply giving them more rope?

I don't think we can infer this, even within an account. Multiple usages
can be involved, and the separate statements do not say which of them
underpins the derivation.

> Especially in the name of simplification, I wouldn't introduce this
> added complexity unless we can describe a clear case where it is
> needed.

Reproducibility of results is the key driver.  If we can't identify 
which activity, which usage, and which generation underpin a
derivation, we can't reproduce results.
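
As a rough sketch of why the explicit form helps, the Python below treats
the activity, generation, and usage named in
wasDerivedFrom(e2,e1,a,g2,u1) as keys into the account. The Derivation
type, the resolve function, the dictionary layout, and the u1_prime record
are assumptions made for illustration only.

```python
from typing import NamedTuple


class Derivation(NamedTuple):
    generated_entity: str
    used_entity: str
    activity: str
    generation: str  # identifier of the generation record
    usage: str       # identifier of the usage record


# Hypothetical account contents, keyed by record identifier.
generations = {"g2": ("e2", "a", "t2")}
usages = {
    "u1": ("a", "e1", "t1"),
    "u1_prime": ("a", "e1", "t1_prime"),  # a usage not behind the derivation
}


def resolve(derivation, generations, usages):
    """Look up exactly the generation and usage the derivation names."""
    generation = generations[derivation.generation]
    usage = usages[derivation.usage]
    return generation, usage


derivation = Derivation("e2", "e1", "a", "g2", "u1")
generation, usage = resolve(derivation, generations, usages)
print(generation, usage)
```

Because the derivation names g2 and u1 directly, the lookup returns a
single generation and a single usage even though a second usage of e1 by
a is present in the account.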

> Curt

