Re: ISSUE-385: hasProvenanceIn: finding a solution from Luc Moreau on 2012-06-04 (public-prov-wg@w3.org from June 2012)

From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
Date: Mon, 04 Jun 2012 10:09:46 +0100
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
CC: Paul Groth <p.t.groth@vu.nl>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <EMEW3|3252f4c3e4b2cafd20d739651b66249ao53AB608l.moreau|ecs.soton.ac.uk|4FCC7B5A>
Hi Graham,

I strongly disagree with your statement of scope creep.

Since the very first draft of the prov-dm document, we have had an 
example where a visualization tool annotates/specializes an entity with 
visualization details (color, location, etc.). PROV does not adequately 
support this use case. This use case is real ... we repeatedly display 
provenance graphs in our documents, don't we?

For the last two drafts or so, we have had examples of trust/rating, 
which are inadequately supported by PROV.
This is indeed unfortunate, given that the recent text you suggested for 
the current draft [1] has no fewer than six occurrences of the words
'trustworthy'/'trustworthiness', and about the same number of 'reliability'.

So, I argue that PROV inadequately supports two of its key use cases.

Luc


[1] 
http://dvcs.w3.org/hg/prov/raw-file/default/model/comments/wd6-Graham.txt

On 03/06/2012 19:29, Graham Klyne wrote:
> Paul,
>
> I somewhat agree with you.
>
> But, to play devil's advocate here, I could also argue that it's not 
> the job of a data model to help implementers *organize* their data.
>
> I will claim that, for the purposes of provenance data modelling, the 
> hasProvenanceIn is unnecessary.  If one has a number of bundles, one 
> could load them all into a single bundle (creating a new bundle that 
> is the union of the given bundles), then look for information about 
> particular entities in the merged bundle.  On that score, the data 
> model's job is done.
>
> What we *do* need, and the reason that we have bundles, is a way to 
> label a particular bundle of provenance so that we can assert 
> provenance *about* that bundle.  We don't need hasProvenanceIn for that.
>
> So, to repeat my earlier claim, we don't *need* hasProvenanceIn to 
> support the functionality that was intended (or agreed) to be provided 
> by provenance bundles.  In this respect, hasProvenanceIn is scope 
> creep.  So, if it's proving hard to agree what it means, I think it 
> should be dropped.
>
> #g
> -- 
>
>
> On 03/06/2012 17:40, Paul Groth wrote:
>> Hi Graham,
>>
>> I would argue that being able to refer to a bundle in which the
>> provenance of an entity is contained is an important piece of
>> functionality to allow people to easily organize their provenance
>> information.
>>
>> I can see the point about trying to reuse the relation between the PAQ
>> and the dm.
>>
>> cheers
>> Paul
>>
>>
>> On Sun, Jun 3, 2012 at 9:48 AM, Graham Klyne<GK@ninebynine.org>  wrote:
>>> (I'm replying arbitrarily to Jun's email to maintain the thread, but 
>>> my comment
>>> is to the issue in general.  As it happens, my point about semantics is
>>> underscored by Jun's comment about time constraint - I think it's a 
>>> non-issue
>>> here, but not obviously so.)
>>>
>>> I think the problem we're running into is that we agreed at the last 
>>> F2F to
>>> remove all the additional semantics associated with account.  Thus, to
>>> paraphrase Simon's excellent summary, a bundle is just a named set 
>>> of provenance
>>> statements without any further semantics.  But it appears that Luc's 
>>> example
>>> needs more semantics than just a named set of provenance statements 
>>> - and that's
>>> where I think we are running into problems, because we are not clear 
>>> about
>>> exactly what those additional semantics should be.
>>>
>>> Therefore I suggest that, according to prior WG agreement, Luc's 
>>> example is out
>>> of scope for us to fully resolve.  Paul's suggestion to provide the 
>>> attributes
>>> as an extensibility hook is one possible approach.
>>>
>>> Another possible and more radical approach, prompted by Tim's 
>>> earlier suggestion
>>> to take a local name from DC, is to drop hasProvenanceIn entirely 
>>> from the prov
>>> specification, and (in the usage guidelines) document use of the DC 
>>> term for this
>>> purpose.  This will leave the field clear for subsequent work to 
>>> define a
>>> suitable cross-bundle primitive when we have a clearer common 
>>> understanding of
>>> the actual requirements.
>>>
>>> In summary, options that work for me would be (in order of preference):
>>> (1) drop hasProvenanceIn entirely and move on.  Use existing terms 
>>> from other
>>> vocabularies to express this idea. (**)
>>> (2) adopt Paul's suggestion of an extensible 2-place relation (*)
>>>
>>> (*) noting the importance of monotonicity here: extension attributes 
>>> must not be
>>> able to change semantics of the underlying property.  If the 
>>> underlying property
>>> has no (formal) semantics, this is easy.  If the underlying property 
>>> does have
>>> built-in semantics, then the utility of the extension may be limited 
>>> (or worse,
>>> careless extensions may break the underlying semantic model 
>>> associated with the
>>> core provenance model).
>>>
>>> (**) the slight inconsistency here would be that PROV-AQ still 
>>> requires a
>>> prov:hasProvenance relation.  I'm OK with this because PROV-AQ is 
>>> intended to
>>> address operational concerns where the model is not.  But this does 
>>> create a
>>> reasonably compelling argument for having a corresponding relation 
>>> in the model
>>> - if the semantics are minimal then the same relation can work at 
>>> both levels.
>>>
>>> #g
>>> -- 
>>>
>>>
>>> On 02/06/2012 22:36, Jun Zhao wrote:
>>>> Paul,
>>>>
>>>> At first sight, I loved your proposal. But after reading into it, I 
>>>> got less sure.
>>>>
>>>> This property is to allow locating the bundle in which the 
>>>> provenance of an entity is described. To qualify this, would it 
>>>> mean that, e.g., there is a time period during which you can find 
>>>> provenance of that entity in the bundle and after that you can't?
>>>>
>>>> Although the pattern you propose makes sense, I can't see when 
>>>> people need to qualify this relation. If you have a more concrete 
>>>> example in mind, I am ready to be convinced!
>>>>
>>>> Cheers,
>>>>
>>>> Jun
>>>>
>>>> Sent from my iPad (sorry for the brevity)
>>>>
>>>> On 1 Jun 2012, at 17:03, Paul Groth<p.t.groth@vu.nl>    wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> It seems that one approach would be to define an extensible version
>>>>> of hasProvenanceIn and leave it at that.
>>>>>
>>>>> hasProvenanceIn(id, entity, bundle, attrs).
>>>>>
>>>>> Like all our extensible relations, we would also have the straight
>>>>> binary version
>>>>>
>>>>> hasProvenanceIn(entity,bundle)
>>>>>
>>>>> This would allow for the extensibility to cater for Luc's use case 
>>>>> but
>>>>> also for other use cases where extension is nice. For example, I can
>>>>> imagine a system wanting to put a time constraint on the 
>>>>> applicability
>>>>> of provenance in a bundle to an entity.
>>>>>
>>>>> This would leave it up to people to define specialization, alternate
>>>>> and derivation relations between entities as they want.
>>>>>
>>>>> Would this be acceptable to the group?
>>>>>
>>>>> Thanks
>>>>> Paul
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 1, 2012 at 5:33 PM, Luc 
>>>>> Moreau<L.Moreau@ecs.soton.ac.uk>    wrote:
>>>>>> Hi Simon,
>>>>>>
>>>>>> Thanks for your message. I feel you don't directly respond to the 
>>>>>> points
>>>>>> that I raised,
>>>>>> and therefore all my comments stand.
>>>>>>
>>>>>> I respond to your points below.
>>>>>>
>>>>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>>>>> Hi Luc,
>>>>>>>
>>>>>>> I will try to articulate the points which I think back up the 
>>>>>>> binary relations proposal.
>>>>>>>
>>>>>>> 1. As I understood, there is currently no semantics to a bundle. 
>>>>>>> A querier can choose to consider the descriptions in the bundle 
>>>>>>> or not (based on the bundle's provenance), but whether there are 
>>>>>>> one or many bundles, the querier just has a set of PROV 
>>>>>>> descriptions. The bundles need to be found and known to be 
>>>>>>> relevant, which is why hasProvenanceIn (or isTopicOf) is needed. 
>>>>>>> After that, which bundle a description is in is irrelevant and 
>>>>>>> the bundling can be ignored. A specific extension of PROV may 
>>>>>>> change this by adding semantics to bundles, but this is not in 
>>>>>>> the current specification.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> A close notion to bundle in prior provenance art is opm:Account, and
>>>>>> there is plenty of evidence
>>>>>> that merging accounts may lead to contradictions.  PROV, rightly so,
>>>>>> does not define a union operator
>>>>>> over bundles, and is silent on whether bundles may be merged.
>>>>>>
>>>>>> Therefore,  there is nothing in PROV that backs this statement 
>>>>>> "which
>>>>>> bundle a description is in is
>>>>>> irrelevant and the bundling can be ignored".
>>>>>>
>>>>>> You are suggesting that an extension of PROV may add semantics to
>>>>>> bundles: that's exactly what you
>>>>>> have done, by implying they are mergeable.
>>>>>>
>>>>>>> Taking the statements from the three bundles below, a querier 
>>>>>>> would end up with:
>>>>>>>
>>>>>>>    activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>>>>    wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>>>>    activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>>>>    wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>>>>    agent(tool:Bob1, [perf:rating="good"])
>>>>>>>    agent(tool:Bob2, [perf:rating="bad"])
>>>>>>>
>>>>>>> I can see nothing in the current specification to suggest this 
>>>>>>> means anything different to when these descriptions are 
>>>>>>> separated into multiple bundles. Do you agree?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> PROV does not specify whether they mean something different or not.
>>>>>>
>>>>>>> 2. If there are two entity identifiers relating to the same 
>>>>>>> thing/entity, we need to say how they are connected: either 
>>>>>>> alternateOf, specializationOf, or possibly some external 
>>>>>>> relation such as owl:sameAs. While the example below happens to 
>>>>>>> imply a specialisation relation between tool:Bob1 and ex:Bob, 
>>>>>>> there is no reason to believe this is true in all cases: 
>>>>>>> alternateOf is just as possible. So, hasProvenanceIn cannot 
>>>>>>> imply or be a sub-type of either specializationOf or 
>>>>>>> alternateOf, the appropriate one must be asserted separately.
>>>>>>>
>>>>>>
>>>>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>>>>> important: that's why I am
>>>>>> in favour of making hasProvenanceIn an n-ary relation that includes
>>>>>> attributes so that prov:type can be
>>>>>> used for what you suggest.
>>>>>>> 3. The same thing described from different perspectives has 
>>>>>>> multiple identifiers regardless of bundles, i.e. at least one 
>>>>>>> for each entity. When a bundle is newly read by a querier 
>>>>>>> interested in the provenance of entity E, they should consider 
>>>>>>> every entity E is a specialisation of, and look for those 
>>>>>>> identifiers as well. If they don't, they will miss information 
>>>>>>> about the provenance of E described at a coarser granularity.
>>>>>>>
>>>>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, 
>>>>>>> and bundle ex:run1 might describe something about 
>>>>>>> ex:GeneralBob's provenance. This makes 
>>>>>>> "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because 
>>>>>>> it is not only ex:Bob that is relevant to look for in ex:run1.
>>>>>>>
>>>>>>> Separating concerns, I'd argue it is preferable to say:
>>>>>>>    hasProvenanceIn(tool:Bob1, ex:run1)
>>>>>>>    specializationOf(tool:Bob1, ex:Bob)
>>>>>>>    specializationOf(tool:Bob, ex:GeneralBob)
>>>>>>>
>>>>>> But this latter statement would belong to the ex:run1 bundle I 
>>>>>> assume.
>>>>>> It is not going to be known to be relevant to me until I have 
>>>>>> correctly
>>>>>> been able to link tool:Bob1 to ex:Bob in run1.
>>>>>>
>>>>>>
>>>>>>> and let the que
>>>>
>>>>
>>
Received on Monday, 4 June 2012 09:11:48 UTC