Re: ISSUE-385: hasProvenanceIn: finding a solution from Graham Klyne on 2012-06-06 (public-prov-wg@w3.org from June 2012)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Wed, 06 Jun 2012 07:46:08 +0100
To: public-prov-wg@w3.org, Luc Moreau <l.moreau@ecs.soton.ac.uk>
Message-ID: <4FCEFCB0.4090100@zoo.ox.ac.uk>

Luc,

I use the term "scope creep" with reference to the revised goals we set at the 
second face-to-face meeting, in which the contextual aspect of provenance 
associated with accounts was dropped (actually, as I recall, we agreed to drop 
accounts completely, then reintroduced just enough of the accounts mechanism, 
renamed as bundles, to deal with the provenance-of-provenance requirement). 
Thus, I see addressing this visualization use-case as re-introducing 
functionality that we previously agreed to drop.

As a secondary point, I think the use of use-cases in developing a standard 
cannot be treated in the same way as use-cases for developing an application. 
An application has to completely cover the use-cases it is intended to address; 
a standard does not.  A standard should aim to document the consensus that 
exists, not to impose approaches for which consensus does not exist.  Later, 
when there is more experience of building applications to deal with a particular 
use-case, there may emerge a consensus that can be documented in a future 
revision of a standard.

So, for me, having a use-case that is not fully addressed by a specification is 
not sufficient grounds for claiming the specification is incomplete or lacking 
as a standard.

#g
--

On 04/06/2012 10:09, Luc Moreau wrote:
> Hi Graham,
>
> I strongly disagree with your statement of scope creep.
>
> Since the very first draft of the prov-dm document, we have had an example where
> a visualization tool annotates/specializes an entity with visualization details
> (color, location, etc). PROV does not adequately support this use case. This use
> case is real ... we repeatedly display provenance graphs in our documents, don't
> we?
>
> For the last two drafts or so, we have had examples of trust/rating, which are
> inadequately supported by PROV.
> It's indeed unfortunate, when recent text you suggested for the current draft
> [1] has no less than six occurrences of the word
> 'trustworthy/trustworthiness', and about the same number of 'reliability'.
>
> So, I argue that PROV inadequately supports two of its key use cases.
>
> Luc
>
>
> [1] http://dvcs.w3.org/hg/prov/raw-file/default/model/comments/wd6-Graham.txt
>
> On 03/06/2012 19:29, Graham Klyne wrote:
>> Paul,
>>
>> I somewhat agree with you.
>>
>> But, to play devil's advocate here, I could also argue that it's not the job
>> of a data model to help implementers *organize* their data.
>>
>> I will claim that, for the purposes of provenance data modelling, the
>> hasProvenanceIn is unnecessary. If one has a number of bundles, one could load
>> them all into a single bundle (creating a new bundle that is the union of the
>> given bundles), then look for information about particular entities in the
>> merged bundle. On that score, the data model's job is done.
>>
>> What we *do* need, and the reason that we have bundles, is a way to label a
>> particular bundle of provenance so that we can assert provenance *about* that
>> bundle. We don't need hasProvenanceIn for that.
>>
>> So, to repeat my earlier claim, we don't *need* hasProvenanceIn to support the
>> functionality that was intended (or agreed) to be provided by provenance
>> bundles. In this respect, hasProvenanceIn is scope creep. So, if it's proving
>> hard to agree what it means, I think it should be dropped.
>>
>> #g
>> --
>>
>>
>> On 03/06/2012 17:40, Paul Groth wrote:
>>> Hi Graham,
>>>
>>> I would argue that being able to refer to a bundle in which the
>>> provenance of an entity is contained is an important piece of
>>> functionality to allow people to easily organize their provenance
>>> information.
>>>
>>> I can see the point about trying to reuse the relation between the PAQ
>>> and the dm.
>>>
>>> cheers
>>> Paul
>>>
>>>
>>> On Sun, Jun 3, 2012 at 9:48 AM, Graham Klyne <GK@ninebynine.org> wrote:
>>>> (I'm replying arbitrarily to Jun's email to maintain the thread, but my comment
>>>> is to the issue in general. As it happens, my point about semantics is
>>>> underscored by Jun's comment about time constraint - I think it's a non-issue
>>>> here, but not obviously so.)
>>>>
>>>> I think the problem we're running into is that we agreed at the last F2F to
>>>> remove all the additional semantics associated with account. Thus, to
>>>> paraphrase Simon's excellent summary, a bundle is just a named set of
>>>> provenance statements without any further semantics. But it appears that
>>>> Luc's example needs more semantics than just a named set of provenance
>>>> statements - and that's where I think we are running into problems, because
>>>> we are not clear about exactly what those additional semantics should be.
>>>>
>>>> Therefore I suggest that, according to prior WG agreement, Luc's example is out
>>>> of scope for us to fully resolve. Paul's suggestion to provide the attributes
>>>> as an extensibility hook is one possible approach.
>>>>
>>>> Another possible and more radical approach, prompted by Tim's earlier
>>>> suggestion to take a local name from DC, is to drop hasProvenanceIn entirely
>>>> from the prov specification, and (in the usage guidelines) document use of
>>>> the DC term for this purpose. This will leave the field clear for subsequent
>>>> work to define a suitable cross-bundle primitive when we have a clearer
>>>> common understanding of the actual requirements.
>>>>
>>>> In summary, options that work for me would be (in order of preference):
>>>> (1) drop hasProvenanceIn entirely and move on. Use existing terms from other
>>>> vocabularies to express this idea. (**)
>>>> (2) adopt Paul's suggestion of an extensible 2-place relation (*)
>>>>
>>>> (*) noting the importance of monotonicity here: extension attributes must
>>>> not be able to change semantics of the underlying property. If the
>>>> underlying property has no (formal) semantics, this is easy. If the
>>>> underlying property does have built-in semantics, then the utility of the
>>>> extension may be limited (or worse, careless extensions may break the
>>>> underlying semantic model associated with the core provenance model).
>>>>
>>>> (**) the slight inconsistency here would be that PROV-AQ still requires a
>>>> prov:hasProvenance relation. I'm OK with this because PROV-AQ is intended to
>>>> address operational concerns where the model is not. But this does create a
>>>> reasonably compelling argument for having a corresponding relation in the model
>>>> - if the semantics are minimal then the same relation can work at both levels.
>>>>
>>>> #g
>>>> --
>>>>
>>>>
>>>> On 02/06/2012 22:36, Jun Zhao wrote:
>>>>> Paul,
>>>>>
>>>>> At first sight, I loved your proposal. But after reading into it, I got
>>>>> less sure.
>>>>>
>>>>> This property is to allow locating the bundle in which the provenance of an
>>>>> entity is described. To qualify this, would it mean that, e.g., there is a
>>>>> time period during which you can find provenance of that entity in the
>>>>> bundle and after that you can't?
>>>>>
>>>>> Although the pattern you propose makes sense, I can't see when people need
>>>>> to qualify this relation. If you have a more concrete example in mind, I am
>>>>> ready to be convinced!
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Jun
>>>>>
>>>>> Sent from my iPad (sorry for the brevity)
>>>>>
>>>>> On 1 Jun 2012, at 17:03, Paul Groth <p.t.groth@vu.nl> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> It seems that one approach would be to define an extensible version
>>>>>> of hasProvenanceIn and leave it at that.
>>>>>>
>>>>>> hasProvenanceIn(id, entity, bundle, attrs).
>>>>>>
>>>>>> Like all our extensible relations, we would also have the straight
>>>>>> binary version
>>>>>>
>>>>>> hasProvenanceIn(entity,bundle)
>>>>>>
>>>>>> This would allow for the extensibility to cater for Luc's use case but
>>>>>> also for other use cases where extension is nice. For example, I can
>>>>>> imagine a system wanting to put a time constraint on the applicability
>>>>>> of provenance in a bundle to an entity.
>>>>>>
>>>>>> This would leave it up to people to define specialization, alternate
>>>>>> and derivation relations between entities as they want.
>>>>>>
>>>>>> Would this be acceptable to the group?
>>>>>>
>>>>>> Thanks
>>>>>> Paul
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 1, 2012 at 5:33 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>>>> Hi Simon,
>>>>>>>
>>>>>>> Thanks for your message. I feel you don't directly respond to the points
>>>>>>> that I raised,
>>>>>>> and therefore all my comments stand.
>>>>>>>
>>>>>>> I respond to your points below.
>>>>>>>
>>>>>>> On 06/01/2012 03:39 PM, Miles, Simon wrote:
>>>>>>>> Hi Luc,
>>>>>>>>
>>>>>>>> I will try to articulate the points which I think back up the binary
>>>>>>>> relations proposal.
>>>>>>>>
>>>>>>>> 1. As I understood, there is currently no semantics to a bundle. A
>>>>>>>> querier can choose to consider the descriptions in the bundle or not
>>>>>>>> (based on the bundle's provenance), but whether there are one or many
>>>>>>>> bundles, the querier just has a set of PROV descriptions. The bundles
>>>>>>>> need to be found and known to be relevant, which is why hasProvenanceIn
>>>>>>>> (or isTopicOf) is needed. After that, which bundle a description is in
>>>>>>>> is irrelevant and the bundling can be ignored. A specific extension of
>>>>>>>> PROV may change this by adding semantics to bundles, but this is not in
>>>>>>>> the current specification.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> A close notion to bundle in prior provenance art is opm:Account, and
>>>>>>> there is plenty of evidence
>>>>>>> that merging accounts may lead to contradictions. PROV, rightly so,
>>>>>>> does not define a union operator
>>>>>>> over bundles, and is silent about whether or not bundles may be merged.
>>>>>>>
>>>>>>> Therefore, there is nothing in PROV that backs this statement "which
>>>>>>> bundle a description is in is
>>>>>>> irrelevant and the bundling can be ignored".
>>>>>>>
>>>>>>> You are suggesting that an extension of PROV may add semantics to
>>>>>>> bundles: that's exactly what you
>>>>>>> have done, by implying they are mergeable.
>>>>>>>
>>>>>>>> Taking the statements from the three bundles below, a querier would end
>>>>>>>> up with:
>>>>>>>>
>>>>>>>> activity(ex:a1, 2011-11-16T16:00:00,2011-11-16T17:00:00)
>>>>>>>> wasAssociatedWith(ex:a1,ex:Bob,[prov:role="controller"])
>>>>>>>> activity(ex:a2, 2011-11-17T10:00:00,2011-11-17T17:00:00)
>>>>>>>> wasAssociatedWith(ex:a2,ex:Bob,[prov:role="controller"])
>>>>>>>> agent(tool:Bob1, [perf:rating="good"])
>>>>>>>> agent(tool:Bob2, [perf:rating="bad"])
>>>>>>>>
>>>>>>>> I can see nothing in the current specification to suggest this means
>>>>>>>> anything different to when these descriptions are separated into
>>>>>>>> multiple bundles. Do you agree?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> PROV does not specify whether they mean something different or not.
>>>>>>>
>>>>>>>> 2. If there are two entity identifiers relating to the same
>>>>>>>> thing/entity, we need to say how they are connected: either alternateOf,
>>>>>>>> specializationOf, or possibly some external relation such as owl:sameAs.
>>>>>>>> While the example below happens to imply a specialisation relation
>>>>>>>> between tool:Bob1 and ex:Bob, there is no reason to believe this is true
>>>>>>>> in all cases: alternateOf is just as possible. So, hasProvenanceIn
>>>>>>>> cannot imply or be a sub-type of either specializationOf or alternateOf,
>>>>>>>> the appropriate one must be asserted separately.
>>>>>>>>
>>>>>>>
>>>>>>> I agree that being able to assert subtypes for hasProvenanceIn is
>>>>>>> important: that's why I am
>>>>>>> in favour of making hasProvenanceIn an n-ary relation that includes
>>>>>>> attributes so that prov:type can be
>>>>>>> used for what you suggest.
>>>>>>>> 3. The same thing described from different perspectives has multiple
>>>>>>>> identifiers regardless of bundles, i.e. at least one for each entity.
>>>>>>>> When a bundle is newly read by a querier interested in the provenance of
>>>>>>>> entity E, they should consider every entity E is a specialisation of,
>>>>>>>> and look for those identifiers as well. If they don't, they will miss
>>>>>>>> information about the provenance of E described at a coarser granularity.
>>>>>>>>
>>>>>>>> For example, ex:Bob may be a specialisation of ex:GeneralBob, and bundle
>>>>>>>> ex:run1 might describe something about ex:GeneralBob's provenance. This
>>>>>>>> makes "hasProvenanceIn(tool:Bob1, ex:run1, ex:Bob)" strange, because it
>>>>>>>> is not only ex:Bob that is relevant to look for in ex:run1.
>>>>>>>>
>>>>>>>> Separating concerns, I'd argue it is preferable to say:
>>>>>>>> hasProvenanceIn(tool:Bob1, ex:run1)
>>>>>>>> specializationOf(tool:Bob1, ex:Bob)
>>>>>>>> specializationOf(ex:Bob, ex:GeneralBob)
>>>>>>>>
>>>>>>> But this latter statement would belong to the ex:run1 bundle I assume.
>>>>>>> It is not going to be known to be relevant to me until I have correctly
>>>>>>> been able to link tool:Bob1 to ex:Bob in run1.
>>>>>>>
>>>>>>>
>>>>>>>> and let the que
>>>>>
>>>>>
>>>
>
Received on Wednesday, 6 June 2012 06:47:09 UTC