Re: PROV-ISSUE-385 (haProvenanceIn-complexity): The hasProvenbanceIn relation is over-complicated [prov-dm] from Luc Moreau on 2012-05-29 (public-prov-wg@w3.org from May 2012)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Tue, 29 May 2012 22:57:31 +0100
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
CC: public-prov-wg@w3.org
Message-ID: <EMEW3|61470d37bf323d95134aed38f5fd03cao4SMve08L.Moreau|ecs.soton.ac.uk|4FC5464B>
Hi Graham,

Questions/comments, at the bottom, where you explain how your suggestion 
works.

On 29/05/12 18:20, Graham Klyne wrote:
> On 29/05/2012 12:19, Luc Moreau wrote:
>> Hi Graham
>>
>> Responses interleaved.
>>
>> On 05/29/2012 10:53 AM, Graham Klyne wrote:
>>> OK, I'm looking more closely at the examples.
>>>
>>> In example 45, you introduce prov:service-uri as an attribute. If 
>>> the PROV-N
>>> is supposed to be a domain neutral way of representing provenance 
>>> information,
>>> I'm not sure that it's appropriate to introduce structures that are 
>>> specific
>>> to access mechanisms. To me, it feels like a layer violation and 
>>> scope creep
>>> in the purpose of PROV-N, as it starts to tie it to specific technology
>>> (PROV-AQ in this case).
>>>
>> I guess you mean PROV-DM and not the notation.
>
> I did, though for the purposes of this discussion they're 
> interchangeable.
>
>> But I believe the PROV-AQ introduces properties such as 
>> prov:hasProvenance.
>> If this is to be in the ontology, we also need it in other 
>> representations, and
>> therefore, in the conceptual model.
>
> I think that's a different issue.  prov:service-uri is a link relation 
> introduced by PROV-AQ for accessing a provenance *service*, something 
> that I don't see as being in scope for description by PROV-DM or PROV-N.
>
> As for prov:hasProvenance, I think that's what your 
> hasProvenanceIn(...) is intended to reflect.  I accept the utility of 
> that, but am arguing against over-complicating it with additional 
> details which are operational artifacts of the web/linked data 
> deployment environment.
>
> (If it's causing problems, we could move prov:hasProvenance to a 
> different namespace, but that feels a bit overkill to me.)
>
>>> Example 46: looks sensible to me. I note that this makes perfect 
>>> sense without
>>> using the type=prov:Bundle annotation. I question whether we really 
>>> need this
>>> type annotation. (I think I commented on this in my review email to 
>>> you.)
>>
>> I still have to respond to this separately.
>> prov:Collection, prov:Plan, prov:Dictionary, prov:Bundle are type 
>> information
>> that can be inferred I believe.
>
> OK.  I don't feel very strongly about this particular point, but as I 
> was on one of my simplification crusades, I thought I'd mention it...
>
>>>
>>> Example 47: The only justification I see in this use-case is that 
>>> "Alice
>>> may have decided to use a different identifier for ex:report1". The
>>> implication is that there may be some name-aliasing permitted. If 
>>> this is a
>>> desired feature of PROV-N, I think it should be treated separately 
>>> as a first
>>> class feature, not snuck in an obscure feature of prov:hasProvenanceIn.
>>>
>>> Actually, I suspect that's not really what you meant. To make 
>>> further progress
>>> on this, I have to make an assumption (based on the appeal to 
>>> PROV-AQ). In
>>> PROV-AQ, a provenance link may also indicate a different target-URI 
>>> from the
>>> original request URI. The use case for this is when the actual resource
>>> returned is a specialization of the resource requested; e.g.
>>>
>>> C: GET http://example.org/weather-in-london/today
>>>
>>> S: 200 OK
>>> S: Link: <http://example.org/LWP-20120529>;
>>> rel=prov:Provenance;
>>> anchor=<http://example.org/weather-in-london/20120529>
>>>
>>> If this is indeed the kind of scenario you intend to support, then I 
>>> think
>>> the proper way to address this in example 47 would be:
>>>
>>> Original (included for reference, and in case the document changed):
>>>
>>> bundle alice:bundle6
>>> entity(alice:report1)
>>> hasProvenanceIn(alice:report1, bob:bundle4, ex:report1)
>>> entity(ex:report2, [ prov:type="report", ex:version=2 ])
>>> wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
>>> wasDerivedFrom(ex:report2, alice:report1)
>>> endBundle
>>>
>>> Proposed revision:
>>>
>>> bundle alice:bundle6
>>> entity(alice:report1)
>>> hasProvenanceIn(alice:report1, bob:bundle4, -)
>>> specializationOf(alice:report1, ex:report1)
>>> entity(ex:report2, [ prov:type="report", ex:version=2 ])
>>> wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
>>> wasDerivedFrom(ex:report2, alice:report1)
>>> endBundle
>>>
>>
>> Did you really mean
>> hasProvenanceIn(alice:report1, bob:bundle4, -)
>> and not
>> hasProvenanceIn(ex:report1, bob:bundle4, -) ?
>>
>> This is equivalent to the suggestion that Simon made in the same 
>> thread (example
>> with tool rating agent performance).
>
> Yes, I saw that later, and I agreed with Simon.
>
> I meant to leave your hasProvenanceIn as it was, apart from dropping 
> the target URI.
>
>
>> The reason for introducing
>> hasProvenanceIn(alice:report1, bob:bundle4, ex:report1)
>> was that alice:report1 was a dead end in the current bundle.
>> I needed to know where to go next in my incremental navigation.
>>
>> By adding,
>> specializationOf(alice:report1, ex:report1)
>> alice:report1 is no longer the end of the graph, but ex:report1 is.
>> So how do I know where to go next?
>
> I think it can work either way, and I think either should be allowed.  
> You don't always have to get to the *end* of the chain before 
> accessing additional provenance.

Yes, agreed, we can always get additional provenance, without being at 
the end of the chain.

>
> In my approach, retrieve bob:bundle4 to access provenance about 
> alice:report1,

But how can you, since alice:report1 is not in bob:bundle4?


> *then* use the specializationOf information to infer that provenance 
> for ex:report1 is also true for alice:report1.
>

What kind of inference do you mean here?  None of that has been 
formalized, and if there is inference to perform,
we should formulate it.

> In your approach, use the specializationOf information to infer 
> that retrieving provenance for ex:report1 will also provide 
> information for alice:report1, thus decide to retrieve information for 
> ex:report1.

I don't think that I do any inference.  I just use a mechanism to 
retrieve bundles (i.e. graphs), navigate within graphs,
and jump across graphs (using the proposed hasProvenanceIn relation, and 
target).  This is exactly incremental navigation
described in the PAQ and provided by some provenance stores (e.g. PASOA).
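
A minimal sketch of that navigation, for illustration only. It assumes a
bundle is a plain in-memory graph; the names Bundle, HasProvenanceIn and
navigate are hypothetical and come from neither PROV-DM nor PROV-AQ. The
contents of bob:bundle4 are left empty because they are not shown in
this thread. No inference is performed: the walk follows derivations
inside a bundle and, at a dead end, jumps to the recorded bundle and
target.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HasProvenanceIn:
    subject: str  # entity described in the current bundle
    bundle: str   # bundle holding more provenance for it
    target: str   # name the entity goes by in that bundle


@dataclass
class Bundle:
    # entity -> entities it was derived from, within this bundle only
    derived_from: dict = field(default_factory=dict)
    links: list = field(default_factory=list)


def navigate(store, bundle_name, entity):
    """Walk derivations in a bundle; at a dead end, jump via hasProvenanceIn."""
    path, seen = [], set()
    while (bundle_name, entity) not in seen:
        seen.add((bundle_name, entity))
        path.append((bundle_name, entity))
        bundle = store[bundle_name]
        sources = bundle.derived_from.get(entity)
        if sources:
            entity = sources[0]  # stay inside the current bundle
            continue
        link = next((l for l in bundle.links if l.subject == entity), None)
        if link is None or link.bundle not in store:
            break  # nothing recorded about where to look next
        bundle_name, entity = link.bundle, link.target  # jump across bundles
    return path


# Example 47 as quoted in this thread.
store = {
    "alice:bundle6": Bundle(
        derived_from={"ex:report2": ["alice:report1"]},
        links=[HasProvenanceIn("alice:report1", "bob:bundle4", "ex:report1")],
    ),
    "bob:bundle4": Bundle(),
}
path = navigate(store, "alice:bundle6", "ex:report2")
```

With the target recorded, the walk leaves alice:bundle6 at alice:report1
and resumes in bob:bundle4 at ex:report1. Without the third argument, the
sketch would have no name to resume from in bob:bundle4.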

I think that it is a slippery slope to try and make inferences across 
bundles. We should keep that out of scope
of our specs.

>
>>> The thrust of what I'm saying here is that I think it's 
>>> inappropriate for
>>> PROV-N to just mimic PROV-AQ - they operate at different levels - but 
>>> (if using
>>> PROV-AQ as a motivating guide) to represent the scenarios that might be
>>> addressed using PROV-AQ capabilities.
>>
>> But if the information of where to go next has not been recorded 
>> somewhere in a
>> provenance repository,
>> how can I make use of PROV-AQ? I won't be able to indicate which
>> provenance-uri/target-uri/etc to use.
>> The asserter of provenance descriptions has got the possibility of 
>> recording
>> such a kind of information.
>
> See above for mechanics.
>
> I think it's misleading to think of the PROV-DM information as telling 
> you "where to go" - that's only the case when you are in the web 
> environment and can use follow-your-nose techniques.  What the PROV-DM 
> is telling you is "what to look for" - as in "this bundle contains 
> provenance information about this entity" - coupled with 
> specialization-based inferences like "any provenance that is true of 
> this entity is also true of that entity".
>
>>> This is what I meant when I posed the original question: "I would 
>>> like to
>>> understand what real scenario justifies all the added machinery that 
>>> has been
>>> included with this relation." Note the *real scenario* here.
>>>
>>>
>>
>>
>> See tool rating agent performance example in the same thread.
>
>
>
>>> Example 48:
>>>
>>> I'm not sure what useful purpose is served by:
>>>
>>> hasProvenanceIn(tool:r2, obs:bundle7, ex:report2)
>>>
>>> This says that provenance information about tool:r2 can be found in
>>> obs:bundle7 under the name ex:report2. But what *is* tool:r2 - all I 
>>> can see
>>> is a comment that it's a new identifier. I really can't figure what's 
>>> going on
>>> here. What I really need to know is what is the relationship between 
>>> tool:r2
>>> and ex:report2.
>>
>> It's funny how we get conflicting messages here.
>> Some group members were suggesting it was NOT right to reuse the same 
>> identifier
>> ex:report2
>> to add the viz attributes. Now, are you suggesting that I shouldn't 
>> use a
>> different one?
>
> I wasn't proposing to re-use the same identifier.   What I was saying 
> is that if information about ex:report2 also tells me about 
> tool:r2, then there should be an inference that makes this explicit.

I am struggling with this notion of inference. Why do you need 
inference, when
you could have an explicit relation
hasProvenanceIn(tool:r2, obs:bundle7, ex:report2) ?

>
> I couldn't clearly understand what your proposal was trying to do.  
> Treating it as an import mechanism was a guess, and I got it wrong.  
> This, to my mind, is indicative that the proposal is problematic.
>
>> The point is some users will mint new identifiers, others won't, and 
>> we need to
>> be able to support this.
>
> Sure, but if we do then I think the mechanism should be explicit, not 
> piggy-backed on a relation that serves a different purpose.  As it is, 
> it's just confusing, which can't be a Good Thing.

I find your proposal equally confusing.
I feel that it does not help with the problem of incremental graph 
traversal across bundles.

Luc

>
> #g
> -- 
>
>>>
>>> My best guess here is that you are using the aliasing to fake a kind 
>>> of bundle
>>> import mechanism. If that's what you are doing, and need, I think 
>>> that should
>>> be addressed as a separate, comprehensible feature. This all feels a 
>>> bit like
>>> Jensen's device (http://en.wikipedia.org/wiki/Jensen%27s_Device) - very
>>> clever, but ultimately obscure and unusable for almost all practical 
>>> purposes.
>>>
>>
>> The reference to Jensen's device is simply not relevant here.
>>
>> Nowhere did I suggest this is an import mechanism, though some may 
>> choose to see
>> it as such, but
>> PROV will not say how to merge bundles.
>>
>>
>> Luc
>>
>>
>>> #g
>>> -- 
>>>
>>>
>>> On 28/05/2012 21:26, Luc Moreau wrote:
>>>> Hi Graham,
>>>>
>>>> Like PROV-AQ, we need a target.
>>>> Example 47 illustrates the need for it:
>>>>
>>>> hasProvenanceIn(alice:report1, bob:bundle4, ex:report1)
>>>>
>>>> In the current bundle, there is a description for alice:report1.
>>>> More provenance can be found for it in bundle bob:bundle4, under 
>>>> the name
>>>> ex:report1.
>>>>
>>>>
>>>> The presence of attributes and id follows the pattern of other 
>>>> qualified
>>>> relations.
>>>>
>>>> Luc
>>>>
>>>> On 28/05/12 20:01, Provenance Working Group Issue Tracker wrote:
>>>>> PROV-ISSUE-385 (haProvenanceIn-complexity): The hasProvenbanceIn 
>>>>> relation is
>>>>> over-complicated [prov-dm]
>>>>>
>>>>> http://www.w3.org/2011/prov/track/issues/385
>>>>>
>>>>> Raised by: Graham Klyne
>>>>> On product: prov-dm
>>>>>
>>>>> I'm raising this issue as a placeholder and for discussion. I 
>>>>> didn't notice
>>>>> the arrival of prov:hasProvenanceIn, but based on its appearance in
>>>>> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120525/prov-dm.html 
>>>>>
>>>>>
>>>>> (which AFAIK is not a currently active draft, but a proposal) it is 
>>>>> rather
>>>>> over-complicated and a bit obscure.
>>>>>
>>>>> My sense is that, especially as this is motivated by PROV-AQ, 
>>>>> there are just
>>>>> too many identifiers floating around.
>>>>>
>>>>> Instead of:
>>>>>
>>>>> hasProvenanceIn(id, subject, bundle, target, attrs)
>>>>>
>>>>> Why not just:
>>>>>
>>>>> hasProvenanceIn(subject, bundle)
>>>>>
>>>>> Where subject is based on the URI of an entity, and bundle is 
>>>>> based on the URI
>>>>> of a provenance bundle with information about that entity.
>>>>>
>>>>> I would like to understand what real scenario justifies all the added
>>>>> machinery that has been included with this relation.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
Received on Tuesday, 29 May 2012 21:58:13 UTC