Re: PROV-ISSUE-385 (haProvenanceIn-complexity): The hasProvenbanceIn relation is over-complicated [prov-dm] from Graham Klyne on 2012-05-29 (public-prov-wg@w3.org from May 2012)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Tue, 29 May 2012 18:20:31 +0100
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: public-prov-wg@w3.org
Message-ID: <4FC5055F.20608@zoo.ox.ac.uk>

On 29/05/2012 12:19, Luc Moreau wrote:
> Hi Graham
>
> Responses interleaved.
>
> On 05/29/2012 10:53 AM, Graham Klyne wrote:
>> OK, I'm looking more closely at the examples.
>>
>> In example 45, you introduce prov:service-uri as an attribute. If the PROV-N
>> is supposed to be a domain neutral way of representing provenance information,
>> I'm not sure that it's appropriate to introduce structures that are specific
>> to access mechanisms. To me, it feels like a layer violation and scope creep
>> in the purpose of PROV-N, as it starts to tie it to specific technology
>> (PROV-AQ in this case).
>>
> I guess you mean PROV-DM and not the notation.

I did, though for the purposes of this discussion they're interchangeable.

> But I believe the PROV-AQ introduces properties such as prov:hasProvenance.
> If this is to be in the ontology, we also need it in other representations, and
> therefore, in the conceptual model.

I think that's a different issue.  prov:service-uri is a link relation
introduced by PROV-AQ for accessing a provenance *service*, something that I
don't see as being in scope for description by PROV-DM or PROV-N.

As for prov:hasProvenance, I think that's what your hasProvenanceIn(...) is
intended to reflect.  I accept the utility of that, but am arguing against 
over-complicating it with additional details which are operational artifacts of 
the web/linked data deployment environment.

(If it's causing problems, we could move prov:hasProvenance to a different 
namespace, but that feels a bit overkill to me.)

>> Example 46: looks sensible to me. I note that this makes perfect sense without
>> using the type=prov:Bundle annotation. I question whether we really need this
>> type annotation. (I think I commented on this in my review email to you.)
>
> I still have to respond to this separately.
> prov:Collection, prov:Plan, prov:Dictionary, prov:Bundle are type information
> that can be inferred I believe.

OK.  I don't feel very strongly about this particular point, but as I was on one 
of my simplification crusades, I thought I'd mention it...

>>
>> Example 47: The only justification I see in this use-case is that "Alice
>> may have decided to use a different identifier for ex:report1". The
>> implication is that there may be some name-aliasing permitted. If this is a
>> desired feature of PROV-N, I think it should be treated separately as a first
>> class feature, not snuck in an obscure feature of prov:hasProvenanceIn.
>>
>> Actually, I suspect that's not really what you meant. To make further progress
>> on this, I have to make an assumption (based on the appeal to PROV-AQ). In
>> PROV-AQ, a provenance link may also indicate a different target-URI from the
>> original request URI. The use case for this is when the actual resource
>> returned is a specialization of the resource requested; e.g.
>>
>> C: GET http://example.org/weather-in-london/today
>>
>> S: 200 OK
>> S: Link: <http://example.org/LWP-20120529>;
>> rel=prov:Provenance;
>> anchor=<http://example.org/weather-in-london/20120529>
>>
>> If this is indeed the kind of scenario you intend to support, then I think
>> the proper way to address this in example 47 would be:
>>
>> Original (included for reference, and in case the document changed):
>>
>> bundle alice:bundle6
>> entity(alice:report1)
>> hasProvenanceIn(alice:report1, bob:bundle4, ex:report1)
>> entity(ex:report2, [ prov:type="report", ex:version=2 ])
>> wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
>> wasDerivedFrom(ex:report2, alice:report1)
>> endBundle
>>
>> Proposed revision:
>>
>> bundle alice:bundle6
>> entity(alice:report1)
>> hasProvenanceIn(alice:report1, bob:bundle4, -)
>> specializationOf(alice:report1, ex:report1)
>> entity(ex:report2, [ prov:type="report", ex:version=2 ])
>> wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
>> wasDerivedFrom(ex:report2, alice:report1)
>> endBundle
>>
>
> Did you really mean
> hasProvenanceIn(alice:report1, bob:bundle4, -)
> and not
> hasProvenanceIn(ex:report1, bob:bundle4, -) ?
>
> This is equivalent to the suggestion that Simon made in the same thread (example
> with tool rating agent performance).

Yes, I saw that later, and I agreed with Simon.

I meant to leave your hasProvenanceIn as it was, apart from dropping the target URI.


> The reason for introducing
> hasProvenanceIn(alice:report1, bob:bundle4, ex:report1)
> was that alice:report1 was a dead end in the current bundle.
> I needed to know where to go next in my incremental navigation.
>
> By adding,
> specializationOf(alice:report1, ex:report1)
> alice:report1 is no longer the end of the graph, but ex:report1 is.
> So how do I know where to go next?

I think it can work either way, and I think either should be allowed.  You don't 
always have to get to the *end* of the chain before accessing additional provenance.

In my approach, retrieve bob:bundle4 to access provenance about alice:report1, 
*then* use the specializationOf information to infer that provenance for 
ex:report1 is also true for alice:report1.

In your approach, use the specializationOf information to infer that
retrieving provenance for ex:report1 will also provide information for 
alice:report1, thus decide to retrieve information for ex:report1.
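
To make the two orders of lookup concrete, here is a rough Python sketch. It
is only an illustration, not anything defined by PROV-DM or PROV-AQ: the
dictionaries stand in for a provenance store, the function names bundle_first
and specialization_first are made up, and the wasAttributedTo statement inside
bob:bundle4 is invented for the example.

```python
# Illustrative stand-ins only; the contents of bob:bundle4 are invented.
BUNDLES = {
    "bob:bundle4": [("wasAttributedTo", "ex:report1", "ex:bob")],
}
# specializationOf(alice:report1, ex:report1)
SPECIALIZATION_OF = {"alice:report1": "ex:report1"}


def bundle_first(entity, has_provenance_in):
    """Fetch the bundle linked from the entity, then apply specializationOf."""
    bundle = BUNDLES[has_provenance_in[entity]]
    general = SPECIALIZATION_OF.get(entity)
    # Statements about the general entity are taken to hold for the entity too.
    return [s for s in bundle if s[1] in (entity, general)]


def specialization_first(entity, has_provenance_in):
    """Apply specializationOf first, then fetch the bundle linked from the general entity."""
    general = SPECIALIZATION_OF[entity]
    bundle = BUNDLES[has_provenance_in[general]]
    return [s for s in bundle if s[1] == general]


# hasProvenanceIn(alice:report1, bob:bundle4, -)
mine = bundle_first("alice:report1", {"alice:report1": "bob:bundle4"})
# hasProvenanceIn(ex:report1, bob:bundle4, -)
yours = specialization_first("alice:report1", {"ex:report1": "bob:bundle4"})
```

Under these assumptions both calls end up with the same statements from
bob:bundle4; only the order of the two steps differs.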

>> The thrust of what I'm saying here is that I think it's inappropriate for
>> PROV-N to just mimic PROV-AQ - they operate at different levels - but (if using
>> PROV-AQ as a motivating guide) to represent the scenarios that might be
>> addressed using PROV-AQ capabilities.
>
> But if the information of where to go next has not been recorded somewhere in a
> provenance repository,
> how can I make use of PROV-AQ? I won't be able to indicate which
> provenance-uri/target-uri/etc to use.
> The asserter of provenance descriptions has got the possibility of recording
> such a kind of information.

See above for mechanics.

I think it's misleading to think of the PROV-DM information as telling you 
"where to go" - that's only the case when you are in the web environment and can 
use follow-your-nose techniques.  What the PROV-DM is telling you is "what to 
look for" - as in "this bundle contains provenance information about this 
entity" - coupled with specialization-based inferences like "any provenance that 
is true of this entity is also true of that entity".
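
Read as "what to look for", the assertions can be queried without any
retrieval step at all. The Python sketch below is a hedged illustration of
that reading: the function names entities_covered and bundles_to_look_in are
hypothetical, each entity is assumed to specialize at most one other entity,
and the inference direction is the one I describe above, not something taken
from the PROV-DM text.

```python
def entities_covered(entity, specialization_of):
    """Return the entity plus everything it specializes, directly or indirectly."""
    seen = []
    current = entity
    # Follow the specializationOf chain; stop at its end or on a cycle.
    while current is not None and current not in seen:
        seen.append(current)
        current = specialization_of.get(current)
    return seen


def bundles_to_look_in(entity, has_provenance_in, specialization_of):
    """List bundles asserted to hold provenance relevant to the entity."""
    found = []
    for name in entities_covered(entity, specialization_of):
        for bundle in has_provenance_in.get(name, []):
            if bundle not in found:
                found.append(bundle)
    return found


# specializationOf(alice:report1, ex:report1)
specialization_of = {"alice:report1": "ex:report1"}
# hasProvenanceIn(ex:report1, bob:bundle4)
has_provenance_in = {"ex:report1": ["bob:bundle4"]}
result = bundles_to_look_in("alice:report1", has_provenance_in, specialization_of)
```

How the named bundles are then obtained (PROV-AQ, a local store, anything
else) is deliberately left outside the sketch.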

>> This is what I meant when I posed the original question: "I would like to
>> understand what real scenario justifies all the added machinery that has been
>> included with this relation." Note the *real scenario* here.
>>
>>
>
>
> See tool rating agent performance example in the same thread.



>> Example 48:
>>
>> I'm not sure what useful purpose is served by:
>>
>> hasProvenanceIn(tool:r2, obs:bundle7, ex:report2)
>>
>> This says that provenance information about tool:r2 can be found in
>> obs:bundle7 under the name ex:report2. But what *is* tool:r2 - all I can see
>> is a comment that it's a new identifier. I really can't figure what's going on
>> here. What I really need to know is what is the relationship between tool:r2
>> and ex:report2.
>
> It's funny how we get conflicting messages here.
> Some group members were suggesting it was NOT right to reuse the same identifier
> ex:report2
> to add the viz attributes. Now, are you suggesting that I shouldn't use a
> different one?

I wasn't proposing to re-use the same identifier.  What I was saying is that if
information about ex:report2 also tells me about tool:r2, then there should
be an inference that makes this explicit.

I couldn't clearly understand what your proposal was trying to do.  Treating it 
as an import mechanism was a guess, and I got it wrong.  This, to my mind, is 
indicative that the proposal is problematic.

> The point is some users will mint new identifiers, others won't, and we need to
> be able to support this.

Sure, but if we do then I think the mechanism should be explicit, not
piggy-backed on a relation that serves a different purpose.  As it is, it's just
confusing, which can't be a Good Thing.
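
As a rough sketch of what "explicit" could mean, the three-argument form can
be split into a plain two-argument hasProvenanceIn plus a separate statement
naming the relationship. The Python below is illustrative only: make_explicit
is a hypothetical helper, and the relation is passed in as a parameter because
the draft does not say what the relationship between subject and target is.

```python
def make_explicit(subject, bundle, target, relation):
    """Rewrite hasProvenanceIn(subject, bundle, target) as separate statements.

    `relation` names whatever relationship the asserter intends between
    subject and target; it is an assumption supplied by the caller.
    """
    statements = [("hasProvenanceIn", subject, bundle)]
    # Only emit the second statement when a distinct target was given.
    if target is not None and target != subject:
        statements.append((relation, subject, target))
    return statements


# Example 47, using specializationOf as in my proposed revision.
example47 = make_explicit(
    "alice:report1", "bob:bundle4", "ex:report1", "specializationOf"
)
```

For example 48 the same call would need a relation between tool:r2 and
ex:report2, which is exactly the piece of information I can't find.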

#g
--

>>
>> My best guess here is that you are using the aliasing to fake a kind of bundle
>> import mechanism. If that's what you are doing, and need, I think that should
>> be addressed as a separate, comprehensible feature. This all feels a bit like
>> Jensen's device (http://en.wikipedia.org/wiki/Jensen%27s_Device) - very
>> clever, but ultimately obscure and unusable for almost all practical purposes.
>>
>
> The reference to Jensen's device is simply not relevant here.
>
> Nowhere did I suggest this is an import mechanism, though some may choose to see
> it as such, but
> PROV will not say how to merge bundles.
>
>
> Luc
>
>
>> #g
>> --
>>
>>
>> On 28/05/2012 21:26, Luc Moreau wrote:
>>> Hi Graham,
>>>
>>> Like PROV-AQ, we need a target.
>>> Example 47 illustrates the need for it:
>>>
>>> hasProvenanceIn(alice:report1, bob:bundle4, ex:report1)
>>>
>>> In the current bundle, there is a description for alice:report1.
>>> More provenance can be found for it in bundle bob:bundle4, under the name
>>> ex:report1.
>>>
>>>
>>> The presence of attributes and id follow the pattern of other qualified
>>> relations.
>>>
>>> Luc
>>>
>>> On 28/05/12 20:01, Provenance Working Group Issue Tracker wrote:
>>>> PROV-ISSUE-385 (haProvenanceIn-complexity): The hasProvenbanceIn relation is
>>>> over-complicated [prov-dm]
>>>>
>>>> http://www.w3.org/2011/prov/track/issues/385
>>>>
>>>> Raised by: Graham Klyne
>>>> On product: prov-dm
>>>>
>>>> I'm raising this issue as a placeholder and for discussion. I didn't notice
>>>> the arrival of prov:hasProvenanceIn, but based on its appearance in
>>>> http://dvcs.w3.org/hg/prov/raw-file/default/model/releases/ED-prov-dm-20120525/prov-dm.html
>>>>
>>>> (which AFAIK is not a currently active draft, but a proposal) it is rather
>>>> over-complicated and a bit obscure.
>>>>
>>>> My sense is that, especially as this is motivated by PROV-AQ, there are just
>>>> too many identifiers floating around.
>>>>
>>>> Instead of:
>>>>
>>>> hasProvenanceIn(id, subject, bundle, target, attrs)
>>>>
>>>> Why not just:
>>>>
>>>> hasProvenanceIn(subject, bundle)
>>>>
>>>> Where subject is based on the URI of an entity, and bundle is based on the URI
>>>> of a provenance bundle with information about that entity.
>>>>
>>>> I would like to understand what real scenario justifies all the added
>>>> machinery that has been included with this relation.
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
Received on Tuesday, 29 May 2012 17:34:23 UTC