Re: PROV-ISSUE-385 (haProvenanceIn-complexity): The hasProvenbanceIn relation is over-complicated [prov-dm] from Luc Moreau on 2012-05-31 (public-prov-wg@w3.org from May 2012)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Thu, 31 May 2012 13:13:37 +0100
To: Paul Groth <p.t.groth@vu.nl>
CC: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <EMEW3|f74bd53286ce86ff0170d304a5df4a7fo4UDDf08L.Moreau|ecs.soton.ac.uk|4FC76071>
Hi Paul,

I don't think we can break up this ternary relation into two binary relations.

1. One of the use cases I suggested was:

    hasProvenanceIn(e1, b1, e)
    hasProvenanceIn(e2, b2, e)

    You would end up with

    hasProvenanceIn(e1, b1)
    alternate(e1, e)
    hasProvenanceIn(e2, b2)
    alternate(e2, e)

    But e1 is not in b1; it is e that is a topic in b1.
    Likewise for e2 and b2.  How do we know what to look for in b1?

    Furthermore, e itself may not be a topic at all in the current bundle.

    What if, in addition, I have another alternate relation
    alternate(e1, e3)?
    How do I know what is the alias for e1 in b1?
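
A minimal sketch of the ambiguity in use case 1, assuming statements are
encoded as plain Python tuples. The encoding and the helper names
alias_ternary and alias_candidates are hypothetical illustrations, not part
of prov-dm or of any PROV library.

```python
# Ternary form: (entity, bundle, alias). The alias is stated explicitly.
ternary = [("e1", "b1", "e"), ("e2", "b2", "e")]

def alias_ternary(statements, entity, bundle):
    """Return the name under which `entity` is a topic in `bundle`, if stated."""
    for ent, bun, alias in statements:
        if (ent, bun) == (entity, bundle):
            return alias
    return None

# Binary decomposition: hasProvenanceIn(entity, bundle) plus alternate pairs.
binary_links = [("e1", "b1"), ("e2", "b2")]
alternates = [("e1", "e"), ("e2", "e"), ("e1", "e3")]

def alias_candidates(links, alternate_pairs, entity, bundle):
    """Collect every direct alternate of `entity`; any could be the alias in `bundle`."""
    if (entity, bundle) not in links:
        return set()
    candidates = set()
    for a, b in alternate_pairs:
        if a == entity:
            candidates.add(b)
        elif b == entity:
            candidates.add(a)
    return candidates

print(alias_ternary(ternary, "e1", "b1"))                            # e
print(sorted(alias_candidates(binary_links, alternates, "e1", "b1")))  # ['e', 'e3']
```

With the ternary form the name to look for in b1 is unique; with the
decomposition, both e and e3 are equally plausible.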


2. Another use case I suggested was:

    hasProvenanceIn(e, b1, e)    // provenance of e found in b1, no aliasing
    hasProvenanceIn(e2, b2, e)   // I had to choose alias e to e2 in the
                                 // current bundle because I specialize it
                                 // differently


    hasProvenanceIn(e, b1)
    hasProvenanceIn(e2, b2)
    alternate(e2, e)

    Given that alternate(e2, e) implies alternate(e, e2), we are faced
    with a similar problem: which one is aliased, and which one is not?
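
A small sketch of why the symmetry matters in use case 2, assuming a
symmetric alternate is modelled as an unordered pair. The data layout and
the helper possible_names are hypothetical, chosen only for illustration.

```python
# Ternary form keeps the direction: both entities are looked up under "e".
ternary = {("e", "b1"): "e", ("e2", "b2"): "e"}

# Binary decomposition. alternate is symmetric, so it is stored as an
# unordered pair: alternate(e2, e) and alternate(e, e2) are the same fact.
links = {("e", "b1"), ("e2", "b2")}
alternates = {frozenset(("e2", "e"))}

def possible_names(entity, bundle):
    """Names under which `entity` might appear in `bundle` after decomposition."""
    if (entity, bundle) not in links:
        return set()
    names = {entity}  # the entity may appear under its own name (no aliasing)
    for pair in alternates:
        if entity in pair:
            names |= pair
    return names

print(sorted(possible_names("e", "b1")))    # ['e', 'e2']
print(sorted(possible_names("e2", "b2")))   # ['e', 'e2']
```

Both lookups return the same two candidates, so the decomposition no longer
says that e is unaliased in b1 while e2 is aliased to e in b2.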


3. Furthermore, I definitely would like to subtype this relation.
The only possibility in the context of prov-dm is to allow for prov:type 
to be expressed.

  hasProvenanceIn(e1, b1, e, [ prov:type= XXX ])

4.  I do not know whether the following is a way out, but it may help.

If we consider that the third argument of

    hasProvenanceIn(e1, b1, e)

is an alias, then when we write "e", we really mean the identifier "e" 
and not the entity denoted by "e".
As opposed to "e1" which denotes an entity with a name.


So, an alternative is to write it as follows:

    hasProvenanceIn(e1, b1, [ prov:alias='e'])

We are stating that entity e1 has Provenance in b1/is a topic in b1, but 
we need to look for it under
the name 'e'.  The value associated with prov:alias must be a URI or a 
qualified name.
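
A rough sketch of how the attribute-based form could be read, assuming a
record with an optional attribute dictionary. The class HasProvenanceIn, its
lookup_name method, and the fallback to the entity's own name when no
prov:alias is given are assumptions made for illustration, not something
prov-dm defines.

```python
from dataclasses import dataclass, field

@dataclass
class HasProvenanceIn:
    """Hypothetical record for hasProvenanceIn(entity, bundle, [attributes])."""
    entity: str
    bundle: str
    attributes: dict = field(default_factory=dict)

    def lookup_name(self):
        # prov:alias is an identifier (URI or qualified name), not an entity.
        # Assumption: without an alias, the entity is looked up under its own name.
        return self.attributes.get("prov:alias", self.entity)

# hasProvenanceIn(e1, b1, [ prov:alias='e' ])
aliased = HasProvenanceIn("e1", "b1", {"prov:alias": "e"})
# hasProvenanceIn(e, b1) with no aliasing
plain = HasProvenanceIn("e", "b1")

print(aliased.lookup_name())  # e
print(plain.lookup_name())    # e
```

Because the alias lives in the attribute list, the same list could also
carry prov:type for the subtyping mentioned in point 3.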


Luc


On 05/31/2012 12:43 PM, Paul Groth wrote:
> Luc, Graham:
>
> I wonder if one can see
>
> hasProvenanceIn(entity, bundle, alias-for-entity)
>
> as a proxy for
>
> hasProvenanceIn(entity, bundle)
> alternate(entity, alias-for-entity)
>
> Is this a possible interpretation?
>
> Thanks
> Paul
>
>
> On Thu, May 31, 2012 at 11:09 AM, Graham Klyne
> <Graham.Klyne@zoo.ox.ac.uk>  wrote:
>    
>> On 30/05/2012 10:41, Luc Moreau wrote:
>>      
>>>>>> In my approach, retrieve bob:bundle4 to access provenance about alice:report1,
>>>>>>              
>>>>> But how can you, since alice:report1 is not in bob:bundle4?
>>>>>
>>>>>            
>>>>>> *then* use the specializationOf information to infer that provenance for
>>>>>> ex:report1 is also true for alice:report1.
>>>>>>              
>>>> OK, I should have said "retrieve bob:bundle4 to access provenance about
>>>> ex:report1" there. The rest stands.
>>>>          
>>> But again, how do you know where to find provenance for ex:report1.
>>>
>>> You seem to have
>>> hasProvenanceIn(alice:report1, bob:bundle4, -)
>>> specializationOf(alice:report1, ex:report1)
>>>
>>> This does not say which bundle I can find provenance for ex:report1.
>>>        
>> Maybe I was right first time... I've lost the context of this example.
>>
>> Moving on...
>>
>>      
>>>> What I think we should *not* do is add things to PROV-DM purely to support
>>>> operational concerns (like incremental discovery). That would be to have the
>>>> tail wagging the dog.
>>>>          
>>> I think we may put too much emphasis on 'incremental discovery' as per PAQ.
>>>
>>> The PROV data model in effect specifies a distributed graph structure
>>> (distributed across bundles I mean here, services are a side issue).
>>> To me, it is essential for the model to provide accurate linking across bundles
>>> so that the data structure can be navigated.
>>>        
>> I think this is a reasonable and appropriate way to frame the problem.
>>
>>      
>>> By accurate, I mean that I want to be able to link an entity in a bundle with
>>> another entity in another specific bundle.
>>>        
>> This seems to me to be a different requirement; viz "to link an entity ... with
>> *another* entity".  If that's a requirement, I think it should be orthogonal to
>> the cross-bundle linking.
>>
>> What I do not argue against is the construct:
>>
>>    hasProvenanceIn(entity, bundle)
>>
>> What I am questioning is the purpose of
>>
>>    hasProvenanceIn(entity, bundle, alias-for-entity)
>>
>> Why is the former construct alone insufficient to "provide accurate linking
>> across bundles so that the data structure can be navigated"?
>>
>> (At this point, we probably need to revisit the examples, but I'm out of time
>> right now.)
>>
>>      
>>> Without it, bundles are effectively not usable, and very quickly we will see
>>> constructs like the one I suggest, to aid this navigation
>>> of the graph, and we will have failed in achieving interoperability.
>>>        
>> <aside>
>> "interoperability" is not a simple binary property.  No specification can
>> reasonably underpin total interoperability.  What a spec can do is provide a
>> basis for interoperability for a particular set of activities (scope).  Any
>> application will build upon a spec with additional elements (which may or may
>> not be interoperable with other applications).
>>
>> Of itself, by virtue of being technology neutral, PROV-DM *cannot* be regarded
>> as achieving interoperability - additional technology layers will always be
>> needed for that.  It's a fairly common problem in standards development to try
>> and "boil the ocean", rather than focus on documenting a clear consensus.  Each
>> standard is just part of a bigger ecosystem, and should focus (like good
>> software products) on achieving some clear goals really well and play well with
>> other components.
>>
>> None of this is arguing against the desirability of adequately describing a
>> "distributed graph structure" - I think that is a reasonable goal here - but not
>> because not doing so would mean "we will have failed in achieving
>> interoperability" - in these terms, we will always fail to achieve interoperability.
>> </aside>
>>
>> #g
>>
>>
>>
>>      
>
>    

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Thursday, 31 May 2012 12:14:14 UTC