Re: PROV-ISSUE-385 (haProvenanceIn-complexity): The hasProvenanceIn relation is over-complicated [prov-dm] from Jim McCusker on 2012-05-31 (public-prov-wg@w3.org from May 2012)

From: Jim McCusker <mccusj@rpi.edu>
Date: Thu, 31 May 2012 11:18:55 -0400
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Cc: Paul Groth <p.t.groth@vu.nl>, Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAAtgn=SKUMSHaJwX3jF-MYNX8_fJELGBMibsbuyYcd5zvNvVhQ@mail.gmail.com>
It could be that whether it's alternate or specialization is best left to
the person who is setting the data, further justifying breaking it out.

Jim

On Thu, May 31, 2012 at 10:03 AM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:

> Hi Paul,
> I suppose you meant this to be public.
>
> Yes, I think so. I am not sure whether it's alternate or whether it's
> specialization.
>
> Luc
>
> On 05/31/2012 02:50 PM, Paul Groth wrote:
>
>> would an interpretation be that the alias is an alternate in that bundle?
>>
>> cheers
>> Paul
>>
>> On Thu, May 31, 2012 at 4:26 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>
>>
>>> Hi Paul,
>>> Yes, it's optional.  When absent, the alias is the subject identifier.
>>> Luc
>>>
>>>
>>> On 05/31/2012 02:15 PM, Paul Groth wrote:
>>>
>>>
>>>> Hi Luc,
>>>>
>>>> Thanks for the explanations.
>>>>
>>>> I was wondering if the third argument is optional or not?
>>>>
>>>> Thanks
>>>> Paul
>>>>
>>>> On Thu, May 31, 2012 at 3:13 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>
>>>>
>>>>
>>>>> Hi Paul,
>>>>>
>>>>> I don't think we can break up this ternary relation in two binary
>>>>> relations.
>>>>>
>>>>> 1. One of the use cases I suggested was:
>>>>>
>>>>> hasProvenanceIn(e1, b1, e)
>>>>> hasProvenanceIn(e2, b2, e)
>>>>>
>>>>> You would end up with
>>>>>
>>>>> hasProvenanceIn(e1, b1)
>>>>> alternate(e1, e)
>>>>> hasProvenanceIn(e2, b2)
>>>>> alternate(e2, e)
>>>>>
>>>>> But e1 is not in b1; it is e that is a topic in b1.
>>>>> Likewise for e2 and b2.  How do we know what to look for in b1?
>>>>>
>>>>> Furthermore, e itself may not be a topic at all in the current bundle.
>>>>>
>>>>> What if, in addition, I have another alternate relation,
>>>>> alternate(e1, e3)?
>>>>> How do I know what the alias for e1 in b1 is?
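A minimal Python sketch of point 1, assuming a toy encoding in which every relation is a tuple of identifier strings (the helpers `ternary_lookup` and `binary_lookup` are hypothetical, not part of PROV-DM or any PROV toolkit). It shows that the ternary form records exactly which name to look up in b1, whereas the binary decomposition plus alternate(e1, e) and alternate(e1, e3) only leaves a set of candidate names.

```python
# Toy encoding (an assumption for illustration): relations are tuples of
# identifier strings; this is not a PROV-DM serialization or library API.

def ternary_lookup(has_prov_in, entity):
    """hasProvenanceIn(entity, bundle, alias): return (bundle, alias) pairs."""
    return [(bundle, alias) for (e, bundle, alias) in has_prov_in if e == entity]


def binary_lookup(has_prov_in, alternates, entity):
    """hasProvenanceIn(entity, bundle) plus alternate(x, y): the name to use
    inside the bundle is not recorded, so every identifier linked to the
    entity by alternate (in either direction) remains a candidate."""
    candidates = {entity}
    for x, y in alternates:
        if x == entity:
            candidates.add(y)
        elif y == entity:
            candidates.add(x)
    return [(bundle, sorted(candidates)) for (e, bundle) in has_prov_in if e == entity]


ternary = [("e1", "b1", "e"), ("e2", "b2", "e")]
binary = [("e1", "b1"), ("e2", "b2")]
alternates = [("e1", "e"), ("e2", "e"), ("e1", "e3")]

print(ternary_lookup(ternary, "e1"))            # [('b1', 'e')]
print(binary_lookup(binary, alternates, "e1"))  # [('b1', ['e', 'e1', 'e3'])]
```

The candidate set grows with every additional alternate statement about e1, which is the ambiguity raised in point 1.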
>>>>>
>>>>>
>>>>> 2. Another use case I suggested was:
>>>>>
>>>>> hasProvenanceIn(e, b1, e)    // provenance of e found in b1, no aliasing
>>>>> hasProvenanceIn(e2, b2, e)   // I had to alias e as e2 in the current
>>>>>                              // bundle, because I specialize it differently
>>>>>
>>>>> Splitting it up, you would end up with
>>>>>
>>>>> hasProvenanceIn(e, b1)
>>>>> hasProvenanceIn(e2, b2)
>>>>> alternate(e2, e)
>>>>>
>>>>> Given that alternate(e2, e) implies alternate(e, e2), we are faced with
>>>>> a similar problem: which one is aliased, and which one is not?
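A small sketch of point 2 under the same kind of toy encoding (the helper `alternate_facts` is hypothetical): because alternate is symmetric, storing it as unordered pairs makes alternate(e2, e) and alternate(e, e2) the same fact, so the decomposition cannot say which identifier is the alias, whereas the ternary tuples keep subject and alias in distinct positions.

```python
# Toy encoding (assumption): identifiers are strings, relations are tuples.

def alternate_facts(pairs):
    """alternate is symmetric, so each statement is kept as an unordered pair."""
    return {frozenset(pair) for pair in pairs}


# Binary decomposition of case 2: the direction of the alias is lost.
print(alternate_facts([("e2", "e")]) == alternate_facts([("e", "e2")]))  # True

# Ternary form of case 2: (subject, bundle, alias) keeps the roles apart.
case_2 = [("e", "b1", "e"), ("e2", "b2", "e")]
for subject, bundle, alias in case_2:
    print(f"look for {alias} in {bundle} to find provenance of {subject}")
```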
>>>>>
>>>>>
>>>>> 3. Furthermore, I definitely would like to subtype this relation.
>>>>> The only possibility in the context of prov-dm is to allow prov:type to
>>>>> be expressed:
>>>>>
>>>>>   hasProvenanceIn(e1, b1, e, [ prov:type= XXX ])
>>>>>
>>>>> 4.  I do not know whether the following is a way out, but it may help.
>>>>>
>>>>> If we consider that the third argument of
>>>>>
>>>>>     hasProvenanceIn(e1, b1, e)
>>>>>
>>>>> is an alias, then when we write "e", we really mean the identifier "e"
>>>>> and not the entity denoted by "e", as opposed to "e1", which denotes a
>>>>> named entity.
>>>>>
>>>>>
>>>>> So, an alternative is to write it as follows:
>>>>>
>>>>>     hasProvenanceIn(e1, b1, [ prov:alias='e'])
>>>>>
>>>>> We are stating that entity e1 has provenance in b1 (is a topic in b1),
>>>>> but we need to look for it under the name 'e'. The value associated with
>>>>> prov:alias must be a URI or a qualified name.
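A minimal sketch of how a consumer might resolve the attribute-based form in point 4, assuming a hypothetical `HasProvenanceIn` record and bundles held as in-memory lists of statement tuples whose second element is the subject. It follows the rule Luc states elsewhere in this thread that, when prov:alias is absent, the alias is the subject identifier; plain strings stand in for the URIs or qualified names that prov:alias would actually take.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Statement = Tuple[str, ...]  # hypothetical shape, e.g. ("wasGeneratedBy", subject, activity)


@dataclass
class HasProvenanceIn:
    """Hypothetical record for hasProvenanceIn(entity, bundle, [attrs])."""
    entity: str
    bundle: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def lookup_name(self) -> str:
        # When prov:alias is absent, the alias is the subject identifier.
        return self.attributes.get("prov:alias", self.entity)


def resolve(rel: HasProvenanceIn, bundles: Dict[str, List[Statement]]) -> List[Statement]:
    """Fetch rel.bundle and keep the statements whose subject is the alias."""
    name = rel.lookup_name()
    return [s for s in bundles[rel.bundle] if s[1] == name]


bundles = {
    "b1": [("wasGeneratedBy", "e", "a1"), ("wasGeneratedBy", "x", "a2")],
}
aliased = HasProvenanceIn("e1", "b1", {"prov:alias": "e"})
plain = HasProvenanceIn("e", "b1")

print(resolve(aliased, bundles))  # [('wasGeneratedBy', 'e', 'a1')]
print(resolve(plain, bundles))    # [('wasGeneratedBy', 'e', 'a1')]
```

Keeping the alias as an attribute rather than a third positional argument leaves room for other attributes such as prov:type, as point 3 asks.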
>>>>>
>>>>>
>>>>> Luc
>>>>>
>>>>>
>>>>>
>>>>> On 05/31/2012 12:43 PM, Paul Groth wrote:
>>>>>
>>>>> Luc, Graham:
>>>>>
>>>>> I wonder if one can see
>>>>>
>>>>> hasProvenanceIn(entity, bundle, alias-for-entity)
>>>>>
>>>>> as a proxy for
>>>>>
>>>>> hasProvenanceIn(entity, bundle)
>>>>> alternate(entity, alias-for-entity)
>>>>>
>>>>> Is this a possible interpretation?
>>>>>
>>>>> Thanks
>>>>> Paul
>>>>>
>>>>>
>>>>> On Thu, May 31, 2012 at 11:09 AM, Graham Klyne <Graham.Klyne@zoo.ox.ac.uk> wrote:
>>>>>
>>>>>
>>>>> On 30/05/2012 10:41, Luc Moreau wrote:
>>>>>
>>>>>
>>>>> In my approach, retrieve bob:bundle4 to access provenance about
>>>>> alice:report1,
>>>>>
>>>>>
>>>>> But how can you, since alice:report1 is not in bob:bundle4?
>>>>>
>>>>>
>>>>>
>>>>> *then* use the specializationOf information to infer that provenance
>>>>> for
>>>>> ex:report1 is also true for alice:report1.
>>>>>
>>>>>
>>>>> OK, I should have said "retrieve bob:bundle4 to access provenance about
>>>>> ex:report1" there. The rest stands.
>>>>>
>>>>>
>>>>> But again, how do you know where to find provenance for ex:report1?
>>>>>
>>>>> You seem to have
>>>>> hasProvenanceIn(alice:report1, bob:bundle4, -)
>>>>> specializationOf(alice:report1, ex:report1)
>>>>>
>>>>> This does not say which bundle I can find provenance for ex:report1.
>>>>>
>>>>>
>>>>> Maybe I was right first time... I've lost the context of this example.
>>>>>
>>>>> Moving on...
>>>>>
>>>>>
>>>>>
>>>>> What I think we should *not* do is add things to PROV-DM purely to
>>>>> support operational concerns (like incremental discovery). That would be
>>>>> to have the tail wagging the dog.
>>>>>
>>>>>
>>>>> I think we may put too much emphasis on 'incremental discovery' as per
>>>>> PAQ.
>>>>>
>>>>> The PROV data model in effect specifies a distributed graph structure
>>>>> (distributed across bundles, I mean here; services are a side issue).
>>>>> To me, it is essential for the model to provide accurate linking across
>>>>> bundles so that the data structure can be navigated.
>>>>>
>>>>>
>>>>> I think this is a reasonable and appropriate way to frame the problem.
>>>>>
>>>>>
>>>>>
>>>>> By accurate, I mean that I want to be able to link an entity in a bundle
>>>>> with another entity in another specific bundle.
>>>>>
>>>>>
>>>>> This seems to me to be a different requirement, viz. "to link an entity
>>>>> ... with *another* entity". If that's a requirement, I think it should
>>>>> be orthogonal to the cross-bundle linking.
>>>>>
>>>>> What I do not argue against is the construct:
>>>>>
>>>>>    hasProvenanceIn(entity, bundle)
>>>>>
>>>>> What I am questioning is the purpose of
>>>>>
>>>>>    hasProvenanceIn(entity, bundle, alias-for-entity)
>>>>>
>>>>> Why is the former construct alone insufficient to "provide accurate
>>>>> linking across bundles so that the data structure can be navigated"?
>>>>>
>>>>> (At this point, we probably need to revisit the examples, but I'm out
>>>>> of time right now.)
>>>>>
>>>>>
>>>>>
>>>>> Without it, bundles are effectively not usable, and very quickly we
>>>>> will see constructs like the one I suggest, to aid this navigation of
>>>>> the graph, and we will have failed in achieving interoperability.
>>>>>
>>>>>
>>>>> <aside>
>>>>> "interoperability" is not a simple binary property. No specification
>>>>> can reasonably underpin total interoperability. What a spec can do is
>>>>> provide a basis for interoperability for a particular set of
>>>>> activities (scope). Any application will build upon a spec with
>>>>> additional elements (which may or may not be interoperable with other
>>>>> applications).
>>>>>
>>>>> Of itself, by virtue of being technology neutral, PROV-DM *cannot* be
>>>>> regarded as achieving interoperability - additional technology layers
>>>>> will always be needed for that. It's a fairly common problem in
>>>>> standards development to try and "boil the ocean", rather than focus on
>>>>> documenting a clear consensus. Each standard is just part of a bigger
>>>>> ecosystem, and should focus (like good software products) on achieving
>>>>> some clear goals really well and play well with other components.
>>>>>
>>>>> None of this is arguing against the desirability of adequately
>>>>> describing a "distributed graph structure" - I think that is a
>>>>> reasonable goal here - but not because not doing so would mean "we will
>>>>> have failed in achieving interoperability" - in these terms, we will
>>>>> always fail to achieve interoperability.
>>>>> </aside>
>>>>>
>>>>> #g
>>>>>
>>>>> --
>>>>> Professor Luc Moreau
>>>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>>>> University of Southampton          fax:   +44 23 8059 2865
>>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>>>
>


-- 
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu
Received on Thursday, 31 May 2012 15:19:53 UTC