Re: about EntityInRole and assumedBy from Paul Groth on 2011-10-28 (public-prov-wg@w3.org from October 2011)

From: Paul Groth <p.t.groth@vu.nl>
Date: Fri, 28 Oct 2011 16:08:14 +0200
To: Simon Miles <simon.miles@kcl.ac.uk>
CC: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <4EAAB74E.1010802@vu.nl>

Hi Simon,

I think your approach would modify the data model pretty radically for 
not much gain syntactically.

In a publication activity, for example, there may be many publications 
or intermediate outputs that were generated. With your approach this 
forces me into breaking down that activity. Furthermore, it may be the 
case that I can document the publication activity but don't know how 
it's broken down because I'm just observing it.
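
To make the objection concrete, here is a minimal Turtle sketch, not taken from any draft. It assumes the terms from your proposal below (prov:wasGeneratedAtEndOf, prov:endedAt) and a prov:Generation class as in Khalid's suggestion. The prov: namespace URI, every ex: name (including ex:generatedEntity, ex:generatingProcess and ex:generatedAt) and the timestamps are hypothetical, chosen only for illustration.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .        # assumed namespace URI
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .               # hypothetical example names

# Approach 3: a generation time is only available as the end time of a
# process. Two outputs generated at different times therefore force the one
# observed activity to be split into parts I may know nothing about.
ex:draft prov:wasGeneratedAtEndOf ex:publishingPart1 .
ex:paper prov:wasGeneratedAtEndOf ex:publishingPart2 .
ex:publishingPart1 prov:endedAt "2011-10-01T12:00:00Z"^^xsd:dateTime .
ex:publishingPart2 prov:endedAt "2011-10-20T12:00:00Z"^^xsd:dateTime .

# Class-based alternative: the observed activity ex:publishing stays whole,
# and each generation is a node that carries its own time.
ex:gen1 a prov:Generation ;
    ex:generatedEntity  ex:draft ;
    ex:generatingProcess ex:publishing ;
    ex:generatedAt "2011-10-01T12:00:00Z"^^xsd:dateTime .

ex:gen2 a prov:Generation ;
    ex:generatedEntity  ex:paper ;
    ex:generatingProcess ex:publishing ;
    ex:generatedAt "2011-10-20T12:00:00Z"^^xsd:dateTime .
```

In the first half, ex:publishingPart1 and ex:publishingPart2 exist only to carry timestamps; in the second half nothing about the internal structure of ex:publishing has to be asserted.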

Thanks,
Paul

Simon Miles wrote:
> Hi Daniel,
>
> That's a useful categorisation of approaches.
>
> While we are focusing on simplification of the specifications, I would
> like to re-advocate approach 3, which:
>   - provides simple assertions for basic knowledge (as Paul has been
>     advocating),
>   - has properties rather than classes representing relationships, and
>   - maps nicely to the conceptual model.
>
> I don't find the arguments against option 3 to be strong ones. Most
> information that you might attach to a used/generated edge could be
> attached to a process or an entity if what has occurred is described
> at the right granularity. I would argue that if you find you want to
> annotate a used/generated event with, for example, location, this just
> means you need a finer-grained account where it is clear that the
> location was of a particular entity or a particular process. The only
> annotation I can see that makes sense for an event is a timestamp,
> with the time of generation being the critical one. However, I suggest
> approach 3 still allows this.
>
> My proposal:
>
> prov:used is an edge, domain is prov:ProcessExecution, range is prov:Entity
> prov:wasGeneratedBy is an edge, domain is prov:Entity, range is
> prov:ProcessExecution
>
> Asserting a role:
> :pe1 :usedConfigurationFile :entity1.
> :usedConfigurationFile rdfs:subPropertyOf prov:used.
>
> Note that, if we want to avoid the need for (explicit) inference, both
> statements need to be present in the provenance data and the pattern
> above needs to be interpreted as expressing a used role by queriers.
>
> prov:wasGeneratedAtEndOf is a sub-property of prov:wasGeneratedBy
> prov:usedAtStartOf is a sub-property of prov:used
>
> Asserting time of generation:
>   :entity2 prov:wasGeneratedAtEndOf :pe2.
>   :pe2 prov:endedAt :timestamp2.
>
> If the process you have asserted so far generates the entity partway
> through, then provide a finer-grained account of that process where
> the generation occurs at the end of a (finer grained) process. If you
> are able to say when an entity was generated during a process then you
> have enough information to decompose the process, i.e. you at least
> know there was part of the process before the entity's generation and
> part after.
>
> Thanks,
> Simon
>
> On 28 October 2011 12:22, Daniel Garijo<dgarijo@delicias.dia.fi.upm.es>  wrote:
>> Hi James,
>>
>> When we started this small task force, we looked for alternatives to solve
>> this issue. We had 3 possible approaches:
>>
>> 1) The OPMO approach, modelling edges as n-ary relationships.
>> 2) Satya's approach with EntityInRole.
>> 3) The OPMV approach, specializing roles as subproperties of the "use"
>>    relationship.
>>
>> Approach 3) was dropped because, although it is very simple and direct, it is
>> not possible to add time and location to the edges (all the properties are
>> binary).
>> Approach 1) covered the functionality in my opinion (although Satya may have
>> some objections against it), but we decided to go for approach 2), which has
>> now brought some problems.
>>
>> Unfortunately I'm not aware of other ways to model n-ary relationships. If
>> someone comes up with a new approach on Monday, I'll be happy to discuss it.
>>
>> Best,
>> Daniel
>>
>> 2011/10/28 James Cheney<jcheney@inf.ed.ac.uk>
>>> Just to clarify, I believe the alternative I'd suggested (which Khalid
>>> gives in proper Turtle/RDF notation below) is similar to that in OPM-O,
>>> which had Used as a class rather than a property.  I'm not claiming to have
>>> come up with this on my own, I'm sure I had seen the OPM-O treatment and
>>> inadvertently reinvented it.
>>>
>>> My main question is, is that approach more palatable than the EntityInRole
>>> approach?  If not, what alternative would be a better fit for the current
>>> PROV-DM?
>>>
>>> --James
>>>
>>> On Oct 26, 2011, at 7:42 PM, Khalid Belhajjame wrote:
>>>
>>>> Hi Luc, all,
>>>>
>>>> We are aware of the problems that EntityInRole is introducing. Following
>>>> your comments in recent emails, we are contemplating a new alternative,
>>>> suggested by James, which seems to be in line with PROV-DM. Not all
>>>> members of the ontology group have expressed their opinion yet, though.
>>>>
>>>> The idea is to have two owl classes Usage and Generation. These two
>>>> classes are connected to Entity and Process Execution (using object
>>>> properties), and qualifiers, such as role, can be defined as object
>>>> properties of Usage and Generation. For example, Usage can be defined as:
>>>>
>>>> prov:Usage a owl:Class.
>>>>
>>>> prov:hasEntity
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:Usage ;
>>>>     rdfs:range prov:Entity .
>>>>
>>>> prov:hasProcessExecution
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:Usage ;
>>>>     rdfs:range prov:ProcessExecution .
>>>>
>>>> prov:hasRole
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:Usage ;
>>>>     rdfs:range prov:Role .
>>>> ...
>>>>
>>>> Additionally, to be closer to the model, we can define shortcut
>>>> properties, used and wasGeneratedBy, as object properties that connect
>>>> process executions and entities. For example,
>>>>
>>>> prov:used
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:ProcessExecution ;
>>>>     rdfs:range prov:Entity .
>>>>
>>>> We are aware of at least one issue with this design but, IMO, it is
>>>> better.
>>>>
>>>> khalid
>>>>
>>>> On 26/10/2011 16:22, Luc Moreau wrote:
>>>>> Dear all,
>>>>>
>>>>> I would like to reiterate my concern about the mismatch between
>>>>> prov-dm and prov-o. I think the two notions the documents diverge on
>>>>> are EntityInRole (a subclass of Entity) and assumedBy (which in essence
>>>>> is a kind of specialization of wasComplementOf).
>>>>>
>>>>> They are introduced in the ontology to be able to express time and
>>>>> qualifiers (including role) for use and generation.
>>>>>
>>>>> JamesC and Graham suggested that we write examples to understand the
>>>>> differences; it is a very good point that I support.
>>>>>
>>>>> I would like to explain the problem I have with the EntityInRole
>>>>> solution.
>>>>>
>>>>> Let's take the case of a used relation (ideas apply similarly to
>>>>> generation).
>>>>> Let's imagine there is no qualifier/time information.
>>>>> We want to be simple, and express a property, as prov-o does:
>>>>>
>>>>> Encoding1:
>>>>>   pe prov:used e1
>>>>>
>>>>> If we suddenly have time information or role, according to prov-o, we
>>>>> would have to write:
>>>>>
>>>>> Encoding2:
>>>>>
>>>>> pe prov:used e1X
>>>>> e1X  prov:assumedBy e1
>>>>> e1X  prov:assumedAt t1
>>>>> e1X  prov:assumedRole r
>>>>>
>>>>> My problems are the following:
>>>>>
>>>>> - Encoding2 is not an extension of encoding1: it does not just add new
>>>>>   edges, it removes some.
>>>>>   But according to the data model, we have just added extra information.
>>>>>
>>>>> - Why should I change my "modelling of what happens in the world"?
>>>>>   Encoding2 has two entities where encoding1 has only one.
>>>>>
>>>>> - I believe it would be reasonable to write
>>>>>     e1X wasComplementOf e1
>>>>>   (since it looks like e1X has attributes that e1 doesn't have, and they
>>>>>   have a common time interval).
>>>>>
>>>>>   What's the difference between e1X wasComplementOf e1 and
>>>>>   e1X assumedBy e1?
>>>>>   wasComplementOf is hard enough; why do we have to have something so
>>>>>   similar to it, but restricted to EntityInRole?
>>>>>
>>>>> - Imagine the scenario is more complex and e1 is used by a second
>>>>>   process in another role, at time t2>t1, so we may have another entity
>>>>>   e2X.
>>>>>   It would also be reasonable to write e2X wasDerivedFrom e1X because
>>>>>   e2X follows e1X.
>>>>>   But this wouldn't be possible in prov-dm, unless we explicitly
>>>>>   introduce e1X and e2X.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Luc
>>>>>
>>>>>
>>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>
>
>

-- 
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam
Received on Friday, 28 October 2011 14:11:13 UTC