Re: about EntityInRole and assumedBy from Paolo Missier on 2011-10-28 (public-prov-wg@w3.org from October 2011)

From: Paolo Missier <Paolo.Missier@ncl.ac.uk>
Date: Fri, 28 Oct 2011 15:57:40 +0100
To: public-prov-wg@w3.org
Message-ID: <4EAAC2E4.8060603@ncl.ac.uk>
Simon

my comment is probably similar to Paul's: I see a bit of an issue with multiple data products of a process execution (pe), which are now
all assumed to occur at the end of the activity (we know this is not the case in general), as reflected in the terminology:
   "prov:wasGeneratedAtEndOf" and "prov:endedAt"
Similarly for "used".
I guess you could actually associate a generation timestamp directly with an entity (because it's only generated once), but that
wouldn't work for usage.
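
A rough sketch of that asymmetry in Turtle: an entity is generated once but can be used any number of times. Nothing here is
agreed vocabulary: the prefix IRIs are placeholders, the properties ex:generatedAt and ex:usedAt are made up for illustration,
and the timestamps are left as placeholder resources.

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix ex:   <http://example.org/> .

    # Generation happens once, so a single timestamp on the entity is unambiguous.
    ex:entity1 prov:wasGeneratedBy ex:pe1 ;
               ex:generatedAt      ex:t1 .      # hypothetical property

    # Usage can happen many times. Two executions use the same entity:
    ex:pe2 prov:used ex:entity1 .
    ex:pe3 prov:used ex:entity1 .

    # With both timestamps hanging off the entity, nothing says which
    # time goes with which execution.
    ex:entity1 ex:usedAt ex:t2 , ex:t3 .        # hypothetical property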

Having to introduce explicit accounts to indicate that a product appears before the end of an activity (i.e. by splitting it) is a
solution, but a rather heavyweight one IMO.
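
For concreteness, the split would look roughly like this, using Simon's proposed prov:wasGeneratedAtEndOf and prov:endedAt.
The prefix IRIs, the ex: names, and the ex:wasPartOf link between the sub-processes and the original pe are assumptions made up
for this sketch, not proposed terms.

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix ex:   <http://example.org/> .

    # Coarse account: ex:pe produces two entities, but only ex:e2 really
    # appears at the end of it.
    ex:e2 prov:wasGeneratedAtEndOf ex:pe .
    ex:pe prov:endedAt ex:t_end .

    # To say that ex:e1 appeared earlier, ex:pe has to be split so that
    # ex:e1 falls at the end of a finer-grained process.
    ex:pe_a ex:wasPartOf ex:pe .              # hypothetical property
    ex:pe_b ex:wasPartOf ex:pe .              # hypothetical property
    ex:e1   prov:wasGeneratedAtEndOf ex:pe_a .
    ex:pe_a prov:endedAt ex:t_mid .

That is two extra process executions, plus their bookkeeping, just to place one product in time.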

Overall, approach (2) would still work better for me.

-Paolo



On 10/28/11 1:26 PM, Simon Miles wrote:
> Hi Daniel,
>
> That's a useful categorisation of approaches.
>
> While we are focusing on simplification of the specifications, I would
> like to re-advocate approach 3, which:
>   - provides simple assertions for basic knowledge (as Paul has been advocating),
>   - has properties rather than classes representing relationships, and
>   - maps nicely to the conceptual model.
>
> I don't find the arguments against option 3 to be strong ones. Most
> information that you might attach to a used/generated edge could be
> attached to a process or an entity if what has occurred is described
> at the right granularity. I would argue that if you find you want to
> annotate a used/generated event with, for example, location, this just
> means you need a finer-grained account where it is clear that the
> location was of a particular entity or a particular process. The only
> annotation I can see that makes sense for an event is a timestamp,
> with the time of generation being the critical one. However, I suggest
> approach 3 still allows this.
>
> My proposal:
>
> prov:used is an edge, domain is prov:ProcessExecution, range is prov:Entity
> prov:wasGeneratedBy is an edge, domain is prov:Entity, range is
> prov:ProcessExecution
>
> Asserting a role:
> :pe1 :usedConfigurationFile :entity1.
> :usedConfigurationFile rdfs:subPropertyOf prov:used.
>
> Note that, if we want to avoid the need for (explicit) inference, both
> statements need to be present in the provenance data and the pattern
> above needs to be interpreted as expressing a used role by queriers.
>
> prov:wasGeneratedAtEndOf is a sub-property of prov:wasGeneratedBy
> prov:usedAtStartOf is a sub-property of prov:used
>
> Asserting time of generation:
>   :entity2 prov:wasGeneratedAtEndOf :pe2.
>   :pe2 prov:endedAt :timestamp2.
>
> If the process you have asserted so far generates the entity partway
> through, then provide a finer-grained account of that process where
> the generation occurs at the end of a (finer grained) process. If you
> are able to say when an entity was generated during a process then you
> have enough information to decompose the process, i.e. you at least
> know there was part of the process before the entity's generation and
> part after.
>
> Thanks,
> Simon
>
> On 28 October 2011 12:22, Daniel Garijo <dgarijo@delicias.dia.fi.upm.es> wrote:
>> Hi James,
>>
>> When we started this small task force, we looked for alternatives to solve
>> this issue. We had 3 possible approaches:
>>
>> 1) The OPMO approach, modelling edges as n-ary relationships.
>> 2) Satya's approach with EntityInRole.
>> 3) The OPMV approach, specializing roles as subproperties of the "use"
>>    relationship.
>>
>> Approach 3) was dropped because, although it is very simple and direct, it is
>> not possible to add time and location to the edges
>> (all the properties are binary).
>> Approach 1) covered the functionality in my opinion (although Satya may have
>> some objections against it), but we decided to go
>> for approach 2) which now has brought some problems.
>>
>> Unfortunately I'm not aware of other ways to model n-ary relationships. If
>> someone comes up with a new approach on Monday,
>> I'll be happy to discuss it.
>>
>> Best,
>> Daniel
>>
>> 2011/10/28 James Cheney <jcheney@inf.ed.ac.uk>
>>> Just to clarify, I believe the alternative I'd suggested (which Khalid
>>> gives in proper Turtle/RDF notation below) is similar to that in OPM-O,
>>> which had Used as a class rather than a property.  I'm not claiming to have
>>> come up with this on my own, I'm sure I had seen the OPM-O treatment and
>>> inadvertently reinvented it.
>>>
>>> My main question is, is that approach more palatable than the EntityInRole
>>> approach?  If not, what alternative would be a better fit for the current
>>> PROV-DM?
>>>
>>> --James
>>>
>>> On Oct 26, 2011, at 7:42 PM, Khalid Belhajjame wrote:
>>>
>>>> Hi Luc, all,
>>>>
>>>> We are aware of the problems that EntityInRole is introducing. Following
>>>> your comments in recent emails, we are contemplating a new alternative,
>>>> suggested by James, which seems to be in line with the PROV-DM. Not all
>>>> members of the ontology group have expressed their opinion yet, though.
>>>>
>>>> The idea is to have two OWL classes, Usage and Generation. These two
>>>> classes are connected to Entity and Process Execution (using object
>>>> properties), and qualifiers, such as role, can be defined as object
>>>> properties of Usage and Generation. For example, Usage can be defined as:
>>>>
>>>> prov:Usage a owl:Class.
>>>>
>>>> prov:hasEntity
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:Usage ;
>>>>     rdfs:range prov:Entity .
>>>>
>>>> prov:hasProcessExecution
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:Usage ;
>>>>     rdfs:range prov:ProcessExecution .
>>>>
>>>> prov:hasRole
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:Usage ;
>>>>     rdfs:range prov:Role .
>>>> ...
>>>>
>>>> Additionally, to be closer to the model, we can define shortcut
>>>> properties: used and wasGeneratedBy, as object properties that connect
>>>> process executions and entities. For example,
>>>>
>>>> prov:used
>>>>     a rdf:Property ;
>>>>     rdfs:domain prov:ProcessExecution ;
>>>>     rdfs:range prov:Entity .
>>>>
>>>> There is at least one issue that we are aware of with this design, but
>>>> IMO, it is better.
>>>>
>>>> khalid
>>>>
>>>> On 26/10/2011 16:22, Luc Moreau wrote:
>>>>> Dear all,
>>>>>
>>>>> I would like to reiterate my concern about the mismatch between
>>>>> prov-dm and prov-o. I think the two notions the documents diverge on are
>>>>> entityInRole (a subclass of Entity) and assumedBy (which in essence is
>>>>> a kind of specialization of wasComplementOf).
>>>>>
>>>>> They are introduced in the ontology to be able to express time and
>>>>> qualifiers (including role) for use and generation.
>>>>>
>>>>> JamesC and Graham suggested that we write examples to understand the
>>>>> differences; it is a very good point that I support.
>>>>>
>>>>> I would like to explain the problem I have with the EntityInRole
>>>>> solution.
>>>>>
>>>>> Let's take the case of a used relation (ideas apply similarly to
>>>>> generation).
>>>>> Let's imagine there is no qualifier/time information.
>>>>> We want to be simple, and express a property, as prov-o does:
>>>>>
>>>>> Encoding1:
>>>>>   pe prov:used e1
>>>>>
>>>>> If we suddenly have time information or role, according to prov-o, we
>>>>> would have to write:
>>>>>
>>>>> Encoding2:
>>>>>
>>>>> pe prov:used e1X
>>>>> e1X  prov:assumedBy e1
>>>>> e1X  prov:assumedAt t1
>>>>> e1X  prov:assumedRole r
>>>>>
>>>>> My problems are the following:
>>>>>
>>>>> - Encoding2 is not an extension of encoding1: it does not just add new
>>>>>   edges, it removes some.
>>>>>   But according to the data model, we have just added extra information.
>>>>>
>>>>> - Why should I change my "modelling of what happens in the world"?
>>>>>   Encoding2 has two entities where encoding1 has only one.
>>>>>
>>>>> - I believe it would be reasonable to write
>>>>>     e1X wasComplementOf e1
>>>>>   (since it looks like e1X has attributes that e1 doesn't have,
>>>>>   and they have a common time interval).
>>>>>
>>>>>   What's the difference between e1X wasComplementOf e1 and e1X assumedBy e1?
>>>>>   wasComplementOf is hard enough; why do we have to have something so
>>>>>   similar to it, but restricted to entityInRole?
>>>>>
>>>>> - Imagine the scenario is more complex and e1 is used by a second
>>>>>   process in another role, at time t2>t1, so we may have another
>>>>>   entity e2X.
>>>>>   It would also be reasonable to write e2X wasDerivedFrom e1X because
>>>>>   e2X follows e1X.
>>>>>   But this wouldn't be possible in prov-dm, unless we explicitly
>>>>>   introduce e1X and e2X.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Luc
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>
>
>


-- 
-----------  ~oo~  --------------
Paolo Missier - Paolo.Missier@newcastle.ac.uk, pmissier@acm.org
School of Computing Science, Newcastle University,  UK
http://www.cs.ncl.ac.uk/people/Paolo.Missier
Received on Friday, 28 October 2011 14:58:11 UTC