Re: about EntityInRole and assumedBy from Daniel Garijo on 2011-10-28 (public-prov-wg@w3.org from October 2011)

From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
Date: Fri, 28 Oct 2011 17:26:33 +0200
To: Paul Groth <p.t.groth@vu.nl>
Cc: Simon Miles <simon.miles@kcl.ac.uk>, Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAExK0DetxaFeVd+D587y1bkQmH=SNoULTiPZ5Ho5cdJmk2bp-g@mail.gmail.com>
There is also a problem assigning time to an entity (already pointed out by
Luc and Paolo).
If an entity is used by 2 different processes at 2 different times, and both
"usedAt" relationships
have as subject the entity, then we don't know which time corresponds to
which process execution.
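
To make this concrete, here is a rough Turtle sketch. The ":usedAt" property
name and the identifiers :e1, :pe1, :pe2, :t1 and :t2 are only placeholders
for illustration, not agreed terms:

    :pe1 prov:used :e1 .
    :pe2 prov:used :e1 .
    :e1  :usedAt   :t1 .   # which execution does :t1 belong to?
    :e1  :usedAt   :t2 .   # ... and :t2?

Nothing in these four triples ties :t1 to :pe1 rather than to :pe2.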

However, I must admit that I forgot to add something to the different
approaches. For instance,
the first one (OPMO) is fully compatible with the third one (OPMV). As Luc
also pointed out in the
mailing list, we could use OPMV, and if we want to add more information to
the edges, we can add
the OPMO n-ary relationships to complement them.
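
As a rough sketch of that combination (borrowing the Usage class and the
hasEntity / hasProcessExecution / hasRole properties from Khalid's mail
quoted below as a stand-in for the OPMO n-ary node; :hasTime and all the
identifiers are made up for the example):

    # OPMV-style binary edge, enough for simple cases
    :pe1 prov:used :e1 .

    # n-ary node added only when more detail is needed
    :u1 a prov:Usage ;
        prov:hasProcessExecution :pe1 ;
        prov:hasEntity           :e1 ;
        prov:hasRole             :r1 ;
        :hasTime                 :t1 .

The binary triple stays as it is, so the second block only adds information.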

Something similar happens with the second approach: we can link the
process directly to the entity with
the "used" property, and if we need more information, then we use the
"EntityInRole" for adding the time,
location, role, etc. What imo can be confusing here is that we would have 2
entities referring to the same
"usage" relationship.

Satya pointed out that with the OPMO approach we would lose some semantics,
but I've been
thinking about it and I'm not that sure. According to what we discussed, he
claims that EntityInRole inherits the
properties of the role for that entity, but currently entityInRole is not a
subclass of Role. It is a subclass of Entity.

Anyway, I'm starting to be very ontology specific :) We'll continue the
discussion on Monday.
Thanks,
Daniel

2011/10/28 Paul Groth <p.t.groth@vu.nl>

> Hi Simon,
>
> I think your approach would modify the data model pretty radically for not
> much gain syntactically.
>
> In a publication activity, for example, there may be many publications or
> intermediate outputs that were generated. With your approach this forces me
> into breaking down that activity. Furthermore, it may be the case that I can
> document the publication activity but don't know how it's broken down
> because I'm just observing it.
>
> Thanks,
> Paul
>
> Simon Miles wrote:
>
>> Hi Daniel,
>>
>> That's a useful categorisation of approaches.
>>
>> While we are focusing on simplification of the specifications, I would
>> like to re-advocate approach 3, which provides:
>>  - simple assertions for basic knowledge (as Paul has been advocating),
>>  - has properties rather than classes representing relationships, and
>>  - maps nicely to the conceptual model.
>>
>> I don't find the arguments against option 3 to be strong ones. Most
>> information that you might attach to a used/generated edge could be
>> attached to a process or an entity if what has occurred is described
>> at the right granularity. I would argue that if you find you want to
>> annotate a used/generated event with, for example, location, this just
>> means you need a finer-grained account where it is clear that the
>> location was of a particular entity or a particular process. The only
>> annotation I can see that makes sense for an event is a timestamp,
>> with the time of generation being the critical one. However, I suggest
>> approach 3 still allows this.
>>
>> My proposal:
>>
>> prov:used is an edge, domain is prov:ProcessExecution, range is
>> prov:Entity
>> prov:wasGeneratedBy is an edge, domain is prov:Entity, range is
>> prov:ProcessExecution
>>
>> Asserting a role:
>> :pe1 :usedConfigurationFile :entity1.
>> :usedConfigurationFile rdfs:subPropertyOf prov:used.
>>
>> Note that, if we want to avoid the need for (explicit) inference, both
>> statements need to be present in the provenance data and the pattern
>> above needs to be interpreted as expressing a used role by queriers.
>>
>> prov:wasGeneratedAtEndOf is a sub-property of prov:wasGeneratedBy
>> prov:usedAtStartOf is a sub-property of prov:used
>>
>> Asserting time of generation:
>>  :entity2 prov:wasGeneratedAtEndOf :pe2.
>>  :pe2 prov:endedAt :timestamp2.
>>
>> If the process you have asserted so far generates the entity partway
>> through, then provide a finer-grained account of that process where
>> the generation occurs at the end of a (finer grained) process. If you
>> are able to say when an entity was generated during a process then you
>> have enough information to decompose the process, i.e. you at least
>> know there was part of the process before the entity's generation and
>> part after.
>>
>> Thanks,
>> Simon
>>
>> On 28 October 2011 12:22, Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
>>  wrote:
>>
>>> Hi James,
>>>
>>> When we started this small task force, we looked for alternatives to
>>> solve
>>> this issue. We had 3 possible approaches:
>>>
>>> 1) The OPMO approach, modelling edges as n-ary relationships.
>>> 2) Satya's approach with EntityInRole.
>>> 3) The OPMV approach, specializing roles as subproperties of the "use"
>>> relationship.
>>>
>>> Approach 3) was dropped because, although it is very simple and direct, it
>>> is
>>> not possible to add time and location to the edges
>>> (all the properties are binary).
>>> Approach 1) covered the functionality in my opinion (although Satya may
>>> have
>>> some objections against it), but we decided to go
>>> for approach 2) which now has brought some problems.
>>>
>>> Unfortunately I'm not aware of other ways to model n-ary relationships.
>>> If
>>> someone comes up with a new approach on Monday,
>>> I'll be happy to discuss it.
>>>
>>> Best,
>>> Daniel
>>>
>>> 2011/10/28 James Cheney <jcheney@inf.ed.ac.uk>
>>>
>>>> Just to clarify, I believe the alternative I'd suggested (which Khalid
>>>> gives in proper Turtle/RDF notation below) is similar to that in OPM-O,
>>>> which had Used as a class rather than a property.  I'm not claiming to
>>>> have
>>>> come up with this on my own, I'm sure I had seen the OPM-O treatment and
>>>> inadvertently reinvented it.
>>>>
>>>> My main question is, is that approach more palatable than the
>>>> EntityInRole
>>>> approach?  If not, what alternative would be a better fit for the
>>>> current
>>>> PROV-DM?
>>>>
>>>> --James
>>>>
>>>> On Oct 26, 2011, at 7:42 PM, Khalid Belhajjame wrote:
>>>>
>>>>  Hi Luc, all,
>>>>>
>>>>> We are aware of the problems that EntityInRole is introducing.
>>>>> Following
>>>>> your comments in last emails, we are contemplating a new alternative,
>>>>> that
>>>>> was suggested by James, which seems to be in line with the PROV-DM. Not
>>>>> all
>>>>> members of the ontology group have expressed their opinion yet, though.
>>>>>
>>>>> The idea is to have two owl classes Usage and Generation. These two
>>>>> classes are connected to Entity and Process Execution (using object
>>>>> properties), and qualifiers, such as role, can be defined as object
>>>>> properties of Usage and Generation. For example, Usage can be defined
>>>>> as:
>>>>>
>>>>> prov:Usage a owl:Class.
>>>>>
>>>>> prov:hasEntity
>>>>>    a rdf:Property ;
>>>>>    rdfs:domain prov:Usage ;
>>>>>    rdfs:range prov:Entity .
>>>>>
>>>>> prov:hasProcessExecution
>>>>>    a rdf:Property ;
>>>>>    rdfs:domain prov:Usage ;
>>>>>    rdfs:range prov:ProcessExecution .
>>>>>
>>>>> prov:hasRole
>>>>>    a rdf:Property ;
>>>>>    rdfs:domain prov:Usage ;
>>>>>    rdfs:range prov:Role .
>>>>> ...
>>>>>
>>>>> Additionally, to be closer to the model, we can define shortcut
>>>>> properties: used and wasGeneratedBy as object properties that connect
>>>>> process execution and entities. For example,
>>>>>
>>>>> prov:used
>>>>>    a rdf:Property ;
>>>>>    rdfs:domain prov:ProcessExecution ;
>>>>>    rdfs:range prov:Entity .
>>>>>
>>>>> There is at least one issue that we are aware of using this design, but
>>>>> IMO, it is better.
>>>>>
>>>>> khalid
>>>>>
>>>>> On 26/10/2011 16:22, Luc Moreau wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I would like to reiterate again my concern about the mismatch between
>>>>>> prov-dm and prov-o. I think the two notions the documents diverge on
>>>>>> are
>>>>>> entityInRole (a subclass of Entity) and assumedBy (which in essence is
>>>>>> a kind of specialization of wasComplementOf).
>>>>>>
>>>>>> They are introduced in the ontology to be able to express time and
>>>>>> qualifiers
>>>>>> (including role) for use and generation.
>>>>>>
>>>>>> It is suggested by JamesC and Graham that we write examples to
>>>>>> understand the
>>>>>> differences, it is a very good point that I support.
>>>>>>
>>>>>> I would like to explain the problem I have with the EntityInRole
>>>>>> solution.
>>>>>>
>>>>>> Let's take the case of a used relation (ideas apply similarly to
>>>>>> generation).
>>>>>> Let's imagine there is no qualifier/time information.
>>>>>> We want to be simple, and express a property, as prov-o does:
>>>>>>
>>>>>> Encoding1:
>>>>>>  pe prov:used e1
>>>>>>
>>>>>> If we suddenly have time information or role, according to prov-o, we
>>>>>> would have to write:
>>>>>>
>>>>>> Encoding2:
>>>>>>
>>>>>> pe prov:used e1X
>>>>>> e1X  prov:assumedBy e1
>>>>>> e1X  prov:assumedAt t1
>>>>>> e1X  prov:assumedRole r
>>>>>>
>>>>>> My problems are the following:
>>>>>>
>>>>>> - Encoding2 is not an extension of encoding1: it  does not just add
>>>>>> new
>>>>>> edges,
>>>>>>  it removes some.
>>>>>>  But according to the data model, we just have added extra
>>>>>> information.
>>>>>>
>>>>>> - why should I change my "modelling of what happens in the world",
>>>>>>  encoding2 has got two entities when encoding1 has got only one.
>>>>>>
>>>>>> - I believe it would be reasonable to write
>>>>>>    e1X wasComplementOf e1  (since it looks like e1X has attributes
>>>>>> that
>>>>>> e1 doesn't have,
>>>>>>    and have a common time interval).
>>>>>>
>>>>>>  What's the difference between e1X wasComplementOf e1 and e1X
>>>>>> assumedBy
>>>>>> e1?
>>>>>>   wasComplementOf is hard enough, why do we have to have something so
>>>>>> similar to it,
>>>>>>   but restricted to entityInRole?
>>>>>>
>>>>>> - Imagine the scenario is more complex and e1 is used by a second
>>>>>> process in another
>>>>>>  role, at time t2>t1 so we may have another entity e2X.
>>>>>>  It would also be reasonable to write e2X wasDerivedFrom e1X because
>>>>>> e2X follows e1X.
>>>>>>  But this wouldn't be possible in prov-dm, unless we explicitly
>>>>>> introduce e1X and e2X.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Luc
>>>>>>
>>>>>>
>>>>>>
> --
> Dr. Paul Groth (p.t.groth@vu.nl)
> http://www.few.vu.nl/~pgroth
> Assistant Professor
> Knowledge Representation & Reasoning Group
> Artificial Intelligence Section
> Department of Computer Science
> VU University Amsterdam
>
>
>
Received on Friday, 28 October 2011 15:27:02 UTC