Re: playing with pil ontology from Satya Sahoo on 2011-08-15 (public-prov-wg@w3.org from August 2011)

From: Satya Sahoo <satya.sahoo@case.edu>
Date: Mon, 15 Aug 2011 11:01:38 -0400
To: "Deus, Helena" <helena.deus@deri.org>
Cc: Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, public-prov-wg@w3.org
Message-ID: <CAOMwk6ySqe-O=8YTJY0tLCDFohZN4aoD8brMuo3fC8oP=mmbZA@mail.gmail.com>
Hi Lena,
Thanks again for trying to use the ontology for the microarray use case!

My comments are inline:

> I am not questioning whether agent should be mapped to agents defined
> elsewhere, which seems to be obvious – only wondering whether agent “label”
> and “description” are things we want to standardize in our model or not. We
> can “suggest” rdfs:label and rdfs:comment without enforcing them as such –
> having those included in the model will likely result in much less
> heterogeneity when it comes to reporting provenance (particularly since we
> are defining it necessarily “open” and highly granular to fit any
> particular domain).

I am not sure I understand your point. The rdfs:label and rdfs:comment are
two of the nine annotation properties that are part of the OWL2 syntax. So,
the provenance ontology encoded in OWL includes them by default.
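To make this concrete, here is a minimal sketch in Python, with plain tuples standing in for RDF triples, of annotating an agent with the built-in rdfs:label and rdfs:comment properties without the provenance ontology defining any label or description property of its own. The `EX` namespace and the `Agent` class IRI are hypothetical placeholders, not the ontology's actual terms.

```python
# RDF triples as plain (subject, predicate, object) tuples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/prov#"  # hypothetical namespace, not the PIL IRI


def describe_agent(agent, label, comment):
    """Type an agent and annotate it using the built-in rdfs:label and
    rdfs:comment annotation properties; nothing new is defined."""
    return [
        (agent, RDF + "type", EX + "Agent"),
        (agent, RDFS + "label", label),
        (agent, RDFS + "comment", comment),
    ]


triples = describe_agent(EX + "investigator1", "Investigator",
                         "Person who ran the microarray experiment")
for s, p, o in triples:
    print(s, p, o)
```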



> What was its intended purpose/role in the description of provenance?

Provenance container, account, and collection are related concepts for
modeling a collection of provenance assertions. For example, the provenance
of an Affymetrix gene chip will be a collection of provenance assertions
(date of manufacture, location of manufacturer, production series, etc.)
that can be stored in a single file; that file will be a provenance
container.
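A rough sketch of that idea, assuming a container is simply a named list of (subject, property, value) assertions persisted as one file. The class name, the property names, and the tab-separated file layout are all hypothetical and are not taken from the ontology; the values are placeholders.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Tuple

# A provenance assertion as a (subject, property, value) triple.
Assertion = Tuple[str, str, str]


@dataclass
class ProvenanceContainer:
    """Hypothetical container: a named collection of provenance assertions
    that is persisted as a single file."""
    name: str
    assertions: List[Assertion] = field(default_factory=list)

    def add(self, subject: str, prop: str, value: str) -> None:
        self.assertions.append((subject, prop, value))

    def save(self, path: str) -> None:
        # One tab-separated assertion per line; the whole file is the container.
        with open(path, "w", encoding="utf-8") as f:
            for s, p, v in self.assertions:
                f.write(f"{s}\t{p}\t{v}\n")

    @classmethod
    def load(cls, name: str, path: str) -> "ProvenanceContainer":
        container = cls(name)
        with open(path, encoding="utf-8") as f:
            for line in f:
                s, p, v = line.rstrip("\n").split("\t")
                container.add(s, p, v)
        return container


# Placeholder values only; property names are illustrative, not PIL terms.
chip = ProvenanceContainer("chip-provenance")
chip.add("chip1", "dateOfManufacture", "<date>")
chip.add("chip1", "manufacturerLocation", "<location>")
chip.add("chip1", "productionSeries", "<series>")

path = os.path.join(tempfile.mkdtemp(), "chip1.prov.tsv")
chip.save(path)
restored = ProvenanceContainer.load("chip-provenance", path)
print(restored.assertions == chip.assertions)  # True
```

Account and collection could be sketched the same way; the point is only that the container groups assertions rather than describing an entity itself.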



> Example: a list of height measurements is an “untransformed” entity (a
> dataset); the average of that list is the “transformed” entity (another
> dataset, although a very simple one).

> I am dealing with much more complex workflows (e.g. files containing the
> outcome of a microarray experiment as the untransformed dataset and a list
> of differentially expressed genes as the transformed dataset), so please
> take the example above as just illustrative.

I am not sure I see the granularity/expressivity issue in the above example
(from your first mail). Both the "untransformed" and "transformed" entities
map to the input and output data of a process execution; we can create
subclasses of Entity for this purpose.
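A minimal sketch of the height example under these assumptions: `InputData` and `OutputData` are hypothetical Entity subclasses, and `ProcessExecution` with `used`/`generated` lists is an illustrative stand-in, not the ontology's actual property names. The heights are made-up values.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Entity:
    id: str
    value: object


class InputData(Entity):
    """Hypothetical Entity subclass for the 'untransformed' dataset."""


class OutputData(Entity):
    """Hypothetical Entity subclass for the 'transformed' dataset."""


@dataclass
class ProcessExecution:
    id: str
    used: List[Entity] = field(default_factory=list)
    generated: List[Entity] = field(default_factory=list)


# Helena's illustrative example: a list of heights (cm) and its average.
heights = InputData("heights", [160, 170, 180])
averaging = ProcessExecution("averaging-run", used=[heights])
average = OutputData("average-height", mean(heights.value))
averaging.generated.append(average)

print(average.value)  # 170
```

The microarray case has the same shape: the raw result files would be `InputData`, the list of differentially expressed genes `OutputData`.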



> An investigator (agent) performs an experiment. That experiment has several
> input parameters, some of which are entities (e.g. samples), others are not
> (e.g. temperature). Resulting from the experiment are several output
> parameters (entities).

I am confused by the above scenario. Why is temperature not an entity? Both
the input (sample) and the parameter (temperature) are special types
(subclasses) of Entity: (a) InputData and (b) InputParameter, etc.
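A sketch of how both inputs can be entities and still be told apart by subclass. `InputData` and `InputParameter` come from the text above; the fields, the `ProcessExecution` shape, and the `performed_by` attribute are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    id: str


@dataclass
class InputData(Entity):
    description: str


@dataclass
class InputParameter(Entity):
    value: float
    unit: str


@dataclass
class ProcessExecution:
    id: str
    performed_by: str          # the agent, e.g. the investigator
    inputs: List[Entity]

    def parameters(self) -> List[InputParameter]:
        # Both kinds of input are Entities; the subclass tells them apart.
        return [e for e in self.inputs if isinstance(e, InputParameter)]


experiment = ProcessExecution(
    "experiment1",
    performed_by="investigator1",
    inputs=[InputData("sample1", "tissue sample"),
            InputParameter("temperature", 15.0, "C")],
)
print([(p.id, p.value, p.unit) for p in experiment.parameters()])
# [('temperature', 15.0, 'C')]
```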


> So if I understand what you are saying correctly, “temperature” would be
> an entity of type “input”, which in turn would be a subclass of “role”. An
> instance of “input” could then have a certain value (e.g. 15C) in one of
> its properties?

> In that case, does it make sense to include “input” and “output” classes in
> the model as subclasses of “role”? Or is this something that Stephan and I
> should exemplify in the primer document under “usage of agent” (or
> something of the sort)?

I agree with Khalid's example, where Role allows us to model more complex
scenarios. For example, X is an instance of class HumanBeing (perhaps as a
subclass of Entity) and X has multiple roles: researcher, parent, soccer
player, etc. To model these "functions" we will use the Role class. I believe
that in the microarray scenario (in your first mail) Roles are not needed.
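A sketch of the HumanBeing example, assuming a role is a separate value attached to an entity rather than a subclass of it. Class and attribute names are hypothetical; the ontology may instead attach roles to an entity's participation in a particular process execution.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Role:
    """A function an entity can take on; frozen so roles are hashable."""
    name: str


@dataclass
class Entity:
    id: str


@dataclass
class HumanBeing(Entity):
    # One individual, many roles: no new subclass is needed per role.
    roles: Set[Role] = field(default_factory=set)

    def plays(self, role: Role) -> bool:
        return role in self.roles


RESEARCHER = Role("researcher")
PARENT = Role("parent")
SOCCER_PLAYER = Role("soccer player")

x = HumanBeing("X", {RESEARCHER, PARENT, SOCCER_PLAYER})
print(x.plays(RESEARCHER), x.plays(Role("student")))  # True False
```

Keeping roles separate means X remains a single entity however many functions it takes on, which is the kind of flexibility the simple microarray case does not seem to need.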


> In that case, does it make sense to include “input” and “output” classes
> in the model as subclasses of “role”? Or is this something that Stephan and
> I should exemplify in the primer document under “usage of agent” (or
> something of the sort)?

Sorry, I did not understand this. Role can be used by any entity, so why only
under "usage of agent"?

Thanks.

Best,
Satya

On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org> wrote:

> Hi Khalid,
>
> Please see comments inline
>
>
> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
> *Sent:* 12 August 2011 10:22
> *To:* Deus, Helena
> *Cc:* public-prov-wg@w3.org
> *Subject:* Re: playing with pil ontology
>
>
>
> Hi Helena,
>
> Thanks for this. I think that this is a good exercise, and some of the points
> you mentioned relate to the conceptual model, not only the formal model.
>
> On 11/08/2011 18:52, Deus, Helena wrote:
>
> Hi all,
>
> Following up on what was discussed today in the telco, I downloaded
> the ontology from Mercurial and tried to use it with my use case.
>
> I am using the use cases published in [1] and demoed with SPARQL at
> http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>
>
> Here is my input so far:
>
>
> Agent could have dataProperty “label” and “description”; it would help the
> implementer describe what type of agent he/she intends to describe. Is
> the ontology here being confused with the query model?
>
> I think that there was previously a long discussion thread on agent and
> agent types, and whether the model should be prescriptive in this respect.
> One of the solutions that I think many people were happy with is to let
> users choose their favorite model (ontology) for agent, which means that the
> agent class defined in the ontology acts as a placeholder that can be
> specialized to include description, types, and whatever the application
> needs.
>
>
> I am not questioning whether agent should be mapped to agents defined
> elsewhere, which seems to be obvious – only wondering whether agent “label”
> and “description” are things we want to standardize in our model or not. We
> can “suggest” rdfs:label and rdfs:comment without enforcing them as such –
> having those included in the model will likely result in much less
> heterogeneity when it comes to reporting provenance (particularly since we
> are defining it necessarily “open” and highly granular to fit any particular
> domain).
>
>
>
>
> ProvenanceContainer is not useful, or its description is not clear; what
> should be an instance of ProvenanceContainer?
>
>
> At this stage, the description of this concept is not yet stable in the
> conceptual model as far as I know.
>
>
> What was its intended purpose/role in the description of provenance?
>
>
>
>
> I want to create an instance of an “untransformed” entity (in my case, a
> dataset) and a “transformed” entity. Is the model going to give me that
> granularity/expressivity, or do we expect each implementer to come up with
> their own way of defining these?
>
> Could you please clarify what you mean by transformed and untransformed
> entity?
>
>
> Example: a list of height measurements is an “untransformed” entity (a
> dataset); the average of that list is the “transformed” entity (another
> dataset, although a very simple one).
>
>
> I am dealing with much more complex workflows (e.g. files containing the
> outcome of a microarray experiment as the untransformed dataset and a list
> of differentially expressed genes as the transformed dataset), so please
> take the example above as just illustrative.
>
>
>
>
> ProcessExecution needs more expressivity, I think. Not sure how to solve
> this in a domain-independent way, but here’s my problem:
>
> An investigator (agent) performs an experiment.
>
> That experiment has several input parameters, some of which are entities
> (e.g. samples), others are not (e.g. temperature).
>
> Resulting from the experiment are several output parameters (entities).
>
>
> I think that the current model caters for the above need. If you are
> specifically trying to differentiate between different kinds of inputs
> (samples as opposed to temperature), then the notion of role can be helpful
> in this respect.
>
>
> So if I understand what you are saying correctly, “temperature” would be an
> entity of type “input”, which in turn would be a subclass of “role”. An
> instance of “input” could then have a certain value (e.g. 15C) in one of its
> properties?
>
> In that case, does it make sense to include “input” and “output” classes in
> the model as subclasses of “role”? Or is this something that Stephan and I
> should exemplify in the primer document under “usage of agent” (or something
> of the sort)?
>
>
>
>
> Thanks, khalid
>
>
> Have not completed my “experiment” yet, but will provide more feedback soon
> :)
>
>
> Best Regards,
>
> Helena F. Deus
>
> Post-doctoral Researcher
> Digital Enterprise Research Institute
>
> National University of Ireland, Galway****
>
> http://lenadeus.info
>
>
Received on Monday, 15 August 2011 15:02:19 UTC