RE: playing with pil ontology from Deus, Helena on 2011-08-16 (public-prov-wg@w3.org from August 2011)

From: Deus, Helena <helena.deus@deri.org>
Date: Tue, 16 Aug 2011 12:19:16 +0100
To: "Satya Sahoo" <satya.sahoo@case.edu>
Cc: "Khalid Belhajjame" <Khalid.Belhajjame@cs.man.ac.uk>, <public-prov-wg@w3.org>
Message-ID: <316ADBDBFE4F4D4AA4FEEF7496ECAEF9065C3325@EVS1.ac.nuigalway.ie>
Hi Satya,

 

From: Satya Sahoo [mailto:satya.sahoo@case.edu] 
Sent: 15 August 2011 16:02
To: Deus, Helena
Cc: Khalid Belhajjame; public-prov-wg@w3.org
Subject: Re: playing with pil ontology

 

Hi Lena,

Thanks again for trying to use the ontology for the microarray use case!


 

My comments are inline:

 

>I am not questioning whether agent should be mapped to agents defined
>elsewhere, which seems obvious; I am only wondering whether agent
>"label" and "description" are things we want to standardize in our
>model or not. We can "suggest" rdfs:label and rdfs:comment without
>enforcing them as such; having those included in the model will likely
>result in much less heterogeneity when it comes to reporting provenance
>(particularly since we are defining it as necessarily "open" and highly
>granular to fit any particular domain).

 

I am not sure I understand your point. The rdfs:label and rdfs:comment
are two of the nine annotation properties that are part of the OWL2
syntax. So, the provenance ontology encoded in OWL includes them by
default.
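As a rough illustration, here is an agent described with the built-in OWL 2 annotation properties. This is a minimal sketch that treats triples as plain Python tuples instead of using an RDF library. "prov:Agent" stands in for the draft ontology's agent class, and every ex: name and literal is hypothetical.

```python
# Hypothetical sketch: RDF triples modelled as (subject, predicate, object)
# tuples; no RDF library is assumed.

# The nine annotation properties built into OWL 2.
OWL2_BUILTIN_ANNOTATION_PROPERTIES = {
    "rdfs:label", "rdfs:comment", "rdfs:seeAlso", "rdfs:isDefinedBy",
    "owl:deprecated", "owl:versionInfo", "owl:priorVersion",
    "owl:backwardCompatibleWith", "owl:incompatibleWith",
}

# A hypothetical agent; "prov:Agent" is a placeholder for the agent class.
triples = [
    ("ex:normalizer", "rdf:type", "prov:Agent"),
    ("ex:normalizer", "rdfs:label", "Normalization software"),
    ("ex:normalizer", "rdfs:comment", "Normalizes raw microarray intensities."),
    # Properties from other vocabularies can still be used alongside them.
    ("ex:normalizer", "dct:title", "Normalizer"),
]


def builtin_annotations(subject, graph):
    """Return {property: value} for the built-in OWL 2 annotations on `subject`."""
    return {p: o for s, p, o in graph
            if s == subject and p in OWL2_BUILTIN_ANNOTATION_PROPERTIES}
```

The dct:title triple is ignored by the filter but remains in the data, so suggesting the built-ins does not stop anyone from using other vocabularies.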

 

I did not know this, thank you :)

So, just to be clear: whenever I use the provenance ontology encoded in
OWL2, I am encouraged to use "rdfs:comment" and "rdfs:label" (and the
other seven annotation properties), even though I can still choose to use other
properties (e.g. dct:title). If so, I am satisfied with the answer ;-) 

 


> What was its intended purpose/role in the description of provenance?

 

Provenance container, account, and collection are related concepts for
modeling a collection of provenance assertions. E.g. the provenance of an
Affymetrix gene chip will be a collection of provenance assertions (date
of manufacture, location of manufacturer, production series etc.) that
can be stored in a single file, and that file will be a provenance
container.
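A minimal sketch of that idea, assuming a container is simply a named bundle of (subject, predicate, object) assertions. The class, method, and property names below are hypothetical and do not come from the draft ontology; the values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceContainer:
    """Hypothetical sketch: a named bundle of provenance assertions,
    e.g. one file holding everything asserted about a single gene chip."""
    identifier: str
    assertions: list = field(default_factory=list)

    def add(self, subject, predicate, obj):
        self.assertions.append((subject, predicate, obj))

    def about(self, subject):
        """All assertions in this container whose subject is `subject`."""
        return [a for a in self.assertions if a[0] == subject]


# Placeholder identifiers and values, for illustration only.
chip_file = ProvenanceContainer("ex:chip42-provenance")
chip_file.add("ex:chip42", "ex:dateOfManufacture", "<date>")
chip_file.add("ex:chip42", "ex:manufacturerLocation", "<location>")
chip_file.add("ex:chip42", "ex:productionSeries", "<series>")
```

Because the container has its own identifier, it can itself be the subject of further assertions, which is the "recursive" use discussed next.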

 

Cool idea. So the provenance container itself can contain documents that
are uses of the provenance ontology itself... Wicked ;-) 

Is this a standard method for making an ontology recursive?

 

 

>Example: a list of height measurements is an "untransformed" entity (a
>dataset); the average of that list is the "transformed" entity (another
>dataset, although a very simple one).

>I am dealing with much more complex workflows (e.g. files containing
>the outcome of a microarray experiment as the untransformed dataset and
>a list of differentially expressed genes as the transformed dataset),
>so please take the example above as just illustrative.

 

I am not sure I see the granularity/expressivity issue in the above
example (from your first mail). Both the "untransformed" and
"transformed" entities map to the input and output data of a process
execution; we can create subclasses of Entity for this purpose.
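The subclassing approach could look roughly like the sketch below. It makes some assumptions: the class hierarchy is a plain dict of child to parent links, "prov:Entity" is a placeholder for the ontology's Entity class, and InputData, OutputData and the execution identifiers are hypothetical names.

```python
# Hypothetical subclass links (child -> parent); "prov:Entity" is a placeholder
# for the ontology's Entity class, and the ex: names are illustrative.
SUBCLASS_OF = {
    "ex:InputData": "prov:Entity",
    "ex:OutputData": "prov:Entity",
}


def is_a(cls, ancestor, hierarchy=SUBCLASS_OF):
    """Walk the subclass chain upwards; True if `ancestor` is reached."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy.get(cls)
    return False


# A process execution whose untransformed input and transformed output are
# both typed with Entity subclasses.
execution = {
    "id": "ex:differentialExpressionRun",
    "used": [("ex:microarrayResults", "ex:InputData")],
    "generated": [("ex:diffExpressedGenes", "ex:OutputData")],
}

all_entities = all(is_a(cls, "prov:Entity")
                   for _, cls in execution["used"] + execution["generated"])
```

Because both subclasses share Entity as an ancestor, generic provenance queries still see every input and output as an Entity.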

 

 

 

>An investigator (agent) performs an experiment. That experiment has
>several input parameters, some of which are entities (e.g. samples),
>others are not (e.g. temperature). Resulting from the experiment are
>several output parameters (entities).

 

I am confused by the above scenario. Why is temperature not an entity?
Both inputs, the sample and the temperature, are special types (subclasses)
of entities: (a) InputData and (b) InputParameter, etc.


I am reluctant to make temperature an entity because it does not have
a discrete value (would a temperature of 14.5 C be a different entity from
14.55 C?). I did not see InputData and InputParameter as classes to be
used in the ontology... I may have been using the wrong
ontology/version... :S

 

> So if I understand what you are saying correctly, "temperature" would
> be an entity of type "input", which in turn would be a subclass of
> "role". An instance of "input" could then have a certain value (e.g.
> 15C) in one of its properties?

>In that case, does it make sense to include "input" and "output"
>classes in the model as subclasses of "role"? Or is this something that
>Stephan and I should exemplify in the primer document under "usage of
>agent" (or something of the sort)?

 

I agree with Khalid's example, where Role allows us to model more complex
scenarios. For example, X is an instance of class HumanBeing (perhaps a
subclass of Entity) and X has multiple roles: researcher, parent,
soccer player, etc. To model these "functions" we will use the Role
class. I believe that in the microarray scenario (in your first mail) Roles
are not needed.
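One way to picture this, sketched under the assumption that a role is attached to each participation of an entity in an activity rather than to the entity's class. The Participation record and all ex: names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participation:
    """Hypothetical: an entity takes part in some activity under a named role."""
    entity: str
    activity: str
    role: str


# X keeps a single class (e.g. HumanBeing) but plays a different role
# in each activity it takes part in.
participations = [
    Participation("ex:X", "ex:labMeeting", "researcher"),
    Participation("ex:X", "ex:schoolPickup", "parent"),
    Participation("ex:X", "ex:saturdayMatch", "soccer player"),
]


def roles_of(entity, records):
    """Distinct roles played by `entity`, sorted alphabetically."""
    return sorted({p.role for p in records if p.entity == entity})
```

Keeping the role on the participation avoids creating a separate subclass of HumanBeing for every function X can have.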

 

Would a normalization algorithm not be a "role" for an "agent" of type
algorithm?

 

> In that case, does it make sense to include "input" and "output"
> classes in the model as subclasses of "role"? Or is this something that
> Stephan and I should exemplify in the primer document under "usage of
> agent" (or something of the sort)?

 

Sorry, I did not understand this. Role can be used by any entity, so why
only "usage of agent"?


If someone wanting to use the provenance ontology asks the same question
as I did, "how do I specify the input and output of an agent of
transformation?", what answer could I give them that will ensure
interoperability?

 

 

Thanks.

 

Best,

Satya

 

On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org>
wrote:

Hi Khalid,

Please see comments inline

 

From: Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk] 
Sent: 12 August 2011 10:22
To: Deus, Helena
Cc: public-prov-wg@w3.org
Subject: Re: playing with pil ontology

 


Hi Helena, 

Thanks for this. I think that this is a good exercise, and some of the
points you mentioned relate to the conceptual model, not only the formal
model.

On 11/08/2011 18:52, Deus, Helena wrote: 

Hi all, 

Reiterating a bit on what was addressed today in the telco, I
downloaded the ontology from mercurial and tried to use it with my use
case. 

I am using the use cases published in [1] and demoed with SPARQL at
http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html

 

Here is my input so far: 

 

Agent could have the dataProperties "label" and "description"; they would help
the implementer describe what type of agent he/she intends to
describe. Is the ontology here being confused with the query model?

I think that there was previously a long discussion thread on agent and
agent types, and on whether the model should be prescriptive in this
respect. One of the solutions that I think many people were happy with
is to let users choose their favorite model (ontology) for agent, which
means that the agent class defined in the ontology acts as a
placeholder that can be specialized to include descriptions, types, and
whatever else the application needs.

 

I am not questioning whether agent should be mapped to agents defined
elsewhere, which seems obvious; I am only wondering whether agent
"label" and "description" are things we want to standardize in our model
or not. We can "suggest" rdfs:label and rdfs:comment without enforcing
them as such; having those included in the model will likely result in
much less heterogeneity when it comes to reporting provenance
(particularly since we are defining it as necessarily "open" and highly
granular to fit any particular domain). 

 

ProvenanceContainer is not useful, or its description is not clear: what
should be an instance of ProvenanceContainer?


At this stage, the description of this concept is not yet stable in the
conceptual model as far as I know.

 

What was its intended purpose/role in the description of provenance?

 

I want to create an instance of an "untransformed" entity (in my case, a
dataset) and a "transformed" entity. Is the model going to give me that
granularity/expressivity, or do we expect each implementer to come up
with their own way of defining these?

Could you please clarify what you mean by transformed and untransformed
entity?

Example: a list of height measurements is an "untransformed" entity (a
dataset); the average of that list is the "transformed" entity (another
dataset, although a very simple one). 
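The height example can be made concrete with a small sketch. The measurement values are invented for illustration, and the dict-based derivation record with its ex: identifiers is a hypothetical stand-in for whatever the ontology ends up using to link used and generated entities.

```python
from statistics import mean

# Hypothetical measurements in cm, for illustration only.
heights = [170.0, 165.0, 181.0]     # untransformed entity (a dataset)
average_height = mean(heights)      # transformed entity (a one-value dataset)

# Minimal record linking the two through the transformation step;
# the ex: identifiers are placeholders.
derivation = {
    "process": "ex:computeMean",
    "used": "ex:heightList",
    "generated": "ex:averageHeight",
}
```

The same shape (one process, one used dataset, one generated dataset) carries over to the microarray case, with the raw results as the used entity and the gene list as the generated one.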

 

I am dealing with much more complex workflows (e.g. files containing
the outcome of a microarray experiment as the untransformed dataset and
a list of differentially expressed genes as the transformed dataset), so
please take the example above as just illustrative. 

 

ProcessExecution needs more expressivity, I think. Not sure how to solve
this in a domain independent way, but here's my problem:

An investigator (agent) performs an experiment

That experiment has several input parameters, some of which are entities
(e.g. samples), others are not (e.g. temperature). 

Resulting from the experiment are several output parameters (entities)


I think that the current model caters for the above need. If you are
specifically trying to differentiate between different kinds of inputs
(samples as opposed to temperature), then the notion of role can be
helpful in this respect.
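A rough sketch of how a role could separate the two kinds of input. It assumes each usage of an entity by a process records a role label and an optional value. The Usage record, the role strings "sample" and "parameter", and the ex: names are hypothetical and are not terms from the draft ontology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """Hypothetical: a process execution used an entity in a given role."""
    process: str
    entity: str
    role: str                      # illustrative role names, not ontology terms
    value: Optional[str] = None    # e.g. a parameter setting


usages = [
    Usage("ex:experiment1", "ex:sampleA", "sample"),
    Usage("ex:experiment1", "ex:incubationTemperature", "parameter", "15 C"),
]


def inputs_in_role(process, role, records):
    """Entities used by `process` in the given role."""
    return [u.entity for u in records if u.process == process and u.role == role]
```

In this sketch the value lives on the usage record, not on the temperature entity, which sidesteps the question of whether 14.5 C and 14.55 C are distinct entities.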

 

So if I understand what you are saying correctly, "temperature" would be
an entity of type "input", which in turn would be a subclass of "role". An
instance of "input" could then have a certain value (e.g. 15C) in one of
its properties? 

In that case, does it make sense to include "input" and "output" classes
in the model as subclasses of "role"? Or is this something that Stephan
and I should exemplify in the primer document under "usage of agent" (or
something of the sort)?

 



Thanks, khalid

 

Have not completed my "experiment" yet, but will provide more feedback
soon :)

 

Best Regards,

Helena F. Deus

Post-doctoral Researcher
Digital Enterprise Research Institute

National University of Ireland, Galway

http://lenadeus.info
Received on Tuesday, 16 August 2011 11:19:45 UTC