Re: playing with pil ontology from Graham Klyne on 2011-08-15 (public-prov-wg@w3.org from August 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Mon, 15 Aug 2011 22:17:51 +0100
To: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
CC: "Myers, Jim" <MYERSJ4@rpi.edu>, Satya Sahoo <satya.sahoo@case.edu>, "Deus, Helena" <helena.deus@deri.org>, Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <4E498CFF.6070700@ninebynine.org>
Daniel,

It sounds to me as if you're trying to subdivide web resources, and that seems 
to me like potentially a lot of complexity for questionable gain.

(If you're thinking of something like named graphs in an RDF document, then 
fine:  here each of the graphs has its own URI, so for descriptive purposes can 
be treated as a separate web resource.  I don't think this is something the 
model needs to explicitly recognize, as it amounts to an implementation detail.)
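
A minimal sketch of the named-graph point, assuming a plain Python dictionary 
standing in for an RDF dataset. All URIs and the ex: property names are 
hypothetical, invented purely for illustration:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical dataset: each named graph is keyed by its own URI.
dataset: Dict[str, List[Triple]] = {
    "http://example.org/prov/g1": [
        ("http://example.org/chip42", "ex:manufacturedOn", "2011-08-01"),
    ],
    "http://example.org/prov/g2": [
        ("http://example.org/chip42", "ex:productionSeries", "A7"),
    ],
}

# Statements about the graphs themselves simply use the graph URI as the
# subject, exactly as for any other web resource.
graph_descriptions: List[Triple] = [
    ("http://example.org/prov/g1", "ex:assertedBy", "http://example.org/people/alice"),
    ("http://example.org/prov/g2", "ex:assertedBy", "http://example.org/people/bob"),
]


def asserted_by(graph_uri: str) -> List[str]:
    """Return everyone recorded as having asserted the given graph."""
    return [
        obj
        for subj, pred, obj in graph_descriptions
        if subj == graph_uri and pred == "ex:assertedBy"
    ]
```

Here the assertions about one entity live in two graphs, and who asserted 
each graph is recorded without any extra "container" notion in the model.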

#g
--

Daniel Garijo wrote:
> Hi Graham,
> I like Provenance Container. What if your provenance statements were 
> created by different persons,
> processes or at different times, but they are within the same Provenance 
> Document
> (since they are provenance assertions about the same entity)? I may want 
> to describe the different
> provenance containers, or even the provenance container descriptions 
> with another one.
> 
> Thanks,
> Daniel
> 
> 2011/8/15 Graham Klyne <GK@ninebynine.org <mailto:GK@ninebynine.org>>
> 
>     Jim,
> 
>     FWIW, in PAQ we talk about "provenance information" as just another
>     resource that includes provenance assertions.  To my mind, its
>     primary representation would be as an RDF document.
> 
>     The terminology here is subject to review and harmonization with the
>     model, but I'm not convinced that we need a new concept in the model
>     for this, and I'm not keen on a name involving "container", as in my
>     mind that sets up expectations of a distinct layer of encapsulation.
>      We don't talk about "containers" for HTML or XML elements, we just
>     talk about HTML and XML documents.  Same for provenance, IMO.
> 
>     I suppose that suggests "Provenance Document", or similar.
> 
>     #g
>     --
> 
>     Myers, Jim wrote:
> 
> 
> 
>         A couple quick comments: I don’t think we’ve distinguished
>         provenance container and account at this point – they are an
>         entity which contains provenance statements and are used to
>         enable you to talk about how the provenance was created (what
>         processes and inputs caused those statements to be), but
>         collection has been discussed as a general aggregate
>         entity/container – a bag of marbles is an entity and saying a
>         process execution used it is shorthand for talking about the
>         individual marbles. A file is a collection of bytes and a
>         process execution may only use some of the bytes, etc.
> 
>          
>         Re: roles – I would argue that you should use something quite
>         specific for the role of your temperature parameter, e.g.
>         “processingtemperaturesetpoint” rather than a generic “input” or
>         “inputParameter” role (parameter might still be a supertype of
>         processingtemperaturesetpoint). This would be necessary if, for
>         example, your process execution had a reaction temperature and a
>         storage temperature as inputs – now you have two numbers/two
>         temperatures and you have to use each in the correct role for
>         the provenance to be correct. In many cases, you could
>         potentially describe the type of the entity itself well enough
>         to make the provenance clear, but putting the information into
>         the entity typing rather than into the role it has relative to
>         the process execution causes trouble if you use the entity in
>         multiple processes (if I make an entity that is of type
>         “processingtemperaturesetpoint” and I have a second process that
>         displays a “printablenumber” that uses it as input, the same
>         entity can’t also be of type “printablenumber” – better to make
>         the entity have type number and play a
>         “processingtemperaturesetpoint” role in one process and the
>         “printablenumber” role in the other.)
> 
>          
>         Jim
> 
>          
>         *From:* public-prov-wg-request@w3.org
>         <mailto:public-prov-wg-request@w3.org>
>         [mailto:public-prov-wg-request@w3.org
>         <mailto:public-prov-wg-request@w3.org>] *On Behalf Of *Satya Sahoo
>         *Sent:* Monday, August 15, 2011 11:02 AM
>         *To:* Deus, Helena
>         *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
>         <mailto:public-prov-wg@w3.org>
>         *Subject:* Re: playing with pil ontology
> 
>          
>         Hi Lena,
> 
>         Thanks again for trying to use the ontology for the microarray
>         use case!
>          
>         My comments are inline:
> 
>          
>          > I am not questioning whether agent should be mapped to agents
>          > defined elsewhere, which seems to be obvious – only wondering
>          > whether agent “label” and “description” are things we want to
>          > standardize in our model or not. We can “suggest” rdfs:label
>          > and rdfs:comment without enforcing it as such – having those
>          > included in the model will likely result in much less
>          > heterogeneity when it comes to reporting provenance
>          > (particularly since we are defining it necessarily “open” and
>          > highly granular to fit any particular domain).
> 
>          
>         I am not sure I understand your point. The rdfs:label and
>         rdfs:comment are two of the nine annotation properties that are
>         part of the OWL2 syntax. So, the provenance ontology encoded in
>         OWL includes them by default.
> 
>          
>          
> 
>          > What was its intended purpose/role in the description of
>          > provenance?
> 
>          
>         Provenance container, account, and collection are related
>         concepts for modeling a collection of provenance assertions.
>         E.g. provenance of an Affymetrix gene chip will be a collection
>         of provenance assertions (date of manufacture, location of
>         manufacturer, production series etc.) that can be stored in a
>         single file and the file will be a provenance container.  
>          
>          
>          
> 
>          > Example: a list of height measurements is an “untransformed”
>          > entity (a dataset); the average of that list is the
>          > “transformed” entity (another dataset, although a very simple
>          > one).
>          >
>          > I am dealing with much more complex workflows (e.g. files
>          > containing the outcome of a microarray experiment as the
>          > untransformed dataset and a list of differentially expressed
>          > genes as the transformed dataset), so please take the example
>          > above as just illustrative.
> 
>          
>         I am not sure I see the granularity/expressivity issue in the
>         above example (from your first mail). Both the "untransformed"
>         and "transformed" entities map to input and output data of a
>         process execution - we can create a subclass of Entity for this
>         purpose.
> 
>          
>          
>          
> 
>          > An investigator (agent) performs an experiment. That experiment
>          > has several input parameters, some of which are entities (e.g.
>          > samples), others are not (e.g. temperature). Resulting from the
>          > experiment are several output parameters (entities).
> 
>          
>         I am confused by the above scenario. Why is temperature not an
>         entity? Both the input (sample) and (temperature) are special
>         types (sub class) of entities - (a) InputData and (b)
>         InputParameter etc.
> 
>          
>          
> 
>          > So if I understand what you are saying correctly, “temperature”
>          > would be an entity of type “input”, which in turn would be a
>          > subclass of “role”. An instance of “input” could then have a
>          > certain value (e.g. 15C) in one of its properties?
>          >
>          > In that case, does it make sense to include “input” and “output”
>          > classes in the model as subclasses of “role”? Or is this
>          > something that me and Stephan exemplify in the primer document
>          > under “usage of agent” (or something of the sort)?
> 
>          
>         I agree with Khalid's example where Role allows us to model more
>         complex scenarios. For example, X is an instance of class
>         HumanBeing (perhaps as subclass of entity) and X has multiple
>         roles - researcher, parent, soccer player etc. To model these
>         "functions" we will use the Role class. I believe in the
>         microarray scenario (in your first mail) Roles are not needed.
> 
>          
>          
> 
>          > In that case, does it make sense to include “input” and “output”
>          > classes in the model as subclasses of “role”? Or is this
>          > something that me and Stephan exemplify in the primer document
>          > under “usage of agent” (or something of the sort)?
> 
>          
>         Sorry I did not understand this. Role can be used by any entity,
>         why only "usage of agent"?
> 
>          
>         Thanks.
> 
>          
>         Best,
> 
>         Satya
> 
>          
>         On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena
>         <helena.deus@deri.org <mailto:helena.deus@deri.org>
>         <mailto:helena.deus@deri.org <mailto:helena.deus@deri.org>>> wrote:
> 
>         Hi Khalid,
> 
>         Please see comments inline
> 
>          
>         *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk
>         <mailto:Khalid.Belhajjame@cs.man.ac.uk>
>         <mailto:Khalid.Belhajjame@cs.man.ac.uk
>         <mailto:Khalid.Belhajjame@cs.man.ac.uk>>]
> 
>         *Sent:* 12 August 2011 10:22
>         *To:* Deus, Helena
>         *Cc:* public-prov-wg@w3.org <mailto:public-prov-wg@w3.org>
>         <mailto:public-prov-wg@w3.org <mailto:public-prov-wg@w3.org>>
> 
>         *Subject:* Re: playing with pil ontology
> 
>          
> 
>         Hi Helena,
> 
>         Thanks for this, I think that this is a good exercise and some
>         of the points you mentioned relate to the conceptual model, not
>         only the formal model.
> 
>         On 11/08/2011 18:52, Deus, Helena wrote:
> 
>         Hi all,
> 
>         Reiterating a bit on what was addressed today in the telco, I
>         downloaded the ontology from mercurial and tried to use it with
>         my use case.
> 
>         I am using the use cases published in [1] and demoed with SPARQL
>         at http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>         <http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html>
> 
>          
>         Here is my input so far:
> 
>          
>         Agent could have dataProperty “label” and “description”; it
>         would help the implementer describe what type of agent
>         he/she intends to describe. Is the ontology here being confused
>         with the query model?
> 
>         I think that there was previously a long thread discussion on
>         agent and agent types, and whether the model should be
>         prescriptive in this respect. One of the solutions that I think
>         many people were happy with is to let users choose their
>         favorite model (ontology) for agent, which means that the agent
>         class defined in the ontology acts as a placeholder that can be
>         specialized to include description, types, and whatever the
>         application needs.
> 
>          
>         I am not questioning whether agent should be mapped to agents
>         defined elsewhere, which seems to be obvious – only wondering
>         whether agent “label” and “description” are things we want to
>         standardize in our model or not. We can “suggest” rdfs:label and
>         rdfs:comment without enforcing it as such – having those
>         included in the model will likely result in much less
>         heterogeneity when it comes to reporting provenance
>         (particularly since we are defining it necessarily “open” and
>         highly granular to fit any particular domain).
> 
>          
>         ProvenanceContainer is not useful, or its description is not
>         clear; what should be an instance of provenanceContainer?
> 
> 
>         At this stage, the description of this concept is not yet stable
>         in the conceptual model as far as I know.
> 
>          
>         What was its intended purpose/role in the description of provenance?
> 
>          
>         I want to create an instance of an “untransformed” entity (in my
>         case, a dataset) and a “transformed” entity. Is the model going
>         to give me that granularity/expressivity or do we expect each
>         implementer to come up with their own way of defining these?
> 
>         Could you please clarify what you mean by transformed and
>         untransformed entity?
> 
>         Example: a list of height measurements is an “untransformed”
>         entity (a dataset); the average of that list is the
>         “transformed” entity (another dataset, although a very simple one).
> 
>          
>         I am dealing with much more complex workflows, (e.g. files
>         containing the outcome of a microarray experiment as the
>         untransformed dataset and a list of differentially expressed
>         genes as the transformed dataset), so please take the example
>         above as just illustrative.
> 
>          
>         ProcessExecution needs more expressivity, I think. Not sure how
>         to solve this in a domain independent way, but here’s my problem:
> 
>         An investigator (agent) performs an experiment
> 
>         That experiment has several input parameters, some of which are
>         entities (e.g. samples), others are not (e.g. temperature).
> 
>         Resulting from the experiment are several output parameters
>         (entities)
> 
> 
>         I think that the current model caters for the above need. If you
>         are specifically trying to differentiate between different kinds
>         of inputs (samples as opposed to temperature), then the notion
>         of role can be helpful in this respect.
> 
>          
>         So if I understand what you are saying correctly, “temperature”
>         would be an entity of type “input”, which in turn would be a
>         subclass of “role”. An instance of “input” could then have a
>         certain value (e.g. 15C) in one of its properties?
> 
>         In that case, does it make sense to include “input” and “output”
>         classes in the model as subclasses of “role”? Or is this
>         something that me and Stephan exemplify in the primer document
>         under “usage of agent” (or something of the sort)?
> 
>          
> 
> 
>         Thanks, khalid
> 
>          
>         Have not completed my “experiment” yet, but will provide more
>         feedback soon :)
> 
>          
>         Best Regards,
> 
>         Helena F. Deus
> 
>         Post-doctoral Researcher
>         Digital Enterprise Research Institute
> 
>         National University of Ireland, Galway
> 
>         http://lenadeus.info
> 
>          
>          
> 
> 
> 
>
Received on Monday, 15 August 2011 21:42:00 UTC