- From: Graham Klyne <GK@ninebynine.org>
- Date: Mon, 15 Aug 2011 22:17:51 +0100
- To: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
- CC: "Myers, Jim" <MYERSJ4@rpi.edu>, Satya Sahoo <satya.sahoo@case.edu>, "Deus, Helena" <helena.deus@deri.org>, Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Daniel,

It sounds to me as if you're trying to subdivide web resources, and that seems to me like potentially a lot of complexity for questionable gain.

(If you're thinking of something like named graphs in an RDF document, then fine: here each of the graphs has its own URI, so for descriptive purposes it can be treated as a separate web resource. I don't think this is something the model needs to explicitly recognize, as it amounts to an implementation detail.)

#g
--

Daniel Garijo wrote:
> Hi Graham,
> I like Provenance Container. What if your provenance statements were created by different persons, processes or at different times, but they are within the same Provenance Document (since they are provenance assertions about the same entity)? I may want to describe the different provenance containers, or even the provenance container descriptions with another one.
>
> Thanks,
> Daniel
>
> 2011/8/15 Graham Klyne <GK@ninebynine.org>
>
> Jim,
>
> FWIW, in PAQ we talk about "provenance information" as just another resource that includes provenance assertions. To my mind, its primary representation would be as an RDF document.
>
> The terminology here is subject to review and harmonization with the model, but I'm not convinced that we need a new concept in the model for this, and I'm not keen on a name involving "container", as in my mind that sets up expectations of a distinct layer of encapsulation. We don't talk about "containers" for HTML or XML elements, we just talk about HTML and XML documents. Same for provenance, IMO.
>
> I suppose that suggests "Provenance Document", or similar.
>
> #g
> --
>
> Myers, Jim wrote:
>
> A couple quick comments: I don’t think we’ve distinguished provenance container and account at this point – they are an entity which contains provenance statements and are used to enable you to talk about how the provenance was created (what processes and inputs caused those statements to be), but collection has been discussed as a general aggregate entity/container – a bag of marbles is an entity and saying a process execution used it is shorthand for talking about the individual marbles. A file is a collection of bytes and a process execution may only use some of the bytes, etc.
>
> Re: roles – I would argue that you should use something quite specific for the role of your temperature parameter, e.g. “processingtemperaturesetpoint” rather than a generic “input” or “inputParameter” role (parameter might still be a supertype of processingtemperaturesetpoint). This would be necessary if, for example, your process execution had a reaction temperature and a storage temperature as inputs – now you have two numbers/two temperatures and you have to use each in the correct role for the provenance to be correct. In many cases, you could potentially describe the type of the entity itself well enough to make the provenance clear, but putting the information into the entity typing rather than into the role it has relative to the process execution causes trouble if you use the entity in multiple processes (if I make an entity that is of type “processingtemperaturesetpoint” and I have a second process that displays a “printablenumber” that uses it as input, the same entity can’t also be of type “printablenumber” – better to make the entity have type number and play a “processingtemperaturesetpoint” role in one process and the “printablenumber” role in the other).
>
> Jim
>
> *From:* public-prov-wg-request@w3.org [mailto:public-prov-wg-request@w3.org] *On Behalf Of* Satya Sahoo
> *Sent:* Monday, August 15, 2011 11:02 AM
> *To:* Deus, Helena
> *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
> *Subject:* Re: playing with pil ontology
>
> Hi Lena,
>
> Thanks again for trying to use the ontology for the microarray use case!
>
> My comments are inline:
>
> > I am not questioning whether agent should be mapped to agents defined elsewhere, which seems to be obvious – only wondering whether agent “label” and “description” are things we want to standardize in our model or not. We can “suggest” rdfs:label and rdfs:comment without enforcing it as such – having those included in the model will likely result in much less heterogeneity when it comes to reporting provenance (particularly since we are defining it necessarily “open” and highly granular to fit any particular domain).
>
> I am not sure I understand your point. The rdfs:label and rdfs:comment are two of the nine annotation properties that are part of the OWL2 syntax. So, the provenance ontology encoded in OWL includes them by default.
>
> > What was its intended purpose/role in the description of provenance?
>
> Provenance container, account, and collection are related concepts for modeling a collection of provenance assertions. E.g. provenance of an Affymetrix gene chip will be a collection of provenance assertions (date of manufacture, location of manufacturer, production series etc.) that can be stored in a single file and the file will be a provenance container.
>
> > Example: a list of height measurements is an “untransformed” entity (a dataset); the average of that list is the “transformed” entity (another dataset, although a very simple one).
> >
> > I am dealing with much more complex workflows (e.g. files containing the outcome of a microarray experiment as the untransformed dataset and a list of differentially expressed genes as the transformed dataset), so please take the example above as just illustrative.
>
> I am not sure I see the granularity/expressivity issue in the above example (from your first mail). Both the "untransformed" and "transformed" entities map to input and output data of a process execution - we can create a subclass of Entity for this purpose.
>
> > An investigator (agent) performs an experiment
> > That experiment has several input parameters, some of which are entities (e.g. samples), others are not (e.g. temperature)
> > Resulting from the experiment are several output parameters (entities)
>
> I am confused by the above scenario. Why is temperature not an entity? Both the input (sample) and (temperature) are special types (subclasses) of entities - (a) InputData and (b) InputParameter etc.
>
> > So if I understand what you are saying correctly, “temperature” would be an entity of type “input”, which in turn would be subclass of “role”. An instance of “input” could then have a certain value (e.g. 15C) in one of its properties?
> >
> > In that case, does it make sense to include “input” and “output” classes in the model as subclasses of “role”? Or is this something that me and Stephan exemplify in the primer document under “usage of agent” (or something of the sort)?
>
> I agree with Khalid's example where Role allows us to model more complex scenarios. For example, X is an instance of class HumanBeing (perhaps as subclass of entity) and X has multiple roles - researcher, parent, soccer player etc. To model these "functions" we will use the Role class. I believe in the microarray scenario (in your first mail) Roles are not needed.
>
> > In that case, does it make sense to include “input” and “output” classes in the model as subclasses of “role”? Or is this something that me and Stephan exemplify in the primer document under “usage of agent” (or something of the sort)?
>
> Sorry I did not understand this. Role can be used by any entity, why only "usage of agent"?
>
> Thanks.
>
> Best,
> Satya
>
> On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org> wrote:
>
> Hi Khalid,
>
> Please see comments inline
>
> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
> *Sent:* 12 August 2011 10:22
> *To:* Deus, Helena
> *Cc:* public-prov-wg@w3.org
> *Subject:* Re: playing with pil ontology
>
> Hi Helena,
>
> Thanks for this, I think that this is a good exercise and some of the points you mentioned relate to the conceptual model, not only the formal model.
>
> On 11/08/2011 18:52, Deus, Helena wrote:
>
> Hi all,
>
> Reiterating a bit on what was addressed today in the telco, I downloaded the ontology from mercurial and tried to use it with my use case.
>
> I am using the use cases published in [1] and demoed with SPARQL at http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>
> Here is my input so far:
>
> Agent could have dataProperty “label” and “description”; it would help the implementer describe what type of agent he/she intends to describe. Is the ontology here being confused with the query model?
>
> I think that there was previously a long thread discussion on agent and agent types, and whether the model should be prescriptive in this respect. One of the solutions that I think many people were happy with is to let users choose their favorite model (ontology) for agent, which means that the agent class defined in the ontology acts as a placeholder that can be specialized to include description, types, and whatever the application needs.
>
> I am not questioning whether agent should be mapped to agents defined elsewhere, which seems to be obvious – only wondering whether agent “label” and “description” are things we want to standardize in our model or not. We can “suggest” rdfs:label and rdfs:comment without enforcing it as such – having those included in the model will likely result in much less heterogeneity when it comes to reporting provenance (particularly since we are defining it necessarily “open” and highly granular to fit any particular domain).
>
> ProvenanceContainer is not useful, or its description is not clear; what should be an instance of provenanceContainer?
>
> At this stage, the description of this concept is not yet stable in the conceptual model as far as I know.
>
> What was its intended purpose/role in the description of provenance?
>
> I want to create an instance of an “untransformed” entity (in my case, a dataset) and a “transformed” entity. Is the model going to give me that granularity/expressivity or do we expect each implementer to come up with their own way of defining these?
>
> Could you please clarify what you mean by transformed and untransformed entity?
>
> Example: a list of height measurements is an “untransformed” entity (a dataset); the average of that list is the “transformed” entity (another dataset, although a very simple one).
>
> I am dealing with much more complex workflows (e.g. files containing the outcome of a microarray experiment as the untransformed dataset and a list of differentially expressed genes as the transformed dataset), so please take the example above as just illustrative.
>
> ProcessExecution needs more expressivity, I think. Not sure how to solve this in a domain independent way, but here’s my problem:
>
> An investigator (agent) performs an experiment
>
> That experiment has several input parameters, some of which are entities (e.g. samples), others are not (e.g. temperature).
>
> Resulting from the experiment are several output parameters (entities)
>
> I think that the current model caters for the above need. If you are specifically trying to differentiate between different kinds of inputs (samples as opposed to temperature), then the notion of role can be helpful in this respect.
>
> So if I understand what you are saying correctly, “temperature” would be an entity of type “input”, which in turn would be subclass of “role”. An instance of “input” could then have a certain value (e.g. 15C) in one of its properties?
>
> In that case, does it make sense to include “input” and “output” classes in the model as subclasses of “role”? Or is this something that me and Stephan exemplify in the primer document under “usage of agent” (or something of the sort)?
>
> Thanks, khalid
>
> Have not completed my “experiment” yet, but will provide more feedback soon :)
>
> Best Regards,
>
> Helena F. Deus
> Post-doctoral Researcher
> Digital Enterprise Research Institute
> National University of Ireland, Galway
> http://lenadeus.info
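
Jim's point in the thread above (keep the entity's type general and attach the specific meaning to the role it plays in each process execution) can be sketched roughly as follows. This is only an illustration in plain Python, not the PIL ontology or any agreed model: the class names `Entity` and `ProcessExecution`, the `use` method, the "storagetemperature" role name and the 4.0 value are all hypothetical, chosen to mirror the reaction/storage temperature example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Entity:
    """An entity with a general, reusable type (hypothetical class)."""
    name: str
    entity_type: str
    value: float


@dataclass
class ProcessExecution:
    """A process execution recording which entity played which role."""
    name: str
    used: Dict[str, Entity] = field(default_factory=dict)

    def use(self, role: str, entity: Entity) -> None:
        # Each role is filled at most once per process execution.
        if role in self.used:
            raise ValueError(f"role {role!r} already filled in {self.name}")
        self.used[role] = entity


# Two temperatures of the same general type "number".
reaction_temp = Entity("t1", "number", 15.0)  # 15C, as in Helena's example
storage_temp = Entity("t2", "number", 4.0)    # hypothetical value

# The role, not the entity type, says which number is which.
experiment = ProcessExecution("experiment")
experiment.use("processingtemperaturesetpoint", reaction_temp)
experiment.use("storagetemperature", storage_temp)

# The same entity plays a different role in a second process.
display = ProcessExecution("display")
display.use("printablenumber", reaction_temp)
```

Because the role lives on the link between process execution and entity, `reaction_temp` keeps the single type "number" while serving as a set point in one process and as a printable number in the other, which is the situation Jim describes.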
Received on Monday, 15 August 2011 21:42:00 UTC