- From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
- Date: Mon, 15 Aug 2011 15:39:00 -0700
- To: Graham Klyne <GK@ninebynine.org>
- Cc: "Myers, Jim" <MYERSJ4@rpi.edu>, Satya Sahoo <satya.sahoo@case.edu>, "Deus, Helena" <helena.deus@deri.org>, Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
- Message-ID: <CAExK0DemoxLk8ehyhM6kHAtkynf0CCjzxt8F90-mzBPnB8ujmw@mail.gmail.com>
Yes, I was thinking about named graphs for grouping the provenance descriptions. However, I do think that the model should explicitly recognize the "provenance container" (or whatever we decide to name it in the end), so that I could select the provenance containers having statements referring to a resource and filter them depending on a certain constraint (like author or date of creation).

Best,
Daniel

2011/8/15 Graham Klyne <GK@ninebynine.org>

> Daniel,
>
> It sounds to me as if you're trying to subdivide web resources, and that
> seems to me like a potential lot of complexity for questionable gain.
>
> (If you're thinking of something like named graphs in an RDF document, then
> fine: here each of the graphs has its own URI, so for descriptive purposes
> can be treated as a separate web resource. I don't think this is something
> the model needs to explicitly recognize, as it amounts to an implementation
> detail.)
>
> #g
> --
>
> Daniel Garijo wrote:
>
>> Hi Graham,
>> I like Provenance Container. What if your provenance statements were
>> created by different persons, processes or at different times, but they
>> are within the same Provenance Document (since they are provenance
>> assertions about the same entity)? I may want to describe the different
>> provenance containers, or even the provenance container descriptions with
>> another one.
>>
>> Thanks,
>> Daniel
>>
>> 2011/8/15 Graham Klyne <GK@ninebynine.org>
>>
>> Jim,
>>
>> FWIW, in PAQ we talk about "provenance information" as just another
>> resource that includes provenance assertions. To my mind, its
>> primary representation would be as an RDF document.
>>
>> The terminology here is subject to review and harmonization with the
>> model, but I'm not convinced that we need a new concept in the model
>> for this, and I'm not keen on a name involving "container", as in my
>> mind that sets up expectations of a distinct layer of encapsulation.
>> We don't talk about "containers" for HTML or XML elements, we just
>> talk about HTML and XML documents. Same for provenance, IMO.
>>
>> I suppose that suggests "Provenance Document", or similar.
>>
>> #g
>> --
>>
>> Myers, Jim wrote:
>>
>> A couple quick comments: I don’t think we’ve distinguished
>> provenance container and account at this point – they are an
>> entity which contains provenance statements and are used to
>> enable you to talk about how the provenance was created (what
>> processes and inputs caused those statements to be), but
>> collection has been discussed as a general aggregate
>> entity/container – a bag of marbles is an entity and saying a
>> process execution used it is shorthand for talking about the
>> individual marbles. A file is a collection of bytes and a
>> process execution may only use some of the bytes, etc.
>>
>> Re: roles – I would argue that you should use something quite
>> specific for the role of your temperature parameter, e.g.
>> “processingtemperaturesetpoint” rather than a generic “input” or
>> “inputParameter” role (parameter might still be a supertype of
>> processingtemperaturesetpoint). This would be necessary if, for
>> example, your process execution had a reaction temperature and a
>> storage temperature as inputs – now you have two numbers/two
>> temperatures and you have to use each in the correct role for
>> the provenance to be correct.
>> In many cases, you could
>> potentially describe the type of the entity itself well enough
>> to make the provenance clear, but putting the information into
>> the entity typing rather than into the role it has relative to
>> the process execution causes trouble if you use the entity in
>> multiple processes (if I make an entity that is of type
>> “processingtemperaturesetpoint” and I have a second process that
>> displays a “printablenumber” that uses it as input, the same
>> entity can’t also be of type “printablenumber” – better to make
>> the entity have type number and play a
>> “processingtemperaturesetpoint” role in one process and the
>> “printablenumber” role in the other.)
>>
>> Jim
>>
>> *From:* public-prov-wg-request@w3.org
>> [mailto:public-prov-wg-request@w3.org] *On Behalf Of* Satya Sahoo
>> *Sent:* Monday, August 15, 2011 11:02 AM
>> *To:* Deus, Helena
>> *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
>> *Subject:* Re: playing with pil ontology
>>
>> Hi Lena,
>>
>> Thanks again for trying to use the ontology for the microarray
>> use case!
>> My comments are inline:
>>
>> > I am not questioning whether agent should be mapped to agents
>> > defined elsewhere, which seems to be obvious – only wondering
>> > whether agent “label” and “description” are things we want to
>> > standardize in our model or not. We can “suggest” rdfs:label
>> > and rdfs:comment without enforcing it as such – having those
>> > included in the model will likely result in much less
>> > heterogeneity when it comes to reporting provenance
>> > (particularly since we are defining it necessarily “open” and
>> > highly granular to fit any particular domain).
>>
>> I am not sure I understand your point.
>> The rdfs:label and
>> rdfs:comment are two of the nine annotation properties that are
>> part of the OWL2 syntax. So, the provenance ontology encoded in
>> OWL includes them by default.
>>
>> > What was its intended purpose/role in the description of
>> > provenance?
>>
>> Provenance container, account, and collection are related
>> concepts for modeling a collection of provenance assertions.
>> E.g. provenance of an Affymetrix gene chip will be a collection
>> of provenance assertions (date of manufacture, location of
>> manufacturer, production series etc.) that can be stored in a
>> single file and the file will be a provenance container.
>>
>> > Example: a list of height measurements is an “untransformed”
>> > entity (a dataset); the average of that list is the “transformed”
>> > entity (another dataset, although a very simple one).
>> >
>> > I am dealing with much more complex workflows (e.g. files
>> > containing the outcome of a microarray experiment as the
>> > untransformed dataset and a list of differentially expressed
>> > genes as the transformed dataset), so please take the example
>> > above as just illustrative.
>>
>> I am not sure I see the granularity/expressivity issue in the
>> above example (from your first mail). Both the "untransformed"
>> and "transformed" entities map to input and output data of a
>> process execution - we can create subclass of Entity for this
>> purpose.
>>
>> > An investigator (agent) performs an experiment. That experiment
>> > has several input parameters, some of which are entities (e.g.
>> > samples), others are not (e.g. temperature). Resulting from the
>> > experiment are several output parameters (entities).
>>
>> I am confused by the above scenario. Why is temperature not an
>> entity? Both the input (sample) and (temperature) are special
>> types (sub class) of entities - (a) InputData and (b)
>> InputParameter etc.
>>
>> > So if I understand what you are saying correctly, “temperature”
>> > would be an entity of type “input”, which in turn would be
>> > subclass of “role”. An instance of “input” could then have a
>> > certain value (e.g. 15C) in one of its properties?
>> >
>> > In that case, does it make sense to include “input” and “output”
>> > classes in the model as subclasses of “role”? Or is this
>> > something that me and Stephan exemplify in the primer document
>> > under “usage of agent” (or something of the sort)?
>>
>> I agree with Khalid's example where Role allows us to model more
>> complex scenarios. For example, X is an instance of class
>> HumanBeing (perhaps as subclass of entity) and X has multiple
>> roles - researcher, parent, soccer player etc. To model these
>> "functions" we will use the Role class. I believe in the
>> microarray scenario (in your first mail) Roles are not needed.
>>
>> > In that case, does it make sense to include “input” and “output”
>> > classes in the model as subclasses of “role”? Or is this
>> > something that me and Stephan exemplify in the primer document
>> > under “usage of agent” (or something of the sort)?
>>
>> Sorry I did not understand this. Role can be used by any entity,
>> why only "usage of agent"?
>>
>> Thanks.
>>
>> Best,
>>
>> Satya
>>
>> On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena
>> <helena.deus@deri.org> wrote:
>>
>> Hi Khalid,
>>
>> Please see comments inline
>>
>> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
>> *Sent:* 12 August 2011 10:22
>> *To:* Deus, Helena
>> *Cc:* public-prov-wg@w3.org
>> *Subject:* Re: playing with pil ontology
>>
>> Hi Helena,
>>
>> Thanks for this, I think that this is a good exercise and some
>> of the points you mentioned relate to the conceptual model, not
>> only the formal model.
>>
>> On 11/08/2011 18:52, Deus, Helena wrote:
>>
>> Hi all,
>>
>> Reiterating a bit on what was addressed today in the telco, I
>> downloaded the ontology from mercurial and tried to use it with
>> my use case.
>>
>> I am using the use cases published in [1] and demoed with SPARQL
>> at http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>>
>> Here is my input so far:
>>
>> Agent could have dataProperty “label” and “description”; it
>> would help the implementer describe what type of agent he/she
>> intends to describe. Is the ontology here being confused
>> with the query model?
>>
>> I think that there was previously a long thread discussion on
>> agent and agent types, and whether the model should be
>> prescriptive in this respect. One of the solutions that I think
>> many people were happy with is to let users choose their
>> favorite model (ontology) for agent, which means that the agent
>> class defined in the ontology acts as a placeholder that can be
>> specialized to include description, types, and whatever the
>> application needs.
>>
>> I am not questioning whether agent should be mapped to agents
>> defined elsewhere, which seems to be obvious – only wondering
>> whether agent “label” and “description” are things we want to
>> standardize in our model or not.
>> We can “suggest” rdfs:label and
>> rdfs:comment without enforcing it as such – having those
>> included in the model will likely result in much less
>> heterogeneity when it comes to reporting provenance
>> (particularly since we are defining it necessarily “open” and
>> highly granular to fit any particular domain).
>>
>> ProvenanceContainer is not useful, or its description is not
>> clear; what should be an instance of provenanceContainer?
>>
>> At this stage, the description of this concept is not yet stable
>> in the conceptual model as far as I know.
>>
>> What was its intended purpose/role in the description of
>> provenance?
>>
>> I want to create an instance of an “untransformed” entity (in my
>> case, a dataset) and a “transformed” entity. Is the model going
>> to give me that granularity/expressivity or do we expect each
>> implementer to come up with their own way of defining these?
>>
>> Could you please clarify what you mean by transformed and
>> untransformed entity?
>>
>> Example: a list of height measurements is an “untransformed”
>> entity (a dataset); the average of that list is the
>> “transformed” entity (another dataset, although a very simple one).
>>
>> I am dealing with much more complex workflows (e.g. files
>> containing the outcome of a microarray experiment as the
>> untransformed dataset and a list of differentially expressed
>> genes as the transformed dataset), so please take the example
>> above as just illustrative.
>>
>> ProcessExecution needs more expressivity, I think. Not sure how
>> to solve this in a domain independent way, but here’s my problem:
>>
>> An investigator (agent) performs an experiment
>>
>> That experiment has several input parameters, some of which are
>> entities (e.g. samples), others are not (e.g. temperature).
>>
>> Resulting from the experiment are several output parameters
>> (entities)
>>
>> I think that the current model caters for the above need.
>> If you
>> are specifically trying to differentiate between different kinds
>> of inputs (samples as opposed to temperature), then the notion
>> of role can be helpful in this respect.
>>
>> So if I understand what you are saying correctly, “temperature”
>> would be an entity of type “input”, which in turn would be
>> subclass of “role”. An instance of “input” could then have a
>> certain value (e.g. 15C) in one of its properties?
>>
>> In that case, does it make sense to include “input” and “output”
>> classes in the model as subclasses of “role”? Or is this
>> something that me and Stephan exemplify in the primer document
>> under “usage of agent” (or something of the sort)?
>>
>> Thanks, khalid
>>
>> Have not completed my “experiment” yet, but will provide more
>> feedback soon :)
>>
>> Best Regards,
>>
>> Helena F. Deus
>> Post-doctoral Researcher
>> Digital Enterprise Research Institute
>> National University of Ireland, Galway
>> http://lenadeus.info
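
To make the request at the top of this message more concrete: selecting the provenance containers that hold statements about a resource, and then filtering them by author or creation date, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `ProvenanceContainer` class, its `author` and `created` fields, the `containers_about` helper and the `ex:` identifiers are hypothetical names, not terms from the PIL ontology or from any agreed model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Iterable, List, Optional, Tuple

# A provenance statement reduced to a (subject, predicate, object) triple.
Statement = Tuple[str, str, str]


@dataclass
class ProvenanceContainer:
    """Hypothetical container: a group of statements plus its own metadata."""
    uri: str
    author: str
    created: date
    statements: List[Statement] = field(default_factory=list)

    def mentions(self, resource: str) -> bool:
        """True if any statement has the resource as subject or object."""
        return any(resource in (subj, obj) for subj, _pred, obj in self.statements)


def containers_about(
    containers: Iterable[ProvenanceContainer],
    resource: str,
    constraint: Optional[Callable[[ProvenanceContainer], bool]] = None,
) -> List[ProvenanceContainer]:
    """Select containers with statements about `resource`, then apply an
    optional constraint on the container's own metadata."""
    selected = [c for c in containers if c.mentions(resource)]
    if constraint is not None:
        selected = [c for c in selected if constraint(c)]
    return selected


# Made-up example data, not taken from the microarray use case.
containers = [
    ProvenanceContainer("ex:c1", "alice", date(2011, 8, 1),
                        [("ex:chip42", "ex:manufacturedOn", "2011-07-01")]),
    ProvenanceContainer("ex:c2", "bob", date(2011, 8, 10),
                        [("ex:chip42", "ex:usedBy", "ex:experiment7")]),
    ProvenanceContainer("ex:c3", "alice", date(2011, 8, 12),
                        [("ex:sample9", "ex:usedBy", "ex:experiment7")]),
]

# Filter by author, then by date of creation.
by_alice = containers_about(containers, "ex:chip42", lambda c: c.author == "alice")
recent = containers_about(containers, "ex:chip42",
                          lambda c: c.created >= date(2011, 8, 5))

print([c.uri for c in by_alice])
print([c.uri for c in recent])
```

In an RDF setting each container could plausibly be a named graph with its own URI, as Graham suggests, with the author and date attached as statements about that URI. The sketch only shows why the container's own metadata has to be addressable for this kind of filtering to work.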
Received on Monday, 15 August 2011 22:39:29 UTC