- From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
- Date: Wed, 24 Aug 2011 18:22:12 +0200
- To: Simon Miles <simon.miles@kcl.ac.uk>
- Cc: Provenance Working Group WG <public-prov-wg@w3.org>
- Message-ID: <CAExK0De-Dbkdmq8kLNrDUBA13idmOyrKWXtWEFHRtUtAAy7tSA@mail.gmail.com>
Hi Simon,

As I argued with Graham before, I would like to query the provenance containers directly from the place where they are stored. If you change the derivation relationship in your example to a new pil:isProvenanceOf, then I'm OK with it:

(1) http://one pil:isProvenanceOf http://two

If not, I don't see how I can recover the provenance containers in your example without also getting all the derived resources from the provenance store.

I personally prefer adding Provenance Container to the domain model rather than isProvenanceOf, but if we decide to add the relationship instead of the class, I can live with that. However, right now I only see Provenance Container in the domain model.

Best,
Daniel

2011/8/24 Simon Miles <simon.miles@kcl.ac.uk>

> Hi Luc,
>
> I'm trying to understand the implication of your distinction below.
>
> Does this mean that if I have a document containing (only) provenance
> assertions which I identify with URI http://one, then I cannot simply
> assert the following?
>
> (1) http://one pil:isDerivedFrom http://two
>
> What does "provenance [container] can be asserted to be an entity"
> entail? Is it enough for me to say the following:
>
> (2) http://one rdf:type pil:Entity
>
> and this would then allow me to also say (1)? Or is there something
> more required to transform http://one into a PIL entity?
>
> Thanks,
> Simon
>
> On 23 August 2011 23:20, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
> > Hi Jim and all,
> >
> > I am picking on this specific message, since it seems to represent an
> > idea that is evolving in the WG.
> >
> > As earlier today, I don't agree with:
> > > provenance is 'just another entity'
> >
> > Adopting the phrasing that Jim used in a previous response, I would
> > say that:
> >
> > provenance [container] can be asserted to be an entity
> >
> > Luc
> >
> > On 15/08/11 19:43, Myers, Jim wrote:
> >> I agree provenance is 'just another entity' at some level, so perhaps a
> >> subtype of entity (versus a separate concept).
> >> The two types of things that seem different to me from HTML and XML
> >> docs are:
> >>
> >> I would like a PIL interpreter to do something with accounts it gets,
> >> i.e. read them and make the contents accessible. I might also want to
> >> keep the link between a statement and its source account, allow
> >> streaming accounts (e.g. from a live sensor), allow multiple accounts
> >> in one document, etc. I suspect that simply standardizing how one
> >> indicates that a resource is of type 'provenance doc' gives a partial
> >> solution, but I see an analogy to the reasoning leading to NamedGraphs
> >> as well (why not just have RDF docs? Do their reasons (related to
> >> signing, etc.) apply for us as well? (Such as - if we want to sign
> >> provenance, wouldn't it be nice to put the signature in the same doc
> >> as the provenance statements it refers to? But that changes a
> >> cryptographic signature unless there's a mechanism to specify which
> >> parts of the doc are the account...)).
> >>
> >> The other type of thing we explored in OPM was in relating accounts -
> >> being able to state that one account was consistent/inconsistent with
> >> another one. If we want that for PIL, we'd need to define a
> >> relationship(s) between accounts (which could still be subtypes of
> >> entities).
> >>
> >> In any case - probably more discussion required - I just didn't want
> >> 'collection' as an aggregate entity concept to get lumped together
> >> with the account/provenance doc/prov container and cause additional
> >> confusion.
> >>
> >> Jim
> >>
> >>> -----Original Message-----
> >>> From: Graham Klyne [mailto:GK@ninebynine.org]
> >>> Sent: Monday, August 15, 2011 12:10 PM
> >>> To: Myers, Jim
> >>> Cc: Satya Sahoo; Deus, Helena; Khalid Belhajjame; public-prov-wg@w3.org
> >>> Subject: Re: playing with pil ontology
> >>>
> >>> Jim,
> >>>
> >>> FWIW, in PAQ we talk about "provenance information" as just another
> >>> resource that includes provenance assertions.
> >>> To my mind, its primary representation would be as an RDF document.
> >>>
> >>> The terminology here is subject to review and harmonization with the
> >>> model, but I'm not convinced that we need a new concept in the model
> >>> for this, and I'm not keen on a name involving "container", as in my
> >>> mind that sets up expectations of a distinct layer of encapsulation.
> >>> We don't talk about "containers" for HTML or XML elements, we just
> >>> talk about HTML and XML documents. Same for provenance, IMO.
> >>>
> >>> I suppose that suggests "Provenance Document", or similar.
> >>>
> >>> #g
> >>> --
> >>>
> >>> Myers, Jim wrote:
> >>>
> >>>> A couple quick comments: I don't think we've distinguished provenance
> >>>> container and account at this point - they are an entity which
> >>>> contains provenance statements and are used to enable you to talk
> >>>> about how the provenance was created (what processes and inputs caused
> >>>> those statements to be), but collection has been discussed as a
> >>>> general aggregate entity/container - a bag of marbles is an entity and
> >>>> saying a process execution used it is shorthand for talking about the
> >>>> individual marbles. A file is a collection of bytes and a process
> >>>> execution may only use some of the bytes, etc.
> >>>>
> >>>> Re: roles - I would argue that you should use something quite specific
> >>>> for the role of your temperature parameter, e.g.
> >>>> "processingtemperaturesetpoint" rather than a generic "input" or
> >>>> "inputParameter" role (parameter might still be a supertype of
> >>>> processingtemperaturesetpoint). This would be necessary if, for
> >>>> example, your process execution had a reaction temperature and a
> >>>> storage temperature as inputs - now you have two numbers/two
> >>>> temperatures and you have to use each in the correct role for the
> >>>> provenance to be correct.
> >>>> In many cases, you could potentially
> >>>> describe the type of the entity itself well enough to make the
> >>>> provenance clear, but putting the information into the entity typing
> >>>> rather than into the role it has relative to the process execution
> >>>> causes trouble if you use the entity in multiple processes (if I make
> >>>> an entity that is of type "processingtemperaturesetpoint" and I have a
> >>>> second process that displays a "printablenumber" that uses it as
> >>>> input, the same entity can't also be of type "printablenumber" -
> >>>> better to make the entity have type number and play a
> >>>> "processingtemperaturesetpoint" role in one process and the
> >>>> "printablenumber" role in the other.)
> >>>>
> >>>> Jim
> >>>>
> >>>> *From:* public-prov-wg-request@w3.org
> >>>> [mailto:public-prov-wg-request@w3.org] *On Behalf Of* Satya Sahoo
> >>>> *Sent:* Monday, August 15, 2011 11:02 AM
> >>>> *To:* Deus, Helena
> >>>> *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
> >>>> *Subject:* Re: playing with pil ontology
> >>>>
> >>>> Hi Lena,
> >>>>
> >>>> Thanks again for trying to use the ontology for the microarray use
> >>>> case!
> >>>>
> >>>> My comments are inline:
> >>>>
> >>>>> I am not questioning whether agent should be mapped to agents
> >>>>> defined elsewhere, which seems to be obvious- only wondering whether
> >>>>> agent "label" and "description" are things we want to standardize in
> >>>>> our model or not. We can "suggest" rdfs:label and rdfs:comment without
> >>>>> enforcing it as such - having those included in the model will likely
> >>>>> result in much less heterogeneity when it comes to reporting
> >>>>> provenance (particularly since we are defining it necessarily "open"
> >>>>> and highly granular to fit any particular domain).
> >>>>
> >>>> I am not sure I understand your point.
> >>>> The rdfs:label and rdfs:comment
> >>>> are two of the nine annotation properties that are part of the OWL2
> >>>> syntax. So, the provenance ontology encoded in OWL includes them by
> >>>> default.
> >>>>
> >>>>> What was its intended purpose/role in the description of provenance?
> >>>>
> >>>> Provenance container, account, and collection are related concepts for
> >>>> modeling a collection of provenance assertions. E.g. provenance of a
> >>>> Affymetrix gene chip will be a collection of provenance assertions
> >>>> (date of manufacture, location of manufacturer, production series
> >>>> etc.) that can be stored in a single file and the file will be a
> >>>> provenance container.
> >>>>
> >>>>> Example: a list of height measurement is an "untransformed" entity (a
> >>>>> dataset); the average of that list is the "transformed" entity
> >>>>> (another dataset, although a very simple one).
> >>>>>
> >>>>> I am dealing with much more complex workflows, (e.g. files containing
> >>>>> the outcome of a microarray experiment as the untransformed dataset
> >>>>> and a list of differentially expressed genes as the transformed
> >>>>> dataset), so please take the example above as just illustrative.
> >>>>
> >>>> I am not sure I see the granularity/expressivity issue in the above
> >>>> example (from your first mail). Both the "untransformed" and
> >>>> "transformed" entities map to input and output data of a process
> >>>> execution - we can create subclass of Entity for this purpose.
> >>>>
> >>>>> An investigator (agent) performs an experiment That experiment has
> >>>>> several input parameters, some of which are entities (e.g. samples),
> >>>>> other are not (e.g.
> >>>>> temperature) Resulting from the experiment are
> >>>>> several output parameters (entities)
> >>>>
> >>>> I am confused by the above scenario. Why is temperature not an entity?
> >>>> Both the input (sample) and (temperature) are special types (sub
> >>>> class) of entities - (a) InputData and (b) InputParameter etc.
> >>>>
> >>>>> So if I understand what you are saying correctly, "temperature" would
> >>>>> be an entity of type "input", which in turn would be subclass of
> >>>>> "role". An instance of "input" could then have a certain value (e.g.
> >>>>> 15C) in one of its properties?
> >>>>>
> >>>>> In that case, does it make sense to include "input" and "output"
> >>>>> classes in the model as subclasses of "role"? Or is this something
> >>>>> that me and Stephan exemplify in the primer document under "usage of
> >>>>> agent" (or something of the sort)?
> >>>>
> >>>> I agree with Khalid's example where Role allows us to model more
> >>>> complex scenarios. For example, X is an instance of class HumanBeing
> >>>> (perhaps as subclass of entity) and X has multiple roles - researcher,
> >>>> parent, soccer player etc. To model these "functions" we will use the
> >>>> Role class. I believe in the microarray scenario (in your first mail)
> >>>> Roles are not needed.
> >>>>
> >>>>> In that case, does it make sense to include "input" and "output"
> >>>>> classes in the model as subclasses of "role"? Or is this something
> >>>>> that me and Stephan exemplify in the primer document under "usage of
> >>>>> agent" (or something of the sort)?
> >>>>
> >>>> Sorry I did not understand this. Role can be used by any entity, why
> >>>> only "usage of agent"?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Best,
> >>>>
> >>>> Satya
> >>>>
> >>>> On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org> wrote:
> >>>>
> >>>> Hi Khalid,
> >>>>
> >>>> Please see comments inline
> >>>>
> >>>> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
> >>>> *Sent:* 12 August 2011 10:22
> >>>> *To:* Deus, Helena
> >>>> *Cc:* public-prov-wg@w3.org
> >>>> *Subject:* Re: playing with pil ontology
> >>>>
> >>>> Hi Helena,
> >>>>
> >>>> Thanks for this, I think that this is a good exercise and some of the
> >>>> point you mentioned relate to the conceptual model, not only the
> >>>> formal model.
> >>>>
> >>>> On 11/08/2011 18:52, Deus, Helena wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Reiterating a bit on what was addressed today in the telco, I
> >>>> downloaded the ontology from mercurial and tried to use it with my use
> >>>> case.
> >>>>
> >>>> I am using the use cases published in [1] and demoed with SPARQL at
> >>>> http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
> >>>>
> >>>> Here is my input so far:
> >>>>
> >>>> Agent could have dataProperty "label" and "description"; it would help
> >>>> the implementer describe what type of agent does he/she intend to
> >>>> describe. Is the ontology here being confused with the query model?
> >>>>
> >>>> I think that there was previously a long thread discussion on agent
> >>>> and agent types, and whether the model should be prescriptive in this
> >>>> respect. One of the solutions that I think many people were happy with
> >>>> is to leave users choose their favorite model(ontology) for agent,
> >>>> which means that the agent class defined in the ontology acts as a
> >>>> place holder that can be specialized to include description, types,
> >>>> and whatever the application needs.
> >>>>
> >>>> I am not questioning whether agent should be mapped to agents defined
> >>>> elsewhere, which seems to be obvious- only wondering whether agent
> >>>> "label" and "description" are things we want to standardize in our
> >>>> model or not. We can "suggest" rdfs:label and rdfs:comment without
> >>>> enforcing it as such - having those included in the model will likely
> >>>> result in much less heterogeneity when it comes to reporting
> >>>> provenance (particularly since we are defining it necessarily "open"
> >>>> and highly granular to fit any particular domain).
> >>>>
> >>>> ProvenanceContainer is not useful, or its description is not clear;
> >>>> what should be an instance of provenanceContainer?
> >>>>
> >>>> At this stage, the description of this concept is not yet stable in
> >>>> the conceptual model as far as I know.
> >>>>
> >>>> What was its intended purpose/role in the description of provenance?
> >>>>
> >>>> I want to create an instance of a "untransformed" entity (in my case,
> >>>> a dataset) and a "transformed" entity. Is the model going to give me
> >>>> that granularity/expressivity or do we expect each implementer to come
> >>>> up with their own way of defining these?
> >>>>
> >>>> Could you please clarify what you mean by transformed and
> >>>> untransformed entity?
> >>>>
> >>>> Example: a list of height measurement is an "untransformed" entity (a
> >>>> dataset); the average of that list is the "transformed" entity
> >>>> (another dataset, although a very simple one).
> >>>>
> >>>> I am dealing with much more complex workflows, (e.g. files containing
> >>>> the outcome of a microarray experiment as the untransformed dataset
> >>>> and a list of differentially expressed genes as the transformed
> >>>> dataset), so please take the example above as just illustrative.
> >>>>
> >>>> ProcessExecution needs more expressivity, I think.
> >>>> Not sure how to
> >>>> solve this in a domain independent way, but here's my problem:
> >>>>
> >>>> An investigator (agent) performs an experiment
> >>>>
> >>>> That experiment has several input parameters, some of which are
> >>>> entities (e.g. samples), other are not (e.g. temperature).
> >>>>
> >>>> Resulting from the experiment are several output parameters (entities)
> >>>>
> >>>> I think that the current model caters for the above need. If you are
> >>>> specifically trying to differentiate between different kinds of inputs
> >>>> (samples as opposed to temperature), then the notion of role can be
> >>>> helpful in this respect.
> >>>>
> >>>> So if I understand what you are saying correctly, "temperature" would
> >>>> be an entity of type "input", which in turn would be subclass of
> >>>> "role". An instance of "input" could then have a certain value (e.g.
> >>>> 15C) in one of its properties?
> >>>>
> >>>> In that case, does it make sense to include "input" and "output"
> >>>> classes in the model as subclasses of "role"? Or is this something
> >>>> that me and Stephan exemplify in the primer document under "usage of
> >>>> agent" (or something of the sort)?
> >>>>
> >>>> Thanks, khalid
> >>>>
> >>>> Have not completed my "experiment" yet, but will provide more feedback
> >>>> soon :)
> >>>>
> >>>> Best Regards,
> >>>>
> >>>> Helena F. Deus
> >>>> Post-doctoral Researcher
> >>>> Digital Enterprise Research Institute
> >>>> National University of Ireland, Galway
> >>>> http://lenadeus.info
> >>>>
> >
> > ______________________________________________________________________
> > This email has been scanned by the MessageLabs Email Security System.
> > For more information please visit http://www.messagelabs.com/email
> > ______________________________________________________________________
>
> --
> Dr Simon Miles
> Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
Received on Wednesday, 24 August 2011 16:22:41 UTC
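
The argument at the top of this message, that a dedicated pil:isProvenanceOf relationship lets a client pick out provenance containers without wading through every derived resource, can be illustrated with a minimal sketch. It assumes a toy in-memory list of triples rather than a real provenance store. pil:isProvenanceOf is only the relationship proposed in the message, not an agreed term, and every URI other than http://one and http://two, as well as both helper functions, is hypothetical.

```python
# Toy triple store: (subject, predicate, object) tuples.
# Only http://one and http://two come from the thread; the rest is made up.
triples = [
    ("http://one", "pil:isProvenanceOf", "http://two"),
    ("http://report-v2", "pil:isDerivedFrom", "http://report-v1"),
    ("http://chart", "pil:isDerivedFrom", "http://dataset"),
]


def subjects_of(store, predicate):
    """Return the sorted, de-duplicated subjects of triples using `predicate`."""
    return sorted({s for (s, p, _o) in store if p == predicate})


def with_derivation_only(store):
    """Rewrite the store as if containers were linked with pil:isDerivedFrom."""
    return [
        (s, "pil:isDerivedFrom", o) if p == "pil:isProvenanceOf" else (s, p, o)
        for (s, p, o) in store
    ]


# With the dedicated relationship, the query returns only the container.
containers = subjects_of(triples, "pil:isProvenanceOf")

# With plain derivation, the container is mixed in with every derived resource.
mixed = subjects_of(with_derivation_only(triples), "pil:isDerivedFrom")

print(containers)
print(mixed)
```

In the second query the container http://one can no longer be told apart from the other derived resources by predicate alone, which is the concern raised above. Typing the resource with a Provenance Container class, the alternative the message prefers, would give a comparable filter on rdf:type instead.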