- From: Graham Klyne <GK@ninebynine.org>
- Date: Thu, 25 Aug 2011 15:53:15 +0100
- To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- CC: public-prov-wg@w3.org
The implication that you cannot express (1) without first asserting (2) is
inconsistent with RDF semantics. i.e. under any interpretation for which this
RDF expression is true:
[[
<http://one> pil:isDerivedFrom <http://two> .
<http://one> rdf:type pil:Entity .
]]
the following is also true:
[[
<http://one> pil:isDerivedFrom <http://two> .
]]

Removing the rdf:type assertion cannot change this.

#g
--

On 25/08/2011 10:33, Luc Moreau wrote:
> Hi Simon,
>
> You need to assert (2), which then allows you to express (1).
>
> I tried to introduce a container in the file example. See [1].
> In the provenance container found at [1], I have embedded some provenance.
> (It could have been kept separate, using the PAQ mechanism)
>
> The notion of scope is becoming crucial, and we need to address it (see ISSUE-81).
>
> Cheers,
> Luc
>
> [1] http://dvcs.w3.org/hg/prov/raw-file/default/model/container-example.pasn
>
> On 24/08/11 09:56, Simon Miles wrote:
>> Hi Luc,
>>
>> I'm trying to understand the implication of your distinction below.
>>
>> Does this mean that if I have a document containing (only) provenance
>> assertions which I identify with URI http://one, then I cannot simply
>> assert the following?
>>
>> (1) http://one pil:isDerivedFrom http://two
>>
>> What does "provenance [container] can be asserted to be an entity"
>> entail? Is it enough for me to say the following:
>>
>> (2) http://one rdf:type pil:Entity
>>
>> and this would then allow me to also say (1)? Or is there something
>> more required to transform http://one into a PIL entity?
>>
>> Thanks,
>> Simon
>>
>> On 23 August 2011 23:20, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>> Hi Jim and all,
>>>
>>> I am picking on this specific message, since it seems to represent an
>>> idea that is evolving in the WG.
>>>
>>> As earlier today, I don't agree with:
>>> > provenance is 'just another entity'
>>>
>>> Adopting the phrasing that Jim used in a previous response, I would
>>> say that:
>>>
>>> provenance [container] can be asserted to be an entity
>>>
>>> Luc
>>>
>>> On 15/08/11 19:43, Myers, Jim wrote:
>>>> I agree provenance is 'just another entity' at some level, so perhaps a
>>>> subtype of entity (versus a separate concept). The two types of things that
>>>> seem different to me from HTML and XML docs are:
>>>>
>>>> I would like a PIL interpreter to do something with accounts it gets, i.e.
>>>> read them and make the contents accessible. I might also want to keep the
>>>> link between a statement and its source account, allow streaming accounts
>>>> (e.g. from a live sensor), allow multiple accounts in one document, etc. I
>>>> suspect that simply standardizing how one indicates that a resource is of
>>>> type 'provenance doc' gives a partial solution, but I see an analogy to the
>>>> reasoning leading to NamedGraphs as well (why not just have RDF docs? Do
>>>> their reasons (related to signing, etc.) apply for us as well? (Such as - if
>>>> we want to sign provenance, wouldn't it be nice to put the signature in the
>>>> same doc as the provenance statements it refers to? But that changes a
>>>> cryptographic signature unless there's a mechanism to specify which parts of
>>>> the doc are the account...)).
>>>>
>>>> The other type of thing we explored in OPM was in relating accounts - being
>>>> able to state that one account was consistent/inconsistent with another one.
>>>> If we want that for PIL, we'd need to define a relationship(s) between
>>>> accounts (which could still be subtypes of entities).
>>>>
>>>> In any case - probably more discussion required - I just didn't want
>>>> 'collection' as an aggregate entity concept to get lumped together with the
>>>> account/provenance doc/prov container and cause additional confusion.
>>>>
>>>> Jim
>>>>
>>>>> -----Original Message-----
>>>>> From: Graham Klyne [mailto:GK@ninebynine.org]
>>>>> Sent: Monday, August 15, 2011 12:10 PM
>>>>> To: Myers, Jim
>>>>> Cc: Satya Sahoo; Deus, Helena; Khalid Belhajjame; public-prov-wg@w3.org
>>>>> Subject: Re: playing with pil ontology
>>>>>
>>>>> Jim,
>>>>>
>>>>> FWIW, in PAQ we talk about "provenance information" as just another resource
>>>>> that includes provenance assertions. To my mind, its primary representation
>>>>> would be as an RDF document.
>>>>>
>>>>> The terminology here is subject to review and harmonization with the model,
>>>>> but I'm not convinced that we need a new concept in the model for this, and
>>>>> I'm not keen on a name involving "container", as in my mind that sets up
>>>>> expectations of a distinct layer of encapsulation. We don't talk about
>>>>> "containers" for HTML or XML elements, we just talk about HTML and XML
>>>>> documents. Same for provenance, IMO.
>>>>>
>>>>> I suppose that suggests "Provenance Document", or similar.
>>>>>
>>>>> #g
>>>>> --
>>>>>
>>>>> Myers, Jim wrote:
>>>>>
>>>>>> A couple quick comments: I don't think we've distinguished provenance
>>>>>> container and account at this point - they are an entity which
>>>>>> contains provenance statements and are used to enable you to talk
>>>>>> about how the provenance was created (what processes and inputs caused
>>>>>> those statements to be), but collection has been discussed as a
>>>>>> general aggregate entity/container - a bag of marbles is an entity and
>>>>>> saying a process execution used it is shorthand for talking about the
>>>>>> individual marbles. A file is a collection of bytes and a process
>>>>>> execution may only use some of the bytes, etc.
>>>>>>
>>>>>> Re: roles - I would argue that you should use something quite specific
>>>>>> for the role of your temperature parameter, e.g.
>>>>>> "processingtemperaturesetpoint" rather than a generic "input" or
>>>>>> "inputParameter" role (parameter might still be a supertype of
>>>>>> processingtemperaturesetpoint). This would be necessary if, for
>>>>>> example, your process execution had a reaction temperature and a
>>>>>> storage temperature as inputs - now you have two numbers/two
>>>>>> temperatures and you have to use each in the correct role for the
>>>>>> provenance to be correct. In many cases, you could potentially
>>>>>> describe the type of the entity itself well enough to make the
>>>>>> provenance clear, but putting the information into the entity typing
>>>>>> rather than into the role it has relative to the process execution
>>>>>> causes trouble if you use the entity in multiple processes (if I make
>>>>>> an entity that is of type "processingtemperaturesetpoint" and I have a
>>>>>> second process that displays a "printablenumber" that uses it as
>>>>>> input, the same entity can't also be of type "printablenumber" -
>>>>>> better to make the entity have type number and play a
>>>>>> "processingtemperaturesetpoint" role in one process and the
>>>>>> "printablenumber" role in the other.)
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>> *From:* public-prov-wg-request@w3.org
>>>>>> [mailto:public-prov-wg-request@w3.org] *On Behalf Of* Satya Sahoo
>>>>>> *Sent:* Monday, August 15, 2011 11:02 AM
>>>>>> *To:* Deus, Helena
>>>>>> *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
>>>>>> *Subject:* Re: playing with pil ontology
>>>>>>
>>>>>> Hi Lena,
>>>>>>
>>>>>> Thanks again for trying to use the ontology for the microarray use case!
>>>>>>
>>>>>> My comments are inline:
>>>>>>
>>>>>> > I am not questioning whether agent should be mapped to agents
>>>>>> > defined elsewhere, which seems to be obvious - only wondering whether
>>>>>> > agent "label" and "description" are things we want to standardize in
>>>>>> > our model or not.
>>>>>> > We can "suggest" rdfs:label and rdfs:comment without
>>>>>> > enforcing it as such - having those included in the model will likely
>>>>>> > result in much less heterogeneity when it comes to reporting
>>>>>> > provenance (particularly since we are defining it necessarily "open"
>>>>>> > and highly granular to fit any particular domain).
>>>>>>
>>>>>> I am not sure I understand your point. The rdfs:label and rdfs:comment
>>>>>> are two of the nine annotation properties that are part of the OWL2
>>>>>> syntax. So, the provenance ontology encoded in OWL includes them by
>>>>>> default.
>>>>>>
>>>>>> > What was its intended purpose/role in the description of provenance?
>>>>>>
>>>>>> Provenance container, account, and collection are related concepts for
>>>>>> modeling a collection of provenance assertions. E.g. provenance of an
>>>>>> Affymetrix gene chip will be a collection of provenance assertions
>>>>>> (date of manufacture, location of manufacturer, production series
>>>>>> etc.) that can be stored in a single file and the file will be a
>>>>>> provenance container.
>>>>>>
>>>>>> > Example: a list of height measurements is an "untransformed" entity (a
>>>>>> > dataset); the average of that list is the "transformed" entity
>>>>>> > (another dataset, although a very simple one).
>>>>>>
>>>>>> > I am dealing with much more complex workflows (e.g. files containing
>>>>>> > the outcome of a microarray experiment as the untransformed dataset
>>>>>> > and a list of differentially expressed genes as the transformed
>>>>>> > dataset), so please take the example above as just illustrative.
>>>>>>
>>>>>> I am not sure I see the granularity/expressivity issue in the above
>>>>>> example (from your first mail).
>>>>>> Both the "untransformed" and
>>>>>> "transformed" entities map to input and output data of a process
>>>>>> execution - we can create a subclass of Entity for this purpose.
>>>>>>
>>>>>> > An investigator (agent) performs an experiment. That experiment has
>>>>>> > several input parameters, some of which are entities (e.g. samples),
>>>>>> > others are not (e.g. temperature). Resulting from the experiment are
>>>>>> > several output parameters (entities).
>>>>>>
>>>>>> I am confused by the above scenario. Why is temperature not an entity?
>>>>>> Both the input (sample) and (temperature) are special types (sub
>>>>>> class) of entities - (a) InputData and (b) InputParameter etc.
>>>>>>
>>>>>> > So if I understand what you are saying correctly, "temperature" would
>>>>>> > be an entity of type "input", which in turn would be subclass of
>>>>>> > "role". An instance of "input" could then have a certain value (e.g.
>>>>>> > 15C) in one of its properties?
>>>>>>
>>>>>> > In that case, does it make sense to include "input" and "output"
>>>>>> > classes in the model as subclasses of "role"? Or is this something that me
>>>>>> > and Stephan exemplify in the primer document under "usage of agent"
>>>>>> > (or something of the sort)?
>>>>>>
>>>>>> I agree with Khalid's example where Role allows us to model more
>>>>>> complex scenarios. For example, X is an instance of class HumanBeing
>>>>>> (perhaps as subclass of entity) and X has multiple roles - researcher,
>>>>>> parent, soccer player etc. To model these "functions" we will use the
>>>>>> Role class. I believe in the microarray scenario (in your first mail)
>>>>>> Roles are not needed.
>>>>>>
>>>>>> > In that case, does it make sense to include "input" and "output"
>>>>>> > classes in the model as subclasses of "role"?
>>>>>> > Or is this something
>>>>>> > that me and Stephan exemplify in the primer document under "usage of
>>>>>> > agent" (or something of the sort)?
>>>>>>
>>>>>> Sorry I did not understand this. Role can be used by any entity, why
>>>>>> only "usage of agent"?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Satya
>>>>>>
>>>>>> On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org> wrote:
>>>>>>
>>>>>> Hi Khalid,
>>>>>>
>>>>>> Please see comments inline
>>>>>>
>>>>>> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
>>>>>> *Sent:* 12 August 2011 10:22
>>>>>> *To:* Deus, Helena
>>>>>> *Cc:* public-prov-wg@w3.org
>>>>>> *Subject:* Re: playing with pil ontology
>>>>>>
>>>>>> Hi Helena,
>>>>>>
>>>>>> Thanks for this, I think that this is a good exercise and some of the
>>>>>> points you mentioned relate to the conceptual model, not only the
>>>>>> formal model.
>>>>>>
>>>>>> On 11/08/2011 18:52, Deus, Helena wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Reiterating a bit on what was addressed today in the telecon, I
>>>>>> downloaded the ontology from mercurial and tried to use it with my use
>>>>>> case.
>>>>>>
>>>>>> I am using the use cases published in [1] and demoed with SPARQL at
>>>>>> http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>>>>>>
>>>>>> Here is my input so far:
>>>>>>
>>>>>> Agent could have dataProperty "label" and "description"; it would help
>>>>>> the implementer describe what type of agent he/she intends to
>>>>>> describe. Is the ontology here being confused with the query model?
>>>>>>
>>>>>> I think that there was previously a long thread discussion on agent
>>>>>> and agent types, and whether the model should be prescriptive in this
>>>>>> respect.
>>>>>> One of the solutions that I think many people were happy with
>>>>>> is to let users choose their favorite model (ontology) for agent,
>>>>>> which means that the agent class defined in the ontology acts as a
>>>>>> placeholder that can be specialized to include description, types,
>>>>>> and whatever the application needs.
>>>>>>
>>>>>> I am not questioning whether agent should be mapped to agents defined
>>>>>> elsewhere, which seems to be obvious - only wondering whether agent
>>>>>> "label" and "description" are things we want to standardize in our
>>>>>> model or not. We can "suggest" rdfs:label and rdfs:comment without
>>>>>> enforcing it as such - having those included in the model will likely
>>>>>> result in much less heterogeneity when it comes to reporting
>>>>>> provenance (particularly since we are defining it necessarily "open"
>>>>>> and highly granular to fit any particular domain).
>>>>>>
>>>>>> ProvenanceContainer is not useful, or its description is not clear;
>>>>>> what should be an instance of provenanceContainer?
>>>>>>
>>>>>> At this stage, the description of this concept is not yet stable in
>>>>>> the conceptual model as far as I know.
>>>>>>
>>>>>> What was its intended purpose/role in the description of provenance?
>>>>>>
>>>>>> I want to create an instance of an "untransformed" entity (in my case,
>>>>>> a dataset) and a "transformed" entity. Is the model going to give me
>>>>>> that granularity/expressivity or do we expect each implementer to come
>>>>>> up with their own way of defining these?
>>>>>>
>>>>>> Could you please clarify what you mean by transformed and
>>>>>> untransformed entity?
>>>>>>
>>>>>> Example: a list of height measurements is an "untransformed" entity (a
>>>>>> dataset); the average of that list is the "transformed" entity
>>>>>> (another dataset, although a very simple one).
>>>>>>
>>>>>> I am dealing with much more complex workflows (e.g. files containing
>>>>>> the outcome of a microarray experiment as the untransformed dataset
>>>>>> and a list of differentially expressed genes as the transformed
>>>>>> dataset), so please take the example above as just illustrative.
>>>>>>
>>>>>> ProcessExecution needs more expressivity, I think. Not sure how to
>>>>>> solve this in a domain independent way, but here's my problem:
>>>>>>
>>>>>> An investigator (agent) performs an experiment
>>>>>>
>>>>>> That experiment has several input parameters, some of which are
>>>>>> entities (e.g. samples), others are not (e.g. temperature).
>>>>>>
>>>>>> Resulting from the experiment are several output parameters (entities)
>>>>>>
>>>>>> I think that the current model caters for the above need. If you are
>>>>>> specifically trying to differentiate between different kinds of inputs
>>>>>> (samples as opposed to temperature), then the notion of role can be
>>>>>> helpful in this respect.
>>>>>>
>>>>>> So if I understand what you are saying correctly, "temperature" would
>>>>>> be an entity of type "input", which in turn would be subclass of
>>>>>> "role". An instance of "input" could then have a certain value (e.g.
>>>>>> 15C) in one of its properties?
>>>>>>
>>>>>> In that case, does it make sense to include "input" and "output"
>>>>>> classes in the model as subclasses of "role"? Or is this something
>>>>>> that me and Stephan exemplify in the primer document under "usage of
>>>>>> agent" (or something of the sort)?
>>>>>>
>>>>>> Thanks, khalid
>>>>>>
>>>>>> Have not completed my "experiment" yet, but will provide more feedback
>>>>>> soon :)
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Helena F. Deus
>>>>>>
>>>>>> Post-doctoral Researcher
>>>>>> Digital Enterprise Research Institute
>>>>>>
>>>>>> National University of Ireland, Galway
>>>>>>
>>>>>> http://lenadeus.info
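
The point made at the top of this message (that dropping the rdf:type triple from a true RDF expression cannot make the remaining triple false) can be illustrated with a small sketch. It is only a toy model and not the RDF model theory itself: it assumes ground triples with no blank nodes, and it treats an "interpretation" simply as the set of triples it makes true. The names `is_true` and `entails` are hypothetical helpers introduced here for illustration.

```python
from itertools import chain, combinations

# Terms from the message, written as plain strings (illustrative only).
ONE, TWO = "http://one", "http://two"
DERIVED, TYPE, ENTITY = "pil:isDerivedFrom", "rdf:type", "pil:Entity"

# The two-triple expression and the one-triple expression from the message.
full_graph = frozenset({(ONE, DERIVED, TWO), (ONE, TYPE, ENTITY)})
sub_graph = frozenset({(ONE, DERIVED, TWO)})


def is_true(graph, interpretation):
    """Toy rule: a ground graph holds when every one of its triples holds."""
    return graph <= interpretation


def entails(premise, conclusion, universe):
    """True if every interpretation (any subset of `universe`) that
    satisfies `premise` also satisfies `conclusion`."""
    triples = list(universe)
    interpretations = chain.from_iterable(
        combinations(triples, n) for n in range(len(triples) + 1)
    )
    return all(
        is_true(conclusion, frozenset(i))
        for i in interpretations
        if is_true(premise, frozenset(i))
    )


# Both triples together entail the derivation triple on its own ...
print(entails(full_graph, sub_graph, full_graph))   # True
# ... but the derivation triple alone does not entail the type triple.
print(entails(sub_graph, full_graph, full_graph))   # False
```

Under these assumptions the first check succeeds for any pair of graphs where the conclusion is a subset of the premise, which is why asserting (2) cannot be a precondition for expressing (1).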
Received on Thursday, 25 August 2011 14:55:31 UTC