Re: playing with pil ontology

Hi Simon,
as I argued with Graham before, I would like to query the provenance
containers directly from the place where they are stored. If you change the
derivation relationship in your example to a new pil:isProvenanceOf, then
I'm OK with it:

(1) http://one pil:isProvenanceOf http://two.

If not, I don't see how I can recover the provenance containers in your
example without also getting all the derived resources from the provenance
store.
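
To make the retrieval point concrete, here is a minimal sketch, assuming the
store is just a set of (subject, predicate, object) tuples. The prefixed names
are kept as plain strings, and http://three and http://four are made-up
derived resources that do not appear in Simon's example.

```python
# Triples as (subject, predicate, object) tuples; prefixed names kept as plain strings.
store = {
    ("http://one", "pil:isProvenanceOf", "http://two"),    # container -> resource it describes
    ("http://three", "pil:isDerivedFrom", "http://two"),   # hypothetical derived resource
    ("http://four", "pil:isDerivedFrom", "http://three"),  # hypothetical derived resource
}


def subjects_with(triples, predicate):
    """Return the sorted subjects of every triple that uses the given predicate."""
    return sorted(s for (s, p, _) in triples if p == predicate)


# A dedicated relationship selects the containers without touching derived resources.
containers = subjects_with(store, "pil:isProvenanceOf")
# Querying by derivation returns every derived resource in the store.
derived = subjects_with(store, "pil:isDerivedFrom")

print(containers)
print(derived)
```

If http://one were linked with pil:isDerivedFrom instead, it would come back
mixed in with the other derived resources, and nothing in the triples alone
would single it out as a container.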

I personally prefer adding Provenance Container to the domain model rather
than isProvenanceOf, but if we decide to add the relationship instead of the
class, I can live with that. However, right now I only see Provenance
Container in the domain model.

Best,
Daniel

2011/8/24 Simon Miles <simon.miles@kcl.ac.uk>

> Hi Luc,
>
> I'm trying to understand the implication of your distinction below.
>
> Does this mean that if I have a document containing (only) provenance
> assertions which I identify with URI http://one, then I cannot simply
> assert the following?
>
> (1) http://one pil:isDerivedFrom http://two
>
> What does "provenance [container] can be asserted to be an entity"
> entail?  Is it enough for me to say the following:
>
> (2) http://one rdf:type pil:Entity
>
> and this would then allow me to also say (1)? Or is there something
> more required to transform http://one into a PIL entity?
>
> Thanks,
> Simon
>
> On 23 August 2011 23:20, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
> > Hi Jim and all,
> >
> > I am picking on this specific message, since it seems to represent an
> > idea that is evolving in the WG.
> >
> > As earlier today, I don't agree with:
> >  > provenance is 'just another entity'
> >
> > Adopting the phrasing that Jim used in a previous response, I would
> > say that:
> >
> >  provenance [container] can be asserted to be an entity
> >
> > Luc
> >
> > On 15/08/11 19:43, Myers, Jim wrote:
> >> I agree provenance is 'just another entity' at some level, so perhaps a
> >> subtype of entity (versus a separate concept). The two types of things that
> >> seem different to me from HTML and XML docs are:
> >>
> >> I would like a PIL interpreter to do something with accounts it gets,
> >> i.e. read them and make the contents accessible. I might also want to keep
> >> the link between a statement and its source 'account', allow streaming
> >> accounts (e.g. from a live sensor), allow multiple accounts in one document,
> >> etc. I suspect that simply standardizing how one indicates that a resource
> >> is of type 'provenance doc' gives a partial solution, but I see an analogy
> >> to the reasoning leading to NamedGraphs as well (why not just have RDF docs?
> >> Do their reasons (related to signing, etc.) apply for us as well? (Such as -
> >> if we want to sign provenance, wouldn't it be nice to put the signature in
> >> the same doc as the provenance statements it refers to? But that changes a
> >> cryptographic signature unless there's a mechanism to specify which parts of
> >> the doc are the account...)).
> >>
> >> The other type of thing we explored in OPM was in relating accounts -
> >> being able to state that one account was consistent/inconsistent with
> >> another one. If we want that for PIL, we'd need to define a relationship(s)
> >> between accounts (which could still be subtypes of entities).
> >>
> >> In any case - probably more discussion required - I just didn't want
> >> 'collection' as an aggregate entity concept to get lumped together with the
> >> account/provenance doc/prov container and cause additional confusion.
> >>
> >>   Jim
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Graham Klyne [mailto:GK@ninebynine.org]
> >>> Sent: Monday, August 15, 2011 12:10 PM
> >>> To: Myers, Jim
> >>> Cc: Satya Sahoo; Deus, Helena; Khalid Belhajjame; public-prov-wg@w3.org
> >>> Subject: Re: playing with pil ontology
> >>>
> >>> Jim,
> >>>
> >>> FWIW, in PAQ we talk about "provenance information" as just another
> >>> resource that includes provenance assertions.  To my mind, its primary
> >>> representation would be as an RDF document.
> >>>
> >>> The terminology here is subject to review and harmonization with the
> >>> model, but I'm not convinced that we need a new concept in the model for
> >>> this, and I'm not keen on a name involving "container", as in my mind that
> >>> sets up expectations of a distinct layer of encapsulation.  We don't talk
> >>> about "containers" for HTML or XML elements, we just talk about HTML and
> >>> XML documents.  Same for provenance, IMO.
> >>>
> >>> I suppose that suggests "Provenance Document", or similar.
> >>>
> >>> #g
> >>> --
> >>>
> >>> Myers, Jim wrote:
> >>>
> >>>>
> >>>> A couple quick comments: I don't think we've distinguished provenance
> >>>> container and account at this point - they are an entity which
> >>>> contains provenance statements and are used to enable you to talk
> >>>> about how the provenance was created (what processes and inputs caused
> >>>> those statements to be), but collection has been discussed as a
> >>>> general aggregate entity/container - a bag of marbles is an entity and
> >>>> saying a process execution used it is shorthand for talking about the
> >>>> individual marbles. A file is a collection of bytes and a process
> >>>> execution may only use some of the bytes, etc.
> >>>>
> >>>>
> >>>>
> >>>> Re: roles - I would argue that you should use something quite specific
> >>>> for the role of your temperature parameter, e.g.
> >>>> "processingtemperaturesetpoint" rather than a generic "input" or
> >>>> "inputParameter" role (parameter might still be a supertype of
> >>>> processingtemperaturesetpoint). This would be necessary if, for
> >>>> example, your process execution had a reaction temperature and a
> >>>> storage temperature as inputs - now you have two numbers/two
> >>>> temperatures and you have to use each in the correct role for the
> >>>> provenance to be correct. In many cases, you could potentially
> >>>> describe the type of the entity itself well enough to make the
> >>>> provenance clear, but putting the information into the entity typing
> >>>> rather than into the role it has relative to the process execution
> >>>> causes trouble if you use the entity in multiple processes (if I make
> >>>> an entity that is of type "processingtemperaturesetpoint" and I have a
> >>>> second process that displays a "printablenumber" that uses it as
> >>>> input, the same entity can't also be of type "printablenumber" -
> >>>> better to make the entity have type number and play a
> >>>> "processingtemperaturesetpoint" role in one process and the
> >>>> "printablenumber" role in the other.)
> >>>>
> >>>>
> >>>>
> >>>> Jim
> >>>>
> >>>>
> >>>>
> >>>> *From:* public-prov-wg-request@w3.org
> >>>> [mailto:public-prov-wg-request@w3.org] *On Behalf Of *Satya Sahoo
> >>>> *Sent:* Monday, August 15, 2011 11:02 AM
> >>>> *To:* Deus, Helena
> >>>> *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
> >>>> *Subject:* Re: playing with pil ontology
> >>>>
> >>>>
> >>>>
> >>>> Hi Lena,
> >>>>
> >>>> Thanks again for trying to use the ontology for the microarray use case!
> >>>>
> >>>>
> >>>>
> >>>> My comments are inline:
> >>>>
> >>>>
> >>>>
> >>>>> I am not questioning whether agent should be mapped to agents defined
> >>>>> elsewhere, which seems to be obvious - only wondering whether agent
> >>>>> "label" and "description" are things we want to standardize in our
> >>>>> model or not. We can "suggest" rdfs:label and rdfs:comment without
> >>>>> enforcing it as such - having those included in the model will likely
> >>>>> result in much less heterogeneity when it comes to reporting
> >>>>> provenance (particularly since we are defining it necessarily "open"
> >>>>> and highly granular to fit any particular domain).
> >>>>
> >>>>
> >>>>
> >>>> I am not sure I understand your point. The rdfs:label and rdfs:comment
> >>>> are two of the nine annotation properties that are part of the OWL2
> >>>> syntax. So, the provenance ontology encoded in OWL includes them by
> >>>> default.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> What was its intended purpose/role in the description of provenance?
> >>>>
> >>>>
> >>>>
> >>>> Provenance container, account, and collection are related concepts for
> >>>> modeling a collection of provenance assertions. E.g. provenance of an
> >>>> Affymetrix gene chip will be a collection of provenance assertions
> >>>> (date of manufacture, location of manufacturer, production series
> >>>> etc.) that can be stored in a single file and the file will be a
> >>>> provenance container.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> Example: a list of height measurements is an "untransformed" entity (a
> >>>>> dataset); the average of that list is the "transformed" entity
> >>>>> (another dataset, although a very simple one).
> >>>>>
> >>>>> I am dealing with much more complex workflows (e.g. files containing
> >>>>> the outcome of a microarray experiment as the untransformed dataset
> >>>>> and a list of differentially expressed genes as the transformed
> >>>>> dataset), so please take the example above as just illustrative.
> >>>>
> >>>>
> >>>>
> >>>> I am not sure I see the granularity/expressivity issue in the above
> >>>> example (from your first mail). Both the "untransformed" and
> >>>> "transformed" entities map to input and output data of a process
> >>>> execution - we can create subclasses of Entity for this purpose.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> An investigator (agent) performs an experiment. That experiment has
> >>>>> several input parameters, some of which are entities (e.g. samples),
> >>>>> others are not (e.g. temperature). Resulting from the experiment are
> >>>>> several output parameters (entities).
> >>>>
> >>>>
> >>>> I am confused by the above scenario. Why is temperature not an entity?
> >>>> Both the input (sample) and (temperature) are special types
> >>>> (subclasses) of entities - (a) InputData and (b) InputParameter etc.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> So if I understand what you are saying correctly, "temperature" would
> >>>>> be an entity of type "input", which in turn would be subclass of
> >>>>> "role". An instance of "input" could then have a certain value (e.g.
> >>>>> 15C) in one of its properties?
> >>>>>
> >>>>> In that case, does it make sense to include "input" and "output"
> >>>>> classes in the model as subclasses of "role"? Or is this something
> >>>>> that me and Stephan exemplify in the primer document under "usage of
> >>>>> agent" (or something of the sort)?
> >>>>
> >>>>
> >>>>
> >>>> I agree with Khalid's example where Role allows us to model more
> >>>> complex scenarios. For example, X is an instance of class HumanBeing
> >>>> (perhaps as subclass of entity) and X has multiple roles - researcher,
> >>>> parent, soccer player etc. To model these "functions" we will use the
> >>>> Role class. I believe in the microarray scenario (in your first mail)
> >>>> Roles are not needed.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> In that case, does it make sense to include "input" and "output"
> >>>>> classes in the model as subclasses of "role"? Or is this something
> >>>>> that me and Stephan exemplify in the primer document under "usage of
> >>>>> agent" (or something of the sort)?
> >>>>
> >>>>
> >>>>
> >>>> Sorry I did not understand this. Role can be used by any entity, why
> >>>> only "usage of agent"?
> >>>>
> >>>>
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>>>
> >>>> Best,
> >>>>
> >>>> Satya
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena <helena.deus@deri.org>
> >>>> wrote:
> >>>>
> >>>> Hi Khalid,
> >>>>
> >>>> Please see comments inline
> >>>>
> >>>>
> >>>>
> >>>> *From:* Khalid Belhajjame [mailto:Khalid.Belhajjame@cs.man.ac.uk]
> >>>> *Sent:* 12 August 2011 10:22
> >>>> *To:* Deus, Helena
> >>>> *Cc:* public-prov-wg@w3.org
> >>>> *Subject:* Re: playing with pil ontology
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Hi Helena,
> >>>>
> >>>> Thanks for this, I think that this is a good exercise and some of the
> >>>> points you mentioned relate to the conceptual model, not only the
> >>>> formal model.
> >>>>
> >>>> On 11/08/2011 18:52, Deus, Helena wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Reiterating a bit on what was addressed today in the telecon, I
> >>>> downloaded the ontology from mercurial and tried to use it with my use
> >>>> case.
> >>>>
> >>>> I am using the use cases published in [1] and demoed with SPARQL at
> >>>> http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
> >>>>
> >>>>
> >>>>
> >>>> Here is my input so far:
> >>>>
> >>>>
> >>>>
> >>>> Agent could have dataProperty "label" and "description"; it would help
> >>>> the implementer describe what type of agent he/she intends to
> >>>> describe. Is the ontology here being confused with the query model?
> >>>>
> >>>> I think that there was previously a long thread discussion on agent
> >>>> and agent types, and whether the model should be prescriptive in this
> >>>> respect. One of the solutions that I think many people were happy with
> >>>> is to let users choose their favorite model (ontology) for agent,
> >>>> which means that the agent class defined in the ontology acts as a
> >>>> placeholder that can be specialized to include description, types,
> >>>> and whatever the application needs.
> >>>>
> >>>>
> >>>>
> >>>> I am not questioning whether agent should be mapped to agents defined
> >>>> elsewhere, which seems to be obvious - only wondering whether agent
> >>>> "label" and "description" are things we want to standardize in our
> >>>> model or not. We can "suggest" rdfs:label and rdfs:comment without
> >>>> enforcing it as such - having those included in the model will likely
> >>>> result in much less heterogeneity when it comes to reporting
> >>>> provenance (particularly since we are defining it necessarily "open"
> >>>> and highly granular to fit any particular domain).
> >>>>
> >>>>
> >>>>
> >>>> ProvenanceContainer is not useful, or its description is not clear;
> >>>> what should be an instance of provenanceContainer?
> >>>>
> >>>>
> >>>> At this stage, the description of this concept is not yet stable in
> >>>> the conceptual model as far as I know.
> >>>>
> >>>>
> >>>>
> >>>> What was its intended purpose/role in the description of provenance?
> >>>>
> >>>>
> >>>>
> >>>> I want to create an instance of an "untransformed" entity (in my case,
> >>>> a dataset) and a "transformed" entity. Is the model going to give me
> >>>> that granularity/expressivity or do we expect each implementer to come
> >>>> up with their own way of defining these?
> >>>>
> >>>> Could you please clarify what you mean by transformed and
> >>>> untransformed entity?
> >>>>
> >>>> Example: a list of height measurements is an "untransformed" entity (a
> >>>> dataset); the average of that list is the "transformed" entity
> >>>> (another dataset, although a very simple one).
> >>>>
> >>>>
> >>>>
> >>>> I am dealing with much more complex workflows (e.g. files containing
> >>>> the outcome of a microarray experiment as the untransformed dataset
> >>>> and a list of differentially expressed genes as the transformed
> >>>> dataset), so please take the example above as just illustrative.
> >>>>
> >>>>
> >>>>
> >>>> ProcessExecution needs more expressivity, I think. Not sure how to
> >>>> solve this in a domain independent way, but here's my problem:
> >>>>
> >>>> An investigator (agent) performs an experiment
> >>>>
> >>>> That experiment has several input parameters, some of which are
> >>>> entities (e.g. samples), others are not (e.g. temperature).
> >>>>
> >>>> Resulting from the experiment are several output parameters (entities)
> >>>>
> >>>>
> >>>> I think that the current model caters for the above need. If you are
> >>>> specifically trying to differentiate between different kinds of inputs
> >>>> (samples as opposed to temperature), then the notion of role can be
> >>>> helpful in this respect.
> >>>>
> >>>>
> >>>>
> >>>> So if I understand what you are saying correctly, "temperature" would
> >>>> be an entity of type "input", which in turn would be subclass of
> >>>> "role". An instance of "input" could then have a certain value (e.g.
> >>>> 15C) in one of its properties?
> >>>>
> >>>> In that case, does it make sense to include "input" and "output"
> >>>> classes in the model as subclasses of "role"? Or is this something
> >>>> that me and Stephan exemplify in the primer document under "usage of
> >>>> agent" (or something of the sort)?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Thanks, khalid
> >>>>
> >>>>
> >>>>
> >>>> Have not completed my "experiment" yet, but will provide more feedback
> >>>> soon :)
> >>>>
> >>>>
> >>>>
> >>>> Best Regards,
> >>>>
> >>>> Helena F. Deus
> >>>>
> >>>> Post-doctoral Researcher
> >>>> Digital Enterprise Research Institute
> >>>>
> >>>> National University of Ireland, Galway
> >>>>
> >>>> http://lenadeus.info
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >
> >
> >
>
>
>
> --
> Dr Simon Miles
> Lecturer, Department of Informatics
> King's College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
>
>

Received on Wednesday, 24 August 2011 16:22:41 UTC