Re: playing with pil ontology from Daniel Garijo on 2011-08-15 (public-prov-wg@w3.org from August 2011)

From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
Date: Mon, 15 Aug 2011 15:39:00 -0700
To: Graham Klyne <GK@ninebynine.org>
Cc: "Myers, Jim" <MYERSJ4@rpi.edu>, Satya Sahoo <satya.sahoo@case.edu>, "Deus, Helena" <helena.deus@deri.org>, Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <CAExK0DemoxLk8ehyhM6kHAtkynf0CCjzxt8F90-mzBPnB8ujmw@mail.gmail.com>
Yes, I was thinking about named graphs for grouping the provenance
descriptions. However, I do think
that the model should recognize explicitly the "provenance container" (or
whatever we decide to name it in
the end), so I could select the provenance containers having statements
referring to a resource and filter them
depending on a certain constraint (like author or date of creation).
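
As a rough illustration of the selection described above (a sketch only, not
part of the PIL model or any agreed vocabulary), the Python below treats each
provenance container as a named graph with its own URI, author, and creation
date. The class name ProvenanceContainer, its fields, the "ex:" predicates, and
all example.org URIs are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenanceContainer:
    """Hypothetical stand-in for a named graph of provenance statements."""
    uri: str
    author: str
    created: date
    # Each statement is a (subject, predicate, object) triple of strings.
    statements: list = field(default_factory=list)

    def mentions(self, resource):
        """True if any statement has the resource as subject or object."""
        return any(resource in (s, o) for s, _, o in self.statements)


def containers_about(containers, resource, author=None, created_after=None):
    """Select containers that refer to a resource, optionally filtered
    by author and by creation date (strictly after the given date)."""
    selected = []
    for c in containers:
        if not c.mentions(resource):
            continue
        if author is not None and c.author != author:
            continue
        if created_after is not None and c.created <= created_after:
            continue
        selected.append(c)
    return selected


# Made-up example data: three containers, two of them about the same chip.
containers = [
    ProvenanceContainer(
        "http://example.org/graph/a", "alice", date(2011, 8, 1),
        [("http://example.org/chip42", "ex:manufacturedBy", "http://example.org/acme")],
    ),
    ProvenanceContainer(
        "http://example.org/graph/b", "bob", date(2011, 8, 10),
        [("http://example.org/chip42", "ex:usedIn", "http://example.org/experiment7")],
    ),
    ProvenanceContainer(
        "http://example.org/graph/c", "alice", date(2011, 8, 12),
        [("http://example.org/sample1", "ex:usedIn", "http://example.org/experiment7")],
    ),
]

matches = containers_about(containers, "http://example.org/chip42", author="alice")
print([c.uri for c in matches])
```

The point of the sketch is that the author and creation date hang off the
container itself, which is why I would like the model to have a name for it.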

Best,
Daniel

2011/8/15 Graham Klyne <GK@ninebynine.org>

> Daniel,
>
> It sounds to me as if you're trying to subdivide web resources, and that
> seems to me like a potential lot of complexity for questionable gain.
>
> (If you're thinking of something like named graphs in an RDF document, then
> fine:  here each of the graphs has its own URI, so for descriptive purposes
> can be treated as a separate web resource.  I don't think this is something
> the model needs to explicitly recognize, as it amounts to an implementation
> detail.)
>
> #g
> --
>
> Daniel Garijo wrote:
>
>> Hi Graham,
>> I like Provenance Container. What if your provenance statements were
>> created by different persons,
>> processes or at different times, but they are within the same Provenance
>> Document
>> (since they are provenance assertions about the same entity)? I may want
>> to describe the different
>> provenance containers, or even the provenance container descriptions with
>> another one.
>>
>> Thanks,
>> Daniel
>>
>> 2011/8/15 Graham Klyne <GK@ninebynine.org>
>>
>>
>>    Jim,
>>
>>    FWIW, in PAQ we talk about "provenance information" as just another
>>    resource that includes provenance assertions.  To my mind, its
>>    primary representation would be as an RDF document.
>>
>>    The terminology here is subject to review and harmonization with the
>>    model, but I'm not convinced that we need a new concept in the model
>>    for this, and I'm not keen on a name involving "container", as in my
>>    mind that sets up expectations of a distinct layer of encapsulation.
>>     We don't talk about "containers" for HTML or XML elements, we just
>>    talk about HTML and XML documents.  Same for provenance, IMO.
>>
>>    I suppose that suggests "Provenance Document", or similar.
>>
>>    #g
>>    --
>>
>>    Myers, Jim wrote:
>>
>>
>>
>>        A couple quick comments: I don’t think we’ve distinguished
>>        provenance container and account at this point – they are an
>>        entity which contains provenance statements and are used to
>>        enable you to talk about how the provenance was created (what
>>        processes and inputs caused those statements to be), but
>>        collection has been discussed as a general aggregate
>>        entity/container – a bag of marbles is an entity and saying a
>>        process execution used it is shorthand for talking about the
>>        individual marbles. A file is a collection of bytes and a
>>        process execution may only use some of the bytes, etc.
>>
>>        Re: roles – I would argue that you should use something quite
>>        specific for the role of your temperature parameter, e.g.
>>        “processingtemperaturesetpoint” rather than a generic “input” or
>>        “inputParameter” role (parameter might still be a supertype of
>>        processingtemperaturesetpoint). This would be necessary if, for
>>        example, your process execution had a reaction temperature and a
>>        storage temperature as inputs – now you have two numbers/two
>>        temperatures and you have to use each in the correct role for
>>        the provenance to be correct. In many cases, you could
>>        potentially describe the type of the entity itself well enough
>>        to make the provenance clear, but putting the information into
>>        the entity typing rather than into the role it has relative to
>>        the process execution causes trouble if you use the entity in
>>        multiple processes (if I make an entity that is of type
>>        “processingtemperaturesetpoint” and I have a second process that
>>        displays a “printablenumber” that uses it as input, the same
>>        entity can’t also be of type “printablenumber” – better to make
>>        the entity have type number and play a
>>        “processingtemperaturesetpoint” role in one process and the
>>        “printablenumber” role in the other).
>>
>>        Jim
>>
>>        *From:* public-prov-wg-request@w3.org
>>        [mailto:public-prov-wg-request@w3.org] *On Behalf Of* Satya Sahoo
>>        *Sent:* Monday, August 15, 2011 11:02 AM
>>        *To:* Deus, Helena
>>        *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
>>        *Subject:* Re: playing with pil ontology
>>
>>        Hi Lena,
>>
>>        Thanks again for trying to use the ontology for the microarray
>>        use case!
>>        My comments are inline:
>>
>>        > I am not questioning whether agent should be mapped to agents
>>        > defined elsewhere, which seems to be obvious – only wondering
>>        > whether agent “label” and “description” are things we want to
>>        > standardize in our model or not. We can “suggest” rdfs:label
>>        > and rdfs:comment without enforcing it as such – having those
>>        > included in the model will likely result in much less
>>        > heterogeneity when it comes to reporting provenance
>>        > (particularly since we are defining it necessarily “open” and
>>        > highly granular to fit any particular domain).
>>
>>        I am not sure I understand your point. The rdfs:label and
>>        rdfs:comment are two of the nine annotation properties that are
>>        part of the OWL2 syntax. So, the provenance ontology encoded in
>>        OWL includes them by default.
>>
>>
>>        > What was its intended purpose/role in the description of
>>        > provenance?
>>
>>        Provenance container, account, and collection are related
>>        concepts for modeling a collection of provenance assertions.
>>        E.g. provenance of an Affymetrix gene chip will be a collection
>>        of provenance assertions (date of manufacture, location of
>>        manufacturer, production series etc.) that can be stored in a
>>        single file and the file will be a provenance container.
>>
>>        > Example: a list of height measurements is an “untransformed”
>>        > entity (a dataset); the average of that list is the
>>        > “transformed” entity (another dataset, although a very simple
>>        > one).
>>        >
>>        > I am dealing with much more complex workflows (e.g. files
>>        > containing the outcome of a microarray experiment as the
>>        > untransformed dataset and a list of differentially expressed
>>        > genes as the transformed dataset), so please take the example
>>        > above as just illustrative.
>>
>>        I am not sure I see the granularity/expressivity issue in the
>>        above example (from your first mail). Both the "untransformed"
>>        and "transformed" entities map to input and output data of a
>>        process execution - we can create subclasses of Entity for this
>>        purpose.
>>
>>
>>        > An investigator (agent) performs an experiment. That
>>        > experiment has several input parameters, some of which are
>>        > entities (e.g. samples), others are not (e.g. temperature).
>>        > Resulting from the experiment are several output parameters
>>        > (entities).
>>
>>        I am confused by the above scenario. Why is temperature not an
>>        entity? Both inputs (sample and temperature) are special
>>        types (subclasses) of entities - (a) InputData and (b)
>>        InputParameter etc.
>>
>>
>>        > So if I understand what you are saying correctly,
>>        > “temperature” would be an entity of type “input”, which in
>>        > turn would be subclass of “role”. An instance of “input” could
>>        > then have a certain value (e.g. 15C) in one of its properties?
>>        >
>>        > In that case, does it make sense to include “input” and
>>        > “output” classes in the model as subclasses of “role”? Or is
>>        > this something that me and Stephan exemplify in the primer
>>        > document under “usage of agent” (or something of the sort)?
>>
>>        I agree with Khalid's example where Role allows us to model more
>>        complex scenarios. For example, X is an instance of class
>>        HumanBeing (perhaps as subclass of entity) and X has multiple
>>        roles - researcher, parent, soccer player etc. To model these
>>        "functions" we will use the Role class. I believe in the
>>        microarray scenario (in your first mail) Roles are not needed.
>>
>>
>>        > In that case, does it make sense to include “input” and
>>        > “output” classes in the model as subclasses of “role”? Or is
>>        > this something that me and Stephan exemplify in the primer
>>        > document under “usage of agent” (or something of the sort)?
>>
>>        Sorry I did not understand this. Role can be used by any entity,
>>        why only "usage of agent"?
>>
>>        Thanks.
>>
>>        Best,
>>
>>        Satya
>>
>>        On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena
>>        <helena.deus@deri.org> wrote:
>>
>>        Hi Khalid,
>>
>>        Please see comments inline
>>
>>        *From:* Khalid Belhajjame
>>        [mailto:Khalid.Belhajjame@cs.man.ac.uk]
>>
>>        *Sent:* 12 August 2011 10:22
>>        *To:* Deus, Helena
>>        *Cc:* public-prov-wg@w3.org
>>
>>
>>        *Subject:* Re: playing with pil ontology
>>
>>
>>        Hi Helena,
>>
>>        Thanks for this, I think that this is a good exercise and some
>>        of the points you mentioned relate to the conceptual model, not
>>        only the formal model.
>>
>>        On 11/08/2011 18:52, Deus, Helena wrote:
>>
>>        Hi all,
>>
>>        Reiterating a bit on what was addressed today  in the telco, I
>>        downloaded the ontology from mercurial and tried to use it with
>>        my use case.
>>
>>        I am using the use cases published in [1] and demoed with SPARQL
>>        at http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
>>
>>        Here is my input so far:
>>
>>        Agent could have dataProperty “label” and “description”; it
>>        would help the implementer describe what type of agent he/she
>>        intends to describe. Is the ontology here being confused
>>        with the query model?
>>
>>        I think that there was previously a long thread discussion on
>>        agent and agent types, and whether the model should be
>>        prescriptive in this respect. One of the solutions that I think
>>        many people were happy with is to let users choose their
>>        favorite model (ontology) for agent, which means that the agent
>>        class defined in the ontology acts as a placeholder that can be
>>        specialized to include description, types, and whatever the
>>        application needs.
>>
>>        I am not questioning whether agent should be mapped to agents
>>        defined elsewhere, which seems to be obvious – only wondering
>>        whether agent “label” and “description” are things we want to
>>        standardize in our model or not. We can “suggest” rdfs:label and
>>        rdfs:comment without enforcing it as such – having those
>>        included in the model will likely result in much less
>>        heterogeneity when it comes to reporting provenance
>>        (particularly since we are defining it necessarily “open” and
>>        highly granular to fit any particular domain).
>>
>>        ProvenanceContainer is not useful, or its description is not
>>        clear; what should be an instance of provenanceContainer?
>>
>>
>>        At this stage, the description of this concept is not yet stable
>>        in the conceptual model as far as I know.
>>
>>        What was its intended purpose/role in the description of
>>        provenance?
>>
>>        I want to create an instance of an “untransformed” entity (in my
>>        case, a dataset) and a “transformed” entity. Is the model going
>>        to give me that granularity/expressivity or do we expect each
>>        implementer to come up with their own way of defining these?
>>
>>        Could you please clarify what you mean by transformed and
>>        untransformed entity?
>>
>>        Example: a list of height measurements is an “untransformed”
>>        entity (a dataset); the average of that list is the
>>        “transformed” entity (another dataset, although a very simple one).
>>
>>        I am dealing with much more complex workflows (e.g. files
>>        containing the outcome of a microarray experiment as the
>>        untransformed dataset and a list of differentially expressed
>>        genes as the transformed dataset), so please take the example
>>        above as just illustrative.
>>
>>        ProcessExecution needs more expressivity, I think. Not sure how
>>        to solve this in a domain independent way, but here’s my problem:
>>
>>        An investigator (agent) performs an experiment.
>>
>>        That experiment has several input parameters, some of which are
>>        entities (e.g. samples), others are not (e.g. temperature).
>>
>>        Resulting from the experiment are several output parameters
>>        (entities).
>>
>>
>>        I think that the current model caters for the above need. If you
>>        are specifically trying to differentiate between different kinds
>>        of inputs (samples as opposed to temperature), then the notion
>>        of role can be helpful in this respect.
>>
>>        So if I understand what you are saying correctly, “temperature”
>>        would be an entity of type “input”, which in turn would be
>>        subclass of “role”. An instance of “input” could then have a
>>        certain value (e.g. 15C) in one of its properties?
>>
>>        In that case, does it make sense to include “input” and “output”
>>        classes in the model as subclasses of “role”? Or is this
>>        something that me and Stephan exemplify in the primer document
>>        under “usage of agent” (or something of the sort)?
>>
>>
>>
>>        Thanks, khalid
>>
>>        Have not completed my “experiment” yet, but will provide more
>>        feedback soon :)
>>
>>        Best Regards,
>>
>>        Helena F. Deus
>>
>>        Post-doctoral Researcher
>>        Digital Enterprise Research Institute
>>
>>        National University of Ireland, Galway
>>
>>        http://lenadeus.info
>>
>>
>>
>>
>>
>>
>
>
Received on Monday, 15 August 2011 22:39:29 UTC