Re: playing with pil ontology

Daniel,

Why the need?  If named graphs are used, they have URIs and can be treated as 
independent resources.  Whether these are actually stored as separate RDF 
resources, or as named graphs within an RDF document is an implementation choice 
that should not be exposed through the abstract provenance model, IMO.

Not formally recognizing containers (or whatever) in the model does not prevent 
you from creating an application that does what you describe.

I think the crux of our debate is this:  is there an interoperability 
requirement that cannot be satisfied without explicitly recognizing containers 
in the model?  If there is a compelling such requirement, I'll withdraw my 
objection.
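
A minimal sketch of the point above, assuming each named graph has its own URI and that ordinary statements about those URIs (here dcterms:creator and dcterms:created) record who asserted the provenance and when.  The example.org URIs, the sample data and the graphs_about helper are all hypothetical, made up for illustration; nothing here is defined by the model.

```python
from datetime import date

# Dublin Core terms used to describe the graphs themselves.
DC_CREATOR = "http://purl.org/dc/terms/creator"
DC_CREATED = "http://purl.org/dc/terms/created"

# Hypothetical URIs, for illustration only.
G1 = "http://example.org/graphs/g1"
G2 = "http://example.org/graphs/g2"
G3 = "http://example.org/graphs/g3"
CHIP = "http://example.org/chip42"
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# Each named graph is keyed by its URI and holds (subject, predicate, object) triples.
named_graphs = {
    G1: [(CHIP, "http://example.org/vocab/manufacturedOn", "2011-03-01")],
    G2: [(CHIP, "http://example.org/vocab/location", "Galway")],
    G3: [("http://example.org/sample7", "http://example.org/vocab/location", "Dublin")],
}

# Plain statements about the graphs, treating each graph URI as a resource.
graph_metadata = [
    (G1, DC_CREATOR, ALICE), (G1, DC_CREATED, "2011-07-30"),
    (G2, DC_CREATOR, BOB), (G2, DC_CREATED, "2011-08-14"),
    (G3, DC_CREATOR, ALICE), (G3, DC_CREATED, "2011-08-15"),
]


def describe(graph_uri):
    """Collect the metadata statements whose subject is the given graph URI."""
    return {p: o for s, p, o in graph_metadata if s == graph_uri}


def graphs_about(resource, creator=None, created_after=None):
    """Return URIs of graphs that mention `resource`, optionally filtered."""
    selected = []
    for uri, triples in named_graphs.items():
        # Keep only graphs with a triple whose subject or object is the resource.
        if not any(resource in (s, o) for s, _, o in triples):
            continue
        meta = describe(uri)
        if creator is not None and meta.get(DC_CREATOR) != creator:
            continue
        if created_after is not None:
            created = meta.get(DC_CREATED)
            if created is None or date.fromisoformat(created) <= created_after:
                continue
        selected.append(uri)
    return sorted(selected)


print(graphs_about(CHIP, creator=ALICE))
print(graphs_about(CHIP, created_after=date(2011, 8, 1)))
```

The selection and filtering rely only on the graph URIs being resources that statements can be made about; no separate container class appears anywhere in the sketch.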

#g
--


Daniel Garijo wrote:
> Yes, I was thinking about named graphs for grouping the provenance 
> descriptions. However, I do think
> that the model should recognize explicitly the "provenance container" 
> (or whatever we decide to name it in
> the end), so I could select the provenance containers having statements 
> referring to a resource and filter them
> depending on a certain constraint (like author or date of creation).
> 
> Best,
> Daniel
> 
> 2011/8/15 Graham Klyne <GK@ninebynine.org>
> 
>     Daniel,
> 
>     It sounds to me as if you're trying to subdivide web resources, and
>     that seems to me like a potential lot of complexity for questionable
>     gain.
> 
>     (If you're thinking of something like named graphs in an RDF
>     document, then fine:  here each of the graphs has its own URI, so
>     for descriptive purposes can be treated as a separate web resource.
>      I don't think this is something the model needs to explicitly
>     recognize, as it amounts to an implementation detail.)
> 
>     #g
>     --
> 
>     Daniel Garijo wrote:
> 
>         Hi Graham,
>         I like Provenance Container. What if your provenance statements
>         were created by different persons,
>         processes or at different times, but they are within the same
>         Provenance Document
>         (since they are provenance assertions about the same entity)? I
>         may want to describe the different
>         provenance containers, or even the provenance container
>         descriptions with another one.
> 
>         Thanks,
>         Daniel
> 
>         2011/8/15 Graham Klyne <GK@ninebynine.org>
> 
> 
>            Jim,
> 
>            FWIW, in PAQ we talk about "provenance information" as just
>         another
>            resource that includes provenance assertions.  To my mind, its
>            primary representation would be as an RDF document.
> 
>            The terminology here is subject to review and harmonization
>         with the
>            model, but I'm not convinced that we need a new concept in
>         the model
>            for this, and I'm not keen on a name involving "container",
>         as in my
>            mind that sets up expectations of a distinct layer of
>         encapsulation.
>             We don't talk about "containers" for HTML or XML elements,
>         we just
>            talk about HTML and XML documents.  Same for provenance, IMO.
> 
>            I suppose that suggests "Provenance Document", or similar.
> 
>            #g
>            --
> 
>            Myers, Jim wrote:
> 
> 
> 
>                A couple quick comments: I don’t think we’ve distinguished
>                provenance container and account at this point – they are an
>                entity which contains provenance statements and are used to
>                enable you to talk about how the provenance was created (what
>                processes and inputs caused those statements to be), but
>                collection has been discussed as a general aggregate
>                entity/container – a bag of marbles is an entity and saying a
>                process execution used it is shorthand for talking about the
>                individual marbles. A file is a collection of bytes and a
>                process execution may only use some of the bytes, etc.
> 
>                Re: roles – I would argue that you should use something quite
>                specific for the role of your temperature parameter, e.g.
>                “processingtemperaturesetpoint” rather than a generic “input” or
>                “inputParameter” role (parameter might still be a supertype of
>                processingtemperaturesetpoint). This would be necessary if, for
>                example, your process execution had a reaction temperature and a
>                storage temperature as inputs – now you have two numbers/two
>                temperatures and you have to use each in the correct role for
>                the provenance to be correct. In many cases, you could
>                potentially describe the type of the entity itself well enough
>                to make the provenance clear, but putting the information into
>                the entity typing rather than into the role it has relative to
>                the process execution causes trouble if you use the entity in
>                multiple processes. If I make an entity of type
>                “processingtemperaturesetpoint” and a second process that
>                displays a “printablenumber” uses it as input, the same entity
>                can’t also be of type “printablenumber”. It is better to give
>                the entity the type number and have it play a
>                “processingtemperaturesetpoint” role in one process and a
>                “printablenumber” role in the other.
> 
>                         Jim
> 
>                *From:* public-prov-wg-request@w3.org
>                [mailto:public-prov-wg-request@w3.org] *On Behalf Of *Satya Sahoo
>                *Sent:* Monday, August 15, 2011 11:02 AM
>                *To:* Deus, Helena
>                *Cc:* Khalid Belhajjame; public-prov-wg@w3.org
>                *Subject:* Re: playing with pil ontology
> 
>                         Hi Lena,
> 
>                Thanks again for trying to use the ontology for the
>         microarray
>                use case!
>                         My comments are inline:
> 
>                > I am not questioning whether agent should be mapped to agents
>                > defined elsewhere, which seems to be obvious – only wondering
>                > whether agent “label” and “description” are things we want to
>                > standardize in our model or not. We can “suggest” rdfs:label
>                > and rdfs:comment without enforcing it as such – having those
>                > included in the model will likely result in much less
>                > heterogeneity when it comes to reporting provenance
>                > (particularly since we are defining it necessarily “open” and
>                > highly granular to fit any particular domain).
> 
>                         I am not sure I understand your point. The
>         rdfs:label and
>                rdfs:comment are two of the nine annotation properties
>         that are
>                part of the OWL2 syntax. So, the provenance ontology
>         encoded in
>                OWL includes them by default.
> 
>                          
>                 > What was its intended purpose/role in the description of
>                provenance?
> 
>                         Provenance container, account, and collection
>         are related
>                concepts for modeling a collection of provenance assertions.
>                E.g. provenance of an Affymetrix gene chip will be a
>         collection
>                of provenance assertions (date of manufacture, location of
>                manufacturer, production series etc.) that can be stored in a
>                single file and the file will be a provenance container.
>                                    
>                > Example: a list of height measurements is an “untransformed”
>                > entity (a dataset); the average of that list is the
>                > “transformed” entity (another dataset, although a very simple
>                > one).
>                >
>                > I am dealing with much more complex workflows (e.g. files
>                > containing the outcome of a microarray experiment as the
>                > untransformed dataset and a list of differentially expressed
>                > genes as the transformed dataset), so please take the example
>                > above as just illustrative.
> 
>                         I am not sure I see the granularity/expressivity
>         issue in the
>                above example (from your first mail). Both the
>         "untransformed"
>                and "transformed" entities map to input and output data of a
>                process execution - we can create subclass of Entity for this
>                purpose.
> 
>                                  
>                > An investigator (agent) performs an experiment. That
>                > experiment has several input parameters, some of which are
>                > entities (e.g. samples), others are not (e.g. temperature).
>                > Resulting from the experiment are several output parameters
>                > (entities).
> 
>                         I am confused by the above scenario. Why is
>         temperature not an
>                entity? Both the input (sample) and (temperature) are special
>                types (sub class) of entities - (a) InputData and (b)
>                InputParameter etc.
> 
>                          
>                > So if I understand what you are saying correctly,
>                > “temperature” would be an entity of type “input”, which in
>                > turn would be subclass of “role”. An instance of “input” could
>                > then have a certain value (e.g. 15C) in one of its properties?
>                >
>                > In that case, does it make sense to include “input” and
>                > “output” classes in the model as subclasses of “role”? Or is
>                > this something that me and Stephan exemplify in the primer
>                > document under “usage of agent” (or something of the sort)?
> 
>                         I agree with Khalid's example where Role allows
>         us to model more
>                complex scenarios. For example, X is an instance of class
>                HumanBeing (perhaps as subclass of entity) and X has multiple
>                roles - researcher, parent, soccer player etc. To model these
>                "functions" we will use the Role class. I believe in the
>                microarray scenario (in your first mail) Roles are not
>         needed.
> 
>                          
>                > In that case, does it make sense to include “input” and
>                > “output” classes in the model as subclasses of “role”? Or is
>                > this something that me and Stephan exemplify in the primer
>                > document under “usage of agent” (or something of the sort)?
> 
>                         Sorry I did not understand this. Role can be
>         used by any entity,
>                why only "usage of agent"?
> 
>                         Thanks.
> 
>                         Best,
> 
>                Satya
> 
>                On Mon, Aug 15, 2011 at 7:01 AM, Deus, Helena
>                <helena.deus@deri.org> wrote:
> 
>                Hi Khalid,
> 
>                Please see comments inline
> 
>                *From:* Khalid Belhajjame
>                [mailto:Khalid.Belhajjame@cs.man.ac.uk]
> 
>                *Sent:* 12 August 2011 10:22
>                *To:* Deus, Helena
>                *Cc:* public-prov-wg@w3.org
> 
> 
>                *Subject:* Re: playing with pil ontology
> 
>                
>                Hi Helena,
> 
>                Thanks for this, I think that this is a good exercise and
>         some
>                of the point you mentioned relate to the conceptual
>         model, not
>                only the formal model.
> 
>                On 11/08/2011 18:52, Deus, Helena wrote:
> 
>                Hi all,
> 
>                Reiterating a bit on what was addressed today  in the
>         telco, I
>                downloaded the ontology from mercurial and tried to use
>         it with
>                my use case.
> 
>                I am using the use cases published in [1] and demoed with SPARQL
>                at http://biordfmicroarray.googlecode.com/hg/sparql_endpoint.html
> 
>                         Here is my input so far:
> 
>                         Agent could have dataProperty “label” and
>         “description”; it
>                would help the implementer describe what type of agent does
>                he/she intend to describe. Is the ontology here being
>         confused
>                with the query model?
> 
>                I think that there was previously a long thread discussion on
>                agent and agent types, and whether the model should be
>                prescriptive in this respect. One of the solutions that I think
>                many people were happy with is to let users choose their
>                favorite model (ontology) for agent, which means that the agent
>                class defined in the ontology acts as a placeholder that can be
>                specialized to include description, types, and whatever the
>                application needs.
> 
>                         I am not questioning whether agent should be
>         mapped to agents
>                defined elsewhere, which seems to be obvious– only wondering
>                whether agent “label” and “description” are things we want to
>                standardize in our model or not. We can “suggest”
>         rdfs:label and
>                rdfs:comment without enforcing it as such – having those
>                included in the model will likely result in much less
>                heterogeneity when it comes to reporting provenance
>                (particularly since we are defining it necessarily “open” and
>                highly granular to fit any particular domain).
> 
>                         ProvenanceContainer is not useful, or its
>         description is not
>                clear; what should be an instance of provenanceContainer?
> 
> 
>                At this stage, the description of this concept is not yet
>         stable
>                in the conceptual model as far as I know.
> 
>                         What was its intended purpose/role in the
>         description of provenance?
> 
>                         I want to create an instance of a
>         “untransformed” entity (in my
>                case, a dataset) and a “transformed” entity. Is the model
>         going
>                to give me that granularity/expressivity or do we expect each
>                implementer to come up with their own way of defining these?
> 
>                Could you please clarify what you mean by transformed and
>                untransformed entity?
> 
>                Example: a list of height measurement is an “untransformed”
>                entity (a dataset); the average of that list is the
>                “transformed” entity (another dataset, although a very
>         simple one).
> 
>                         I am dealing with much more complex workflows,
>         (e.g. files
>                containing the outcome of a microarray experiment as the
>                untransformed dataset and a list of differentially expressed
>                genes as the transformed dataset), so please take the example
>                above as just illustrative.
> 
>                         ProcessExecution needs more expressivity, I
>         think. Not sure how
>                to solve this in a domain independent way, but here’s my
>         problem:
> 
>                An investigator (agent) performs an experiment
> 
>                That experiment has several input parameters, some of
>         which are
>                entities (e.g. samples), others are not (e.g. temperature).
> 
>                Resulting from the experiment are several output parameters
>                (entities)
> 
> 
>                I think that the current model caters for the above need.
>         If you
>                are specifically trying to differentiate between
>         different kinds
>                of inputs (samples as opposed to temperature), then the
>         notion
>                of role can be helpful in this respect.
> 
>                         So if I understand what you are saying
>         correctly, “temperature”
>                would be an entity of type “input”, which in turn would be
>                subclass of “role”. An instance of “input” could then have a
>                certain value (e.g. 15C) in one of its properties?
> 
>                In that case, does it make sense to include “input” and
>         “output”
>                classes in the model as subclasses of “role”? Or is this
>                something that me and Stephan exemplify in the primer
>         document
>                under “usage of agent” (or something of the sort)?
> 
>                
> 
>                Thanks, khalid
> 
>                Have not completed my “experiment” yet, but will provide more
>                feedback soon :)
> 
>                         Best Regards,
> 
>                Helena F. Deus
> 
>                Post-doctoral Researcher
>                Digital Enterprise Research Institute
> 
>                National University of Ireland, Galway
> 
>                http://lenadeus.info
> 
>                          
> 
> 
> 
> 
> 
> 

Received on Tuesday, 16 August 2011 07:22:05 UTC