Re: Workflow Example in Formal Model HTML document from James Cheney on 2011-10-01 (public-prov-wg@w3.org from October 2011)

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Sat, 1 Oct 2011 18:47:23 +0100
To: Satya Sahoo <satya.sahoo@case.edu>
Cc: Provenance Working Group WG <public-prov-wg@w3.org>
Message-Id: <656956B8-3B4A-4CA5-AE8B-93965323BBC7@inf.ed.ac.uk>


On Oct 1, 2011, at 1:11 AM, Satya Sahoo wrote:

> Hi James,
> I am afraid we are mixing up multiple issues here.

Probably!

> 
> >entities are intended to denote "things in the world", while provenance containers (and accounts) are syntactic things that don't necessarily denote real-world things
> 
> Every resource/thing in any computer-based information system is a representative/placeholder of some thing in a "domain of discourse".

Not sure I agree...  what real-world thing does this represent:  {{{{{{{{{{}}}}}}}}

> I am not sure what metric you are using to define a "thing in the world". 
> 
> To understand your point better, can you please identify "thing in the world" among the following examples:
> 1. "particular execution of a scientific workflow"
> 
> 2. "a text file on a computer hard drive" (not its print out on paper, that is a distinct thing)
> 
> 3. "a signaling pathway in mouse brain" (signaling pathway is a special type of biological pathway, which is defined as "A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in a cell." from www.genome.gov - in other words there is no concrete structure that we can touch and say it is a biological pathway)
> 
> 4. "idea, which is patented, to increase efficiency of petrol-based internal combustion engine" 
> 
> In provenance applications, we make provenance assertions about each of the above four things. Further, almost all of them are used to organize data (information) as an abstraction mechanism. 
> 

I am not proposing any metric for identifying "things in the world".  In the strawman, there is a set Things that contains all the time-varying "things in the world" that we might talk about.  This is a subjective/application-defined parameter: to use the semantics, you have to choose what things you are interested in.

Each of the above examples could be viewed as things in the world, or not.  I am fine with Entities representing "containers" and possibly containing other Entities that happen to carry provenance information.  I think this would be interesting to model.  I just think this is not what was meant by ProvenanceContainer in the data model, and I don't understand why we'd want a specific notion of ProvenanceContainer as a subclass of Entity as opposed to a generic notion of Container as a subclass of Entity.
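
As a rough illustration of the generic option, here is a minimal Python sketch. All class names, fields, and identifiers are hypothetical and chosen only for illustration; none of them come from the data model or the strawman.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for an Entity; only an identifier is modelled."""
    ident: str


@dataclass
class Container(Entity):
    """Generic option: an Entity that may contain Entities of any kind."""
    members: list = field(default_factory=list)


@dataclass
class ProvenanceRecord(Entity):
    """Hypothetical Entity that happens to carry provenance information."""
    statement: str = ""


# A generic Container holds provenance-carrying Entities alongside others,
# without needing a dedicated ProvenanceContainer subclass.
box = Container("ex:box")
box.members.append(ProvenanceRecord("ex:r1", "ex:file1 wasGeneratedBy ex:run1"))
box.members.append(Entity("ex:file1"))
```

In this sketch nothing about Container is specific to provenance; whether a member carries provenance information is a property of the member, not of the container.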

> >By analogy, in mathematics we can talk about "the set of all people whose first initial is Z" without there necessarily being a real-world thing corresponding to this set.
> 
> In Web applications (the scope for our WG work) we are dealing with representations of "people" and not real persons. Hence, both terms "people" and "set of people with initial Z" are syntactic constructs we define to work with and then define rules for their interpretation that associates semantics with the constructs.

Agreed.  I think we disagree as to what that semantics is.  Sorry if I'm stating the obvious here.

> 
> My understanding of both Entity and Provenance Container is in line with the above points, that is, both are syntactic constructs. Both can be used to represent things in our domain of discourse.
> 

I agree that both *could* be used to represent "things in the world" - whatever they may be.  I don't agree that they *necessarily* must be - they could be interpreted as mathematical objects that aren't physically realized as whatever "things" we are interested in.  Moreover, there are other possible things in the domain of discourse besides "things in the world".

Maybe a different example could help.  We are also considering concepts like time and location. But a time is not a "thing in the world" in the strawman - it's just a number.  Just because Time is (?) a concept, that doesn't mean we have to interpret it as a "thing in the world" - we interpret it as a number or string (or XML schema time, or whatever).  A "thing in the world" has a lot more structure - it varies over time.  It would make no sense to interpret a Time as a "thing in the world" - a time isn't a physical thing that varies over time!

Likewise, I read the data model as interpreting a ProvenanceContainer as an abstract (named) collection of provenance statements.  It's not clear to me that this needs to be mapped to a first-class OWL concept; perhaps this sense of ProvenanceContainer maps more directly to a "named graph".  But if it does map to a concept, then we can interpret individuals of that concept as sets of provenance assertions, rather than as "things in the world" that contain things that could be interpreted as provenance information.
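
The three kinds of interpretation discussed above can be sketched as Python types. This is only an illustration under stated assumptions: the type aliases, the `text_file` example, and the assertion strings are hypothetical and are not the strawman's actual definitions.

```python
from typing import Callable, Dict, FrozenSet

Time = float                     # a time is interpreted as just a number
State = Dict[str, str]           # hypothetical attribute/value snapshot
Thing = Callable[[Time], State]  # a "thing in the world" varies over time
Assertion = str                  # a provenance statement, kept opaque here
ProvenanceContainer = FrozenSet[Assertion]  # a plain set of assertions


def text_file(t: Time) -> State:
    """A hypothetical Thing: a file whose contents change at time 10."""
    return {"contents": "draft" if t < 10 else "final"}


# A container interpreted as a mathematical set of assertions. It has no
# time-varying state of its own, so it is not a Thing in the sense above.
container: ProvenanceContainer = frozenset({
    "ex:file wasGeneratedBy ex:run",
    "ex:run used ex:input",
})
```

Here a Thing is a function of time, a Time is a bare number, and a ProvenanceContainer is a set: three different sorts of object in the domain of discourse, only one of which is a "thing in the world".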

Does this distinction make sense?  My choice of "thing in the world" terminology might be confusing here... there are mathematical things (like numbers, times, sets) that I don't consider "things in the world".

--James

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Received on Saturday, 1 October 2011 17:48:02 UTC