Re: formal semantics strawman from James Cheney on 2011-09-09 (public-prov-wg@w3.org from September 2011)

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Fri, 9 Sep 2011 16:18:17 +0100
To: "Myers, Jim" <MYERSJ4@rpi.edu>
Cc: Graham Klyne <GK@ninebynine.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-Id: <0C1148FE-7F1F-4E9F-9C76-BA1C207A76B2@inf.ed.ac.uk>
Hi,

There's been extensive discussion on other threads/issues concerning the nature of entities.  It seems to me that this issue remains unresolved.  I had hoped that the formal semantics strawman would contribute to this discussion, if we can try to formulate different views precisely.  I see that I did not respond to Jim Myers's comments below previously (I was out of town from the 28th until Wednesday) so am doing so now.

On Aug 27, 2011, at 5:43 PM, Myers, Jim wrote:

> Trying to catch up after travel. I'm not sure I have the big picture but a few comments about statements in the thread:
> 
> We need to avoid saying entities are fixed and ?things are not - entities are fixed in some ways - defined by attributes with values, but they are mutable in other ways. ?things can be this way as well and we may assert entities that have the same ID as an existing ?thing - because that thing is already defined as fixed in the ways we need it to be for provenance, or we may assert entities that are 'complements' of ?things when we need to fix more attributes or different attributes than for the ?thing itself.

I think one source of confusion - here and in the other discussions of entities - is whether an "entity (assertion)" is a syntactic, knowledge-modeling construct that merely represents (some subset of) knowledge asserted about a thing in the world, or whether it is something that exists in the world, alongside the ?things that are being described.

In the strawman, I initially chose to use the term "entity" for both the assertions and the things in the world, which was a bit ambiguous and inconsistent with the conceptual model draft; after an offline discussion with Luc I changed it so that ?thing is used for the things in the world (what I'd call the semantics) and "entity assertion" is used for the syntactic constructs that contain information about the ?things.

Your use of the term "entity" above hints that you consider them to be semantic components that exist alongside, but are subordinate to ?things - "sub-things" in some sense, rather than purely syntactic statements.

Can you confirm this or identify parts of the strawman that do not match what you have in mind?

I think that if we do not come to a common understanding of what components are "syntax" and what parts are "semantics" we will get hopelessly confused.  

> 
> The interval over which an entity exists may be different than the one for which a complementof relationship is true.

I believe this statement holds in the strawman.

> If we want an asserted entity that is the fixed content at a live URL, it should have its own ID and a complementof relationship with the site URL (a ?thing and potentially another entity if we wish to discuss the provenance of the live site itself).

In the strawman, the ids denote ?things, but the ?thing denoted by an id can change over time.  Both Luc (offline) and Graham suggested that this may be unnecessarily general.  However, if entities are not first-class semantic things but just statements about them, then your suggestion that "it should have its own ID and a complementOf relationship with the site url" doesn't make sense.  

Do you have a suggestion about how the strawman could change to accommodate your view?  For example, would it be enough to adjust the lookup function to associate an entity-ID with a ?thing and a time interval?
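
To make that last question concrete, here is a minimal sketch (not taken from the strawman) of a lookup that associates an entity-ID with a ?thing and a time interval.  The names Interval, Denotation, LOOKUP and denotes_at, the example ids, and the use of integer time points are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Interval:
    """Closed interval [start, end] over hypothetical integer time points."""
    start: int
    end: int

    def contains(self, t: int) -> bool:
        return self.start <= t <= self.end


@dataclass(frozen=True)
class Denotation:
    """What an entity-ID stands for: a ?thing restricted to an interval."""
    thing: str
    interval: Interval


# Hypothetical lookup table: two ids denote the same ?thing over
# different intervals (the live site vs. one fixed version of it).
LOOKUP: Dict[str, Denotation] = {
    "ex:site": Denotation("example2.com website", Interval(0, 100)),
    "ex:page-v1": Denotation("example2.com website", Interval(10, 20)),
}


def denotes_at(entity_id: str, t: int) -> Optional[str]:
    """Return the ?thing denoted by entity_id at time t, or None if the
    id is unknown or t lies outside the id's interval."""
    d = LOOKUP.get(entity_id)
    if d is None or not d.interval.contains(t):
        return None
    return d.thing
```

Under these assumptions the mapping from ids to (?thing, interval) pairs is itself fixed; an id simply denotes nothing outside its interval.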

> The complementOf relationship is true until the site content updates, but the fixed page entity could still exist. I think this is consistent with the discussion but am not sure.

This relates to an important issue which I noticed in doing the writeup: we have introduced a lot of statements that may be time dependent but don't have an explicit time parameter.

In the strawman, I suggested several possible ways of handling this for isComplementOf.  

1.  One can judge whether a complement relationship holds at a single time instant t by 
2.  One can judge whether it holds over an interval (this is what the current conceptual model suggests).
3.  One can treat complementOf as a time-invariant statement, which holds of two id's at all time points or at no time points.  This would make sense, for example, if we treat id's as mapping to ?things (i.e. the id represents the whole history of the Royal Society).  
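
As a rough illustration of how these three readings could relate, here is a small sketch.  It treats the instant-level judgement as a black box supplied by the caller (the strawman's actual definition is not reproduced here), and it assumes integer time points and that "holds over an interval" means "holds at every instant of the interval"; the strawman may define things differently.  All names (HoldsAt, holds_over, is_time_invariant, toy_holds_at) are hypothetical.

```python
from typing import Callable, Iterable

# Reading 1 is taken as given: a hypothetical judgement saying whether
# complementOf(a, b) holds at the single time instant t.
HoldsAt = Callable[[str, str, int], bool]


def holds_over(holds_at: HoldsAt, a: str, b: str, start: int, end: int) -> bool:
    """Reading 2 (assumed): holds at every instant of the closed interval."""
    return all(holds_at(a, b, t) for t in range(start, end + 1))


def is_time_invariant(holds_at: HoldsAt, a: str, b: str,
                      timeline: Iterable[int]) -> bool:
    """Reading 3: the judgement holds at all of the given time points
    or at none of them."""
    return len({holds_at(a, b, t) for t in timeline}) <= 1


def toy_holds_at(a: str, b: str, t: int) -> bool:
    """Toy data: the fixed page complements the site only until the
    site content updates after time 4."""
    return (a, b) == ("page-v1", "site") and 0 <= t <= 4
```

With this toy judgement the relationship holds over [0, 4] but not over [0, 5], and it is not time-invariant over a timeline that extends past the update.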

> 
> The interpretation of an entity should be time invariant - one can choose to assert an entity that is a fixed web page or one for a live website, but one should not have an entity asserted with a content property as part of its definition and then have that property change.

In the strawman, an assertion of the form entity(id,[attr=val,...]) is either true or false at a given time or across a given time interval.  I believe this is compatible with your suggestion that "an entity should be time invariant", but again, you seem to be talking about entities as semantic things, not assertions.
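
A minimal sketch of what "true or false at a given time or across a given time interval" could look like, assuming a ?thing is modelled as a history of attribute-value states indexed by integer time.  This is an illustration only, not the strawman's definition; HISTORY, entity_holds_at and entity_holds_over are hypothetical names and the data is made up.

```python
from typing import Dict, Mapping

State = Mapping[str, str]

# Hypothetical history of one ?thing: its attribute values at each time.
HISTORY: Dict[int, State] = {
    0: {"url": "http://www.example2.com", "content": "v1"},
    1: {"url": "http://www.example2.com", "content": "v1"},
    2: {"url": "http://www.example2.com", "content": "v2"},
}


def entity_holds_at(history: Dict[int, State], attrs: State, t: int) -> bool:
    """entity(id, [attr=val, ...]) holds at t if the ?thing exists at t
    and every asserted attribute has the asserted value."""
    state = history.get(t)
    return state is not None and all(
        state.get(name) == value for name, value in attrs.items()
    )


def entity_holds_over(history: Dict[int, State], attrs: State,
                      start: int, end: int) -> bool:
    """The assertion holds over [start, end] if it holds at each instant."""
    return all(entity_holds_at(history, attrs, t) for t in range(start, end + 1))
```

Here an assertion that fixes content=v1 holds over [0, 1] but fails over [0, 2], while one that fixes only the url holds throughout.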

> For the example.com example, one can define an entity that is 'the content available from the example2.com site no matter what URL you get it from' (retrieval URL can't be an attribute here) or one that is 'the content retrievable from example2.com' that is generated by the site starting operations and ceases to exist when the site URL is blocked from the world/retired (retrieval URL can be an attribute here). Either is valid - the point with an entity is that you are picking one definition and sticking with it, using complementof when you need to switch definitions.
> 

I don't fully understand this comment; in particular, I'm not sure what you mean by "one can define an entity that is...".  Is this the same as "one can state an entity assertion describing..."?

Is this an example of something that you believe can be done with the existing conceptual model that isn't handled correctly in the strawman semantics?  

--James

> Hope those are helpful in the larger discussion (and consistent with others' interpretation!)...
> 
> Jim
> ________________________________________
> From: public-prov-wg-request@w3.org [public-prov-wg-request@w3.org] on behalf of James Cheney [jcheney@inf.ed.ac.uk]
> Sent: Friday, August 26, 2011 7:17 AM
> To: Graham Klyne
> Cc: W3C provenance WG
> Subject: [Spam:***** SpamScore] Re: formal semantics strawman
> 
> On Aug 25, 2011, at 6:59 PM, Graham Klyne wrote:
> 
>> James,
>> 
>> Thanks.  This helps to clarify for me some things that weren't clear to me in the model document.
>> 
> 
> Note that the strawman is not necessarily capturing the intent of the model document (it just represents my initial effort to interpret it formally) so might be misleading about what it was trying to say.  For example, Luc asked me offline to change what I was calling "entity" to something else because it doesn't match the model.
> 
> I've now updated the document to avoid this potential confusion between the PIDM assertion "entity" and the semantic "?things".  (The question mark is there to flag it as a term with special meaning; we should probably find a less generic term)
> 
>> You say at http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman#Interpreting_an_entity_assertion:
>> [[
>> Note that there is a design choice here: do we require that the entity associated with id be the same throughout the interval or not? I have chosen to require this, since otherwise the entity assertion doesn't seem to be about a "single entity across a time interval". Of course, if we require that the mapping from URIs to entities be time-invariant then this problem goes away.
>> ]]
>> 
>> As far as I can tell from a quick skim, everything else works as intended (at least in sections 1.3, 1.4) if the URI->Entity mapping is invariant.  Which I think leads to a model in which the distinction between resource and entity (which I find to be unhelpful) becomes less significant.
>> 
> 
> I was thinking of a situation where a URI is "retired" and redirected to a different target, e.g. example.com merges with example2.com, and http://www.example2.com is redirected to example.com's website from then on.  Perhaps in this example http://www.example2.com is by definition not a URI.
> 
> I think assuming that lookup is time-insensitive would be reasonable (and would definitely simplify some of the definitions), but wanted to highlight the design choice since it seems related to things that have been debated on the list.  I'd rather keep things more general now until it's clear that there's consensus about a consistent picture with the other related components.  If it is then obvious that time-variance in the interpretation of URIs is superfluous then it'll be easy to eliminate it.
> 
> --James
> 
> 
>> #g
>> --
>> 
>> On 25/08/2011 18:13, James Cheney wrote:
>>> Hi,
>>> 
>>> I've been promising for a while now to write down a short formal semantics strawman to illustrate what I have in mind.  I've put something onto the wiki here:
>>> 
>>> http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman
>>> 
>>> It's definitely not a finished product but I've made an effort to cover entity assertions, ivp/complement, process execution, and events (but NOT derivation :)
>>> 
>>> One thing that's become apparent already is that there is a large potential for confusion since we are talking about assertions about things that may change over time.  The assertions may explicitly mention time points/intervals and they may also implicitly have "assertion time" or "time intended to be valid"  associated with them. Some of the assertions in the Conceptual Model document also have explicit times associated with them (e.g. use, generation and process execution assertions.)  Others such as entity assertions do not have explicit time arguments, but the discussion surrounding them refers to time points or intervals during which the entity being described exists.
>>> 
>>> So for each kind of assertion p(x,y,z,...), it would be helpful to clarify whether:
>>> 1.  p(x,y,z,...) is something that either always holds or never holds; or
>>> 2.  p(x,y,z,...) can hold or not at a specific point in time t (there may be a convention that we can make this explicit by adding an argument, e.g. p(x,y,z,...,t)); or
>>> 3.  p(x,y,z,...) can hold or not during an interval [t1,t2] (again there may be a convention where we add 2 arguments).
>>> 
>>> Currently, there seems to be a mix of conventions.
>>> 
>>> Comments are welcome.  I'm not pretending to have read all the relevant background / mailing list discussion carefully and so I may be using terminology incorrectly.  As the name suggests, I expect this to be easy to knock down, but hope that we'll learn something in doing so anyway.
>>> 
>>> --James
>> 
>> 
> 
> 
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> 


Received on Friday, 9 September 2011 15:19:07 UTC