Re: formal semantics strawman from James Cheney on 2011-09-15 (public-prov-wg@w3.org from September 2011)

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Thu, 15 Sep 2011 15:00:53 +0100
To: "Myers, Jim" <MYERSJ4@rpi.edu>
Cc: Graham Klyne <GK@ninebynine.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-Id: <957ED87C-6908-47F7-87A9-5D432D3FD675@inf.ed.ac.uk>
Hi,

Just catching up on this now (it has been a busy week).  I agree with what I understand to be Graham's larger point in referring to Frege (that we are probably revisiting philosophical issues that we are unlikely to resolve).

My starting point in the formal semantics is the common approach taken in mathematical logic - i.e. one postulates a "real" world / domain of interest as a mathematical structure.  One then introduces a language that one can use to talk about the elements of the domain, along with a formal description of how to interpret statements in the language as true or false of a given model.  This carries some philosophical baggage (i.e. it embodies some answers to the kinds of questions Frege considered that not everyone might agree with).

My take on entity (assertions) has been that they are statements meant to describe (temporally bounded) invariant properties of things, which may or may not uniquely identify the real things.  The things in the model have identity over time independent of their properties (observable attributes), and not necessarily tied to an explicit identifier in the syntax.

Thus, in the divergence example of "X's blue car", if at one point X's car is a blue Toyota and at another point a blue Ford, there are (at least) two individuals in the world that are referred to as "X's car" at different times.  To keep things consistent, we therefore need to take into account the time or other context in which to interpret the statement "X's car is blue" - it may be true both in 2000 when the car was a Toyota and in 2011 when it is a Ford, but the reason it is true is different, because different things are involved.  Moreover, the context might also contain information that helps us disambiguate X (John Smith whose SSN is 123-45-6789 as opposed to all the other possible "John"s).
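
To make the role of time concrete, here is a minimal sketch in Python. Every name in it (Thing, DENOTES, COLOUR, referent, holds_blue) is made up for illustration and is not taken from the strawman or the conceptual model. The description "X's car" is interpreted relative to a year, and "X's car is blue" is then checked against whichever thing the description denotes in that year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """A thing in the model; its identity is independent of its attributes."""
    key: str  # hypothetical internal identity, not a syntactic identifier


toyota = Thing("toyota")
ford = Thing("ford")

# Hypothetical model: what the description "X's car" denotes in a given year.
DENOTES = {2000: toyota, 2011: ford}

# Hypothetical model: the colour of each thing in a given year.
COLOUR = {(toyota, 2000): "blue", (ford, 2011): "blue"}


def referent(year):
    """Return the thing denoted by "X's car" in `year`, or None if undefined."""
    return DENOTES.get(year)


def holds_blue(year):
    """Interpret "X's car is blue" at `year` against the model."""
    thing = referent(year)
    return thing is not None and COLOUR.get((thing, year)) == "blue"


print(holds_blue(2000), holds_blue(2011))  # True True
print(referent(2000) == referent(2011))    # False
```

The statement comes out true in both years, but the referents differ, which is the sense in which the reason it is true is different.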

To me, the role of the identifier in an entity assertion is just to give us a name (URI?) for the described thing, so that even if "X's blue car Y" doesn't uniquely define which thing Y maps to, we can still make assertions about Y at other points (possibly overlapping).  For example, we might want to say "In 2000, X's blue car Y had 10,000 miles on its odometer" and "In 2001, X's blue car Y had 20,000 miles on its odometer" and "Between 2000 and 2001, X's blue car Y had fewer than 100,000 miles on its odometer", but not "In 2011, X's blue car Y had 5,000 miles on its odometer" - we should use a different id to refer to the different thing denoted by "X's blue car" in 2011.  
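
A rough sketch of the odometer example, again in Python with hypothetical names (ODOMETER, LOOKUP, odometer_at, odometer_below, and the second identifier "Z" are assumptions of mine, not part of the strawman). It assumes the simplest case, where each identifier names exactly one thing, and checks assertions against a small model of readings.

```python
# Hypothetical model: odometer readings (miles) of two distinct things, by year.
ODOMETER = {
    "toyota": {2000: 10_000, 2001: 20_000},
    "ford": {2011: 5_000},
}

# Hypothetical lookup: each identifier names exactly one thing in the model.
LOOKUP = {"Y": "toyota", "Z": "ford"}


def odometer_at(ident, year, miles):
    """True if the thing named by `ident` had exactly `miles` in `year`."""
    return ODOMETER[LOOKUP[ident]].get(year) == miles


def odometer_below(ident, start, end, limit):
    """True if every recorded reading in [start, end] is below `limit`."""
    readings = [m for y, m in ODOMETER[LOOKUP[ident]].items() if start <= y <= end]
    return bool(readings) and all(m < limit for m in readings)


print(odometer_at("Y", 2000, 10_000))             # True
print(odometer_below("Y", 2000, 2001, 100_000))   # True
print(odometer_at("Y", 2011, 5_000))              # False
print(odometer_at("Z", 2011, 5_000))              # True
```

Under these assumptions the 2011 assertion fails for Y and holds for the separate identifier Z, which matches the suggestion to use a different id for the different thing.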

Short of including full descriptions of the state of a real thing down to the molecular level (and maybe its history), I don't really see how we can fully enforce the idea that an entity assertion contains enough information to uniquely identify the referred-to thing.  This leaves aside issues such as containment, splitting and merging of things.  

In any case, I view the strawman so far as an attempt to describe the common case, not fully resolve all possible corner cases - I doubt we will be able to handle those, without solving major open problems in philosophy.

I also understand from Luc that the strawman is no longer in sync with the conceptual model, so I will be revising it next week sometime to take that into account as well as this discussion.  One good thing to do might be to collect examples such as the car example, the Royal Society example, and the file example, and try to explain them using the formal semantics to see where there are problems.

--James

On Sep 15, 2011, at 1:38 PM, Myers, Jim wrote:

> Yes - there's definitely some overlap with the idea of sense - you can't define provenance irrespective of sense. It does however sound like the reasoning (at least as described at the Wikipedia level) is limited to the non-divergent case where real things that have less-real senses/context is an adequate approximation. If Venus was hit by a large comet and lost significant mass into space, it would be wrong to talk about Hesperus, Phosphorus, and 'remaining planet plus rapidly expanding cloud of rocks and gases' as three senses of the same thing that just differ by context (H&P are defined by whatever mass remains together at Venus' location and shines in the sky, the latter is really defined by the mass that accreted to form Venus, regardless of where it goes next).
> 
>  Jim
>> -----Original Message-----
>> From: Graham Klyne [mailto:GK@ninebynine.org]
>> Sent: Thursday, September 15, 2011 6:00 AM
>> To: Myers, Jim
>> Cc: James Cheney; W3C provenance WG
>> Subject: Re: formal semantics strawman
>> 
>> Isn't this a re-run of Frege's "Hesperus and Phosphorus" consideration of
>> sense and reference (cf.
>> http://en.wikipedia.org/wiki/Hesperus#.22Hesperus_is_Phosphorus.22, etc)?
>> 
>> Chasing through to (a translation of) Frege's original...
>> 
>> "In light of this, one need have no scruples in speaking of the sense, whereas
>> in the case of an idea one must, strictly speaking, add to whom it belongs and
>> at what time."
>> -- http://en.wikisource.org/wiki/On_Sense_and_Reference (para 9)
>> 
>> Reading this, it struck me that Frege might be talking about provenance
>> applied to the sense rather than the reference.
>> 
>> His subsequent discussion of 'Odysseus was set ashore at Ithaca while sound
>> asleep' seems particularly apposite to our discussion of entities; "... that
>> anyone who seriously took the sentence to be true or false would ascribe to
>> the name 'Odysseus' a reference, not merely a sense; for it is of the
>> reference of the name that the predicate is affirmed or denied."  (ibid, para
>> 15)
>> 
>> In my mind, all this reinforces the need to be clear about the distinction
>> between a thing, the sense (or context?) in which that thing is considered, and
>> the assertions (provenance?) about the thing in that context.
>> 
>> #g
>> --
>> 
>> 
>> On 10/09/2011 16:49, Myers, Jim wrote:
>>> James,
>>> 
>>> I think you're getting the difference in concepts correct - I think
>>> entities are real, not assertions. I strongly believe we need the
>>> interpretation that they are real to avoid paradoxes. At this point, I
>>> don't really know how to explain that better than in other notes, but
>>> I'll try to put my arguments in the context of the thread below:
>>> 
>>> The problem with things is that we don't actually agree about their
>>> identity as a function of time - what 'a car owned by Luc', that gets
>>> an ID, is, right now, is fairly obvious - we can point to it. But it is
>>> not obvious whether the ID as a function of time means 'that physical car', 'the
>>> physical material in that car whatever its future shape', 'Luc's car
>>> whatever he happens to own at the time', 'that car in a running
>>> condition (that's not a car, it's a giant paperweight)', or some other
>>> variant ('that car in its current location'). Entities are more real
>>> than things (!) because their definition includes information about
>>> how to identify them over time.  There are common things - like cars
>>> and files, where, by common sense, we can often consider them entities
>>> directly ('the physical car that can move Luc around' is the common
>>> sense entity that we usually mean when we point at the thing). The
>>> confusing part is just that the other definitions above are just as
>>> valid - they are formally well-defined and just as real as the common
>>> sense notion. Because of the set of processes that usually occurs to
>>> cars, the common sense entity is the most useful default, but there are
>>> things that happen to cars (motion, crashing, being bought and sold)
>>> where the other entities are needed for information about what happened
>>> to be unambiguously recorded.
>>> 
>>> The alternative view - that we are just talking about asserted
>>> constraints on a thing - runs into problems when we, for example, create
>>> an entity for the 'car owned by Luc' that is only constrained by who
>>> the owner is. That car is a Toyota at one point and a Tesla at another
>>> (same owner). Similarly, if I constrain the car by location (e.g.
>>> because putting a quarter in a parking meter allows 'the car at X' to
>>> park for 20 minutes). I can make these more subtle - many of the file
>>> examples where we constrain by content, location, set of bits versus
>>> set of characters, etc. The problem is that over time, and with a
>>> suitable set of processes, the different entities that all map to a
>>> 'thing' at one point in time diverge in their history. Once they do,
>>> the view of entities as constrained things breaks. I don't see a way
>>> to tweak this view to avoid the problem. The only argument for this
>>> view that I think holds is that the alternative is more complex and
>>> while the entity as assertion constraints doesn't work for as many
>>> cases, we can get by without a way to model those. I believe that this
>>> is not the case and that we already have use cases where entities
>>> diverge over time and that the restricted range where this
>>> interpretation will work is just too small to be useful.
>>> 
>>> I think it would be useful to know where the group agrees/disagrees - do
>>> others think that the asserted constraints approach works for the cases
>>> I call divergent, or do they think those are not things we need to
>>> handle? Am I missing something about the asserted constraints approach
>>> that allows divergent cases to be handled in some way?
>>> 
>>>  -- Jim
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: James Cheney [mailto:jcheney@inf.ed.ac.uk]
>>>> Sent: Friday, September 09, 2011 11:18 AM
>>>> To: Myers, Jim
>>>> Cc: Graham Klyne; W3C provenance WG
>>>> Subject: Re: [Spam:***** SpamScore] Re: formal semantics strawman
>>>> 
>>>> Hi,
>>>> 
>>>> There's been extensive discussion on other threads/issues concerning
>>>> the nature of entities.  It seems to me that this issue remains
>>>> unresolved.  I had hoped that the formal semantics strawman would
>>>> contribute to this discussion, if we can try to formulate different
>>>> views precisely.  I see that I did not respond to Jim Myers's
>>>> comments below previously (I was out of town from the 28th until
>>>> Wednesday) so am doing so now.
>>>> 
>>>> On Aug 27, 2011, at 5:43 PM, Myers, Jim wrote:
>>>> 
>>>>> Trying to catch up after travel. I'm not sure I have the big picture
>>>>> but a few comments about statements in the thread:
>>>>> 
>>>>> We need to avoid saying entities are fixed and ?things are not -
>>>>> entities are fixed in some ways - defined by attributes with values,
>>>>> but they are mutable in other ways. ?things can be this way as well
>>>>> and we may assert entities that have the same ID as an existing
>>>>> ?thing - because that thing is already defined as fixed in the ways
>>>>> we need it to be for provenance, or we may assert entities that are
>>>>> 'complements' of ?things when we need to fix more attributes or
>>>>> different attributes than for the ?thing itself.
>>>> 
>>>> I think one source of confusion - here and in the other discussions
>>>> of entities - is whether an "entity (assertion)" is a syntactic,
>>>> knowledge-modeling construct that merely represents (some subset of)
>>>> knowledge asserted about a thing in the world, or whether it is
>>>> something that exists in the world, alongside the ?things that are
>>>> being described.
>>>> 
>>>> In the strawman, I initially chose to use the term "entity" for both
>>>> the assertions and the things in the world, which was a bit ambiguous
>>>> and inconsistent with the conceptual model draft; after an offline
>>>> discussion with Luc I changed it so that ?thing is used for the
>>>> things in the world (what I'd call the semantics) and "entity
>>>> assertion" is used for the syntactic constructs that contain
>>>> information about the ?things.
>>>> 
>>>> Your use of the term "entity" above hints that you consider them to
>>>> be semantic components that exist alongside, but are subordinate to
>>>> ?things - "sub-things" in some sense, rather than purely syntactic
>>>> statements.
>>>> 
>>>> Can you confirm this or identify parts of the strawman that do not
>>>> match what you have in mind?
>>>> 
>>>> I think that if we do not come to a common understanding of what
>>>> components are "syntax" and what parts are "semantics" we will get
>>>> hopelessly confused.
>>>> 
>>>>> 
>>>>> The interval over which an entity exists may be different than the
>>>>> one for which a complementof relationship is true.
>>>> 
>>>> I believe this statement holds in the strawman.
>>>> 
>>>>> If we want an asserted entity that is the fixed content at a live
>>>>> URL, it should have its own ID and a complementof relationship with
>>>>> the site URL (a ?thing and potentially another entity if we wish to
>>>>> discuss the provenance of the live site itself).
>>>> 
>>>> In the strawman, the ids denote ?things, but the ?thing denoted by an
>>>> id can change over time.  Both Luc (offline) and Graham suggested
>>>> that this may be unnecessarily general.  However, if entities are not
>>>> first-class semantic things but just statements about them, then your
>>>> suggestion that "it should have its own ID and a complementOf
>>>> relationship with the site url" doesn't make sense.
>>>> 
>>>> Do you have a suggestion about how the strawman could change to
>>>> accommodate your view?  For example, would it be enough to adjust the
>>>> lookup function to associate an entity-ID with a ?thing and a time interval?
>>>> 
>>>>> The complementOf relationship is true until the site content
>>>>> updates, but the fixed page entity could still exist. I think this
>>>>> is consistent with the discussion but am not sure.
>>>> 
>>>> This relates to an important issue which I noticed in doing the
>>>> writeup: we have introduced a lot of statements that may be time
>>>> dependent but don't have an explicit time parameter.
>>>> 
>>>> In the strawman, I suggested several possible ways of handling this
>>>> for isComplementOf.
>>>> 
>>>> 1.  One can judge whether a complement relationship holds at a single
>>>> time instant t by
>>>> 2.  One can judge whether it holds over an interval
>>>> (this is what the current conceptual model suggests).
>>>> 3.  One can treat complementOf as a time-invariant statement, which
>>>> holds of two id's at all time points or at no time points.  This
>>>> would make sense, for example, if we treat id's as mapping to ?things
>>>> (i.e. the id represents the whole history of the Royal Society).
>>>> 
>>>>> 
>>>>> The interpretation of an entity should be time invariant - one can
>>>>> choose to assert an entity that is a fixed web page or one for a
>>>>> live website, but one should not have an entity asserted with a
>>>>> content property as part of its definition and then have that
>>>>> property change.
>>>> 
>>>> In the strawman, an assertion of the form entity(id,[attr=val,...])
>>>> is either true or false at a given time or across a given time
>>>> interval.  I believe this is compatible with your suggestion that "an
>>>> entity should be time invariant", but again, you seem to be talking
>>>> about entities as semantic things, not assertions.
>>>> 
>>>>> For the example.com example, one can define an entity that is 'the
>>>>> content available from the example2.com site no matter what URL you
>>>>> get it from' (retrieval URL can't be an attribute here) or one that
>>>>> is 'the content retrievable from example2.com' that is generated by
>>>>> the site starting operations and ceases to exist when the site URL
>>>>> is blocked from the world/retired (retrieval URL can be an attribute
>>>>> here). Either is valid - the point with an entity is that you are
>>>>> picking one definition and sticking with it, using complementof when
>>>>> you need to switch definitions.
>>>>> 
>>>> 
>>>> I don't fully understand this comment; in particular I'm not sure what
>>>> you mean by "one can define an entity that is..." - is this the same
>>>> as "one can state an entity assertion describing..."?
>>>> 
>>>> Is this an example of something that you believe can be done with the
>>>> existing conceptual model that isn't handled correctly in the strawman
>>>> semantics?
>>>> 
>>>> --James
>>>> 
>>>>> Hope those are helpful in the larger discussion (and consistent
>>>>> with others' interpretation!)...
>>>>> 
>>>>> Jim
>>>>> ________________________________________
>>>>> From: public-prov-wg-request@w3.org [public-prov-wg-request@w3.org] on
>>>>> behalf of James Cheney [jcheney@inf.ed.ac.uk]
>>>>> Sent: Friday, August 26, 2011 7:17 AM
>>>>> To: Graham Klyne
>>>>> Cc: W3C provenance WG
>>>>> Subject: Re: formal semantics strawman
>>>>> 
>>>>> On Aug 25, 2011, at 6:59 PM, Graham Klyne wrote:
>>>>> 
>>>>>> James,
>>>>>> 
>>>>>> Thanks.  This helps to clarify for me some things that weren't clear
>>>>>> to me in the model document.
>>>>>> 
>>>>> 
>>>>> Note that the strawman is not necessarily capturing the intent of
>>>>> the model document (it just represents my initial effort to
>>>>> interpret it formally) so might be misleading about what it was
>>>>> trying to say. For example, Luc asked me offline to change what I
>>>>> was calling "entity" to something else because it doesn't match the
>>>>> model.
>>>>> 
>>>>> I've now updated the document to avoid this potential confusion
>>>>> between the PIDM assertion "entity" and the semantic "?things".
>>>>> (The question mark is there to flag it as a term with special
>>>>> meaning; we should probably find a less generic term)
>>>>> 
>>>>>> You say at
>>>>>> http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman#Interpreting_an_entity_assertion:
>>>>>> [[
>>>>>> Note that there is a design choice here: do we require that the
>>>>>> entity associated with id be the same throughout the interval or
>>>>>> not? I have chosen to require this, since otherwise the entity
>>>>>> assertion doesn't seem to be about a "single entity across a time
>>>>>> interval". Of course, if we require that the mapping from URIs to
>>>>>> entities be time-invariant then this problem goes away.
>>>>>> ]]
>>>>>> 
>>>>>> As far as I can tell from a quick skim, everything else works as
>>>>>> intended (at least in sections 1.3, 1.4) if the URI->Entity mapping
>>>>>> is invariant. Which I think leads to a model in which the
>>>>>> distinction between resource and entity (which I find to be
>>>>>> unhelpful) becomes less significant.
>>>>>> 
>>>>> 
>>>>> I was thinking of a situation where a URI is "retired" and
>>>>> redirected to a different target, e.g. example.com merges with
>>>>> example2.com, and http://www.example2.com is redirected to
>>>>> example.com's website from then on.  Perhaps in this example
>>>>> http://www.example2.com is by definition not a URI.
>>>>> 
>>>>> I think assuming that lookup is time-insensitive would be reasonable
>>>>> (and would definitely simplify some of the definitions), but wanted
>>>>> to highlight the design choice since it seems related to things that
>>>>> have been debated on the list.  I'd rather keep things more general
>>>>> now until it's clear that there's consensus about a consistent
>>>>> picture with the other related components.  If it is then obvious
>>>>> that time-variance in the interpretation of URIs is superfluous then
>>>>> it'll be easy to eliminate it.
>>>>> 
>>>>> --James
>>>>> 
>>>>> 
>>>>>> #g
>>>>>> --
>>>>>> 
>>>>>> On 25/08/2011 18:13, James Cheney wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I've been promising for a while now to write down a short formal
>>>>>>> semantics strawman to illustrate what I have in mind.  I've put
>>>>>>> something onto the wiki here:
>>>>>>> 
>>>>>>> http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman
>>>>>>> 
>>>>>>> It's definitely not a finished product but I've made an effort to
>>>>>>> cover entity assertions, ivp/complement, process execution, and
>>>>>>> events (but NOT derivation :)
>>>>>>> 
>>>>>>> One thing that's become apparent already is that there is a large
>>>>>>> potential for confusion since we are talking about assertions
>>>>>>> about things that may change over time.  The assertions may
>>>>>>> explicitly mention time points/intervals and they may also
>>>>>>> implicitly have "assertion time" or "time intended to be valid"
>>>>>>> associated with them. Some of the assertions in the Conceptual
>>>>>>> Model document also have explicit times associated with them (e.g.
>>>>>>> use, generation and process execution assertions.)  Others such as
>>>>>>> entity assertions do not have explicit time arguments, but the
>>>>>>> discussion surrounding them refers to time points or intervals
>>>>>>> during which the entity being described exists.
>>>>>>> 
>>>>>>> So for each kind of assertion p(x,y,z,...), it would be helpful to
>>>>>>> clarify whether:
>>>>>>> 1.  p(x,y,z,...) is something that either always holds or never
>>>>>>> holds; or
>>>>>>> 2.  p(x,y,z,...) can hold or not at a specific point in time t
>>>>>>> (there may be a convention that we can make this explicit by
>>>>>>> adding an argument, e.g. p(x,y,z,...,t)); or
>>>>>>> 3.  p(x,y,z,...) can hold or not during an interval [t1,t2]
>>>>>>> (again there may be a convention where we add 2 arguments).
>>>>>>> 
>>>>>>> Currently, there seems to be a mix of conventions.
>>>>>>> 
>>>>>>> Comments are welcome.  I'm not pretending to have read all the
>>>>>>> relevant background / mailing list discussion carefully and so I
>>>>>>> may be using terminology incorrectly.  As the name suggests, I
>>>>>>> expect this to be easy to knock down, but hope that we'll learn
>>>>>>> something in doing so anyway.
>>>>>>> 
>>>>>>> --James
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
> 


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Received on Thursday, 15 September 2011 14:01:52 UTC