RE: formal semantics strawman from Myers, Jim on 2011-09-15 (public-prov-wg@w3.org from September 2011)

From: Myers, Jim <MYERSJ4@rpi.edu>
Date: Thu, 15 Sep 2011 16:24:00 +0000
To: James Cheney <jcheney@inf.ed.ac.uk>
CC: Graham Klyne <GK@ninebynine.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <3131E7DF4CD2D94287870F5A931EFC23030BF2@EX14MB2.win.rpi.edu>
> 
> Just catching up on this now (it has been a busy week).  I agree with what I
> understand to be Graham's larger point in referring to Frege (that we are
> probably revisiting philosophical issues that we are unlikely to resolve).

We don't need to resolve philosophical differences or revisit anything, but philosophies are essentially models of the world and we're picking a model. Philosophical discussions will give us a sense of what our model can/cannot (is good at/bad at) describe.

> 
> My starting point in the formal semantics is the common approach taken in
> mathematical logic - i.e. one postulates a "real" world / domain of interest as
> a mathematical structure.  One then introduces a language that one can use
> to talk about the elements of the domain, along with a formal description of
> how to interpret statements in the language as true or false of a given
> model.  This carries some philosophical baggage (i.e. it embodies some
> answers to the kinds of questions Frege considered that not everyone might
> agree with).
> 
> My take on entity (assertions) has been that they are statements meant to
> describe (temporally bounded) invariant properties of things, which may or
> may not uniquely identify the real things.  The things in the model have
> identity over time independent of their properties (observable attributes),
> and not necessarily tied to an explicit identifier in the syntax.
> 
> Thus, in the divergence example of "X's blue car", if at one point X's car is a
> blue Toyota and at another point a blue Ford, there are (at least) two
> individuals in the world that are referred to as "X's car" at different times.  To
> keep things consistent, we therefore need to take into account the time or
> other context in which to interpret the statement "X's car is blue" - it may be
> true both in 2000 when the car was a Toyota and in 2011 when it is a Ford, but
> the reason it is true is different, because different things are involved.

I think you're claiming that "X's car" is not a real thing. Combining that philosophical view with the idea of taking things to the molecular level to get an identity relationship for a thing implies to me that "X's Toyota" is also not a thing - its molecular composition is changing over time.  I think this type of paradox is embedded in the philosophical view of real things with asserted states/contexts.

> Moreover, the context might also contain information that helps us
> disambiguate X (John Smith whose SSN is 123-45-6789 as opposed to all the
> other possible "John"s).

This seems like a completely different topic - John Smith is not a unique identifier.

> 
> To me, the role of the identifier in an entity assertion is just to give us a name
> (URI?) for the described thing, so that even if "X's blue car Y" doesn't
> uniquely define which thing Y maps to, we can still make assertions about Y at
> other points (possibly overlapping).  For example, we might want to say "In
> 2000, X's blue car Y had 10,000 miles on its odometer" and "In 2001, X's blue
> car Y had 20,000 miles on its odometer" and "Between 2000 and 2001, X's
> blue car Y had fewer than 100,000 miles on its odometer", but not "In 2011,
> X's blue car Y had 5,000 miles on its odometer" - we should use a different id
> to refer to the different thing denoted by "X's blue car" in 2011.

Why? If I have a database of cars keyed by license plate #, it makes perfect sense to talk about the make, model, and odometer reading at different points in time. I'm still identifying a car (license plates don't have odometers). To the state, this makes perfect sense because of the processes they care about (inspections, tickets, fees, etc.). If I care about physical processes, identifying the Ford and Toyota separately makes sense. Assuming we have a use case where the state wants to talk about the provenance of my car, how do we handle it? It appears that your answer is that they must use different identifiers.
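A minimal sketch of the two identification choices, in Python. The plate number, the VIN strings, and the names Observation and history are hypothetical, made up only for illustration; the odometer readings are the ones from the example quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    year: int
    plate: str     # hypothetical license plate (the key the state uses)
    vin: str       # hypothetical identifier for the physical vehicle
    make: str
    odometer: int


def history(observations, key):
    """Group observations by the chosen identifying attribute, ordered by year."""
    grouped = {}
    for obs in sorted(observations, key=lambda o: o.year):
        grouped.setdefault(getattr(obs, key), []).append(obs)
    return grouped


OBSERVATIONS = [
    Observation(2000, "ABC-123", "VIN-TOYOTA", "Toyota", 10_000),
    Observation(2001, "ABC-123", "VIN-TOYOTA", "Toyota", 20_000),
    Observation(2011, "ABC-123", "VIN-FORD", "Ford", 5_000),
]

# Keyed by plate: one entity whose make and odometer change over time.
by_plate = history(OBSERVATIONS, "plate")

# Keyed by physical vehicle: two entities, each with its own history.
by_vehicle = history(OBSERVATIONS, "vin")
```

Both groupings are built from the same observations; which one is useful depends on whether the processes being reported are registration processes or physical ones.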
> 
> Short of including full descriptions of the state of a real thing down to the
> molecular level (and maybe its history), I don't really see how we can fully
> enforce the idea that an entity assertion contains enough information to
> uniquely identify the referred-to thing.  This leaves aside issues such as
> containment, splitting and merging of things.
> 

And given that what you want to call things are constantly changing at the molecular level, we can't even use this method to identify things! So what are things as a function of time? (Forget entities - how do I uniquely identify a thing in the future such that everyone will agree it is still that thing?) 

Because things are under-defined and ambiguous, we need entities. Entities define only the attributes necessary for making unambiguous identifications over time and only have to do so sufficiently well to be unambiguous with respect to the processes being reported. Other aspects of their identity can remain fuzzy like things are.



> In any case, I view the strawman so far as an attempt to describe the
> common case, not fully resolve all possible corner cases - I doubt we will be
> able to handle those, without solving major open problems in philosophy.

To bring this back up to the top level - we are not trying to solve philosophical problems, but we do need to choose a model that has not been shown by philosophers (or others) to contain paradoxes for use cases we care about. Theseus' ship, a statue and its substrate, Allen Renear's argument that documents can't exist, etc. all show that a model based on some notion of 'real' things defined by common sense notions of the world, while highly intuitive, leads to paradoxes when you try to talk about provenance/evolution over time. Models that treat entities as real (per our overall discussion) largely avoid those paradoxes without introducing new ones for our cases of interest. The philosophical question of which one is really the way the world is (or what is a better model) (or what it is in the world we're modeling anyway) is not something we need to solve.

The cost of entities being real is that, for complex cases, the model is not intuitive to everyone. The potential saving grace is that most people can avoid using any entities that they wouldn't be able to also interpret intuitively as things and assertions about state (with occasional paradoxes they'll just have to accept/work around).


> 
> I also understand from Luc that the strawman is no longer in sync with the
> conceptual model, so I will be revising it next week sometime to take that
> into account as well as this discussion.  One good thing to do might be to
> collect examples such as the car example, royal society, file example and try
> to explain them using the formal semantics to see where there are
> problems.

Yes - we (me included) are all too good at intuitively jumping outside what the models allow, so documenting a few examples in detail would be a good next step.

 Jim
> 
> --James
> 
> On Sep 15, 2011, at 1:38 PM, Myers, Jim wrote:
> 
> > Yes - there's definitely some overlap with the idea of sense - you can't
> define provenance irrespective of sense. It does however sound like the
> reasoning (at least as described at the Wikipedia level) is limited to the non-
> divergent case where real things that have less-real senses/context is an
> adequate approximation. If Venus was hit by a large comet and lost
> significant mass into space, it would be wrong to talk about Hesperus,
> Phosphorus, and 'remaining planet plus rapidly expanding cloud of rocks and
> gases' as three senses of the same thing that just differ by context (H&P are
> defined by whatever mass remains together at Venus' location and shines in
> the sky, the latter is really defined by the mass that accreted to form Venus,
> regardless of where it goes next).
> >
> >  Jim
> >> -----Original Message-----
> >> From: Graham Klyne [mailto:GK@ninebynine.org]
> >> Sent: Thursday, September 15, 2011 6:00 AM
> >> To: Myers, Jim
> >> Cc: James Cheney; W3C provenance WG
> >> Subject: Re: formal semantics strawman
> >>
> >> Isn't this a re-run of Frege's "Hesperus and Phosphorus"
> >> consideration of sense and reference (cf.
> >> http://en.wikipedia.org/wiki/Hesperus#.22Hesperus_is_Phosphorus.22,
> >> etc)
> >>
> >> Chasing through to (a translation of) Frege's original...
> >>
> >> "In light of this, one need have no scruples in speaking of the
> >> sense, whereas in the case of an idea one must, strictly speaking,
> >> add to whom it belongs and at what time."
> >> -- http://en.wikisource.org/wiki/On_Sense_and_Reference (para 9)
> >>
> >> Reading this, it struck me that Frege might be talking about
> >> provenance applied to the sense rather than the reference.
> >>
> >> His subsequent discussion of 'Odysseus was set ashore at Ithaca while
> >> sound asleep' seems particularly apposite to our discussion of
> >> entities; "... that anyone who seriously took the sentence to be true
> >> or false would ascribe to the name 'Odysseus' a reference, not merely
> >> a sense; for it is of the reference of the name that the predicate is
> >> affirmed or denied."  (ibid, para
> >> 15)
> >>
> >> In my mind, all this reinforces the need to be clear about the
> >> distinction between a thing, the sense (or context?) in which that
> >> thing is considered, and the assertions (provenance?) about the thing in
> that context.
> >>
> >> #g
> >> --
> >>
> >>
> >> On 10/09/2011 16:49, Myers, Jim wrote:
> >>> James,
> >>>
> >>> I think you're getting the difference in concepts correct - I think
> >>> entities are
> >> real, not assertions. I strongly believe we need the interpretation
> >> that they are real to avoid paradoxes. At this point, I don't really
> >> know how to explain that better than in other notes, but I'll try to
> >> put my arguments in the context of the thread below:
> >>>
> >>> The problem with things is that we don't actually agree about their
> >>> identity as a function of time - what 'a car owned by Luc', that
> >>> gets an ID, is, right now, is fairly obvious - we can point to it.
> >>> But whether the ID as a function of time means 'that physical car',
> >>> 'the physical material in that car whatever its future shape',
> >>> 'Luc's car whatever he happens to own at the time', 'that car in a
> >>> running condition (that's not a car, it's a giant paperweight)', or
> >>> some other variant ('that car in its current location'). Entities
> >>> are more real than things (!) because their definition includes
> >>> information about how to identify them over time.  There are common
> >>> things - like cars and files, where, by common sense, we can often
> >>> consider them entities directly ('the physical car that can move Luc
> >>> around' is the common sense entity that we usually mean when we
> >>> point at the thing). The confusing part is just that the other
> >>> definitions above are just as valid - they are formally well-
> >> defined and just as real as the common sense notion. Because of the
> >> set of processes that usually occurs to cars, the common sense entity
> >> is the most useful default, but there are things that happen to cars
> >> (motion, crashing, being bought and sold) where the other entities
> >> are needed for information about what happened to be unambiguously
> recorded.
> >>>
> >>> The alternative view - that we are just talking about asserted
> >>> constraints on a thing runs into problems when we, for example
> >>> create an entity for the 'car owned by Luc' that is only constrained
> >>> by who the owner is. That car is a Toyota at one point and a Tesla
> >>> at another (same owner). Similarly, if I constrain the car by location (e.g.
> >>> because putting a quarter in a parking meter allows 'the car at X'
> >>> to park for 20 minutes). I can make these more subtle - many of the
> >>> file examples where we constrain by content, location, set of bits
> >>> versus set of characters, etc. The problem is that over time, and
> >>> with a suitable set of processes, the different entities that all
> >>> map to a thing' at one point in time, diverge in their history. Once
> >>> they do, the view of entities as constrained things breaks. I don't
> >>> see a way to tweak this view to avoid the problem. The only argument
> >>> for this view that I think holds is that the alternative is more
> >>> complex and while the entity as assertion constraints
> >> doesn't work for as many cases, we can get by without a way
> >> to model those. I believe that this is not the case and that we
> >> already have use cases where entities diverge over time and that the
> >> restricted range where this interpretation will work is just too small to be
> useful.
> >>>
> >>> I think it would be useful to know where the group agrees/disagrees
> >>> - do
> >> others think that asserted constraints approach works for the cases I
> >> call divergent, or do they think those are not things we need to
> >> handle? Am I missing something about the asserted constraints
> >> approach that allows divergent cases to be handled in some way?
> >>>
> >>>  -- Jim
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: James Cheney [mailto:jcheney@inf.ed.ac.uk]
> >>>> Sent: Friday, September 09, 2011 11:18 AM
> >>>> To: Myers, Jim
> >>>> Cc: Graham Klyne; W3C provenance WG
> >>>> Subject: Re: [Spam:***** SpamScore] Re: formal semantics strawman
> >>>>
> >>>> Hi,
> >>>>
> >>>> There's been extensive discussion on other threads/issues
> >>>> concerning the nature of entities.  It seems to me that this issue
> >>>> remains unresolved.  I had hoped that the formal semantics strawman
> >>>> would contribute to this discussion, if we can try to formulate
> >>>> different views precisely.  I see that I did not respond to Jim
> >>>> Myers's comments below previously (I was out of town from the 28th
> >>>> until
> >> Wednesday) so am doing so now.
> >>>>
> >>>> On Aug 27, 2011, at 5:43 PM, Myers, Jim wrote:
> >>>>
> >>>>> Trying to catch up after travel. I'm not sure I have the big
> >>>>> picture but a few
> >>>> comments about statements in the thread:
> >>>>>
> >>>>> We need to avoid saying entities are fixed and ?things are not -
> >>>>> entities are
> >>>> fixed in some ways - defined by attributes with values, but they
> >>>> are mutable in other ways. ?things can be this way as well and we
> >>>> may assert entities that have the same ID as an existing ?thing -
> >>>> because that thing is already defined as fixed in the ways we need
> >>>> it to be for provenance, or we may assert entities that are
> >>>> 'complements' of ?things when we need to fix more attributes or
> >>>> different attributes than
> >> for the ?thing itself.
> >>>>
> >>>> I think one source of confusion - here and in the other discussions
> >>>> of entities - is whether an "entity (assertion)" is a syntactic,
> >>>> knowledge-modeling construct that merely represents (some subset
> >>>> of) knowledge asserted about a thing in the world, or whether it is
> >>>> something that exists in the world, alongside the ?things that are
> >>>> being
> >> described.
> >>>>
> >>>> In the strawman, I initially chose to use the term "entity" for
> >>>> both the assertions and the things in the world, which was a bit
> >>>> ambiguous and inconsistent with the conceptual model draft; after
> >>>> an offline discussion with Luc I changed it so that ?thing is used
> >>>> for the things in the world (what I'd call the semantics) and
> >>>> "entity assertion" is used for the syntactic constructs that
> >>>> contain information
> >> about the ?things.
> >>>>
> >>>> Your use of the term "entity" above hints that you consider them to
> >>>> be semantic components that exist alongside, but are subordinate to
> >>>> ?things - "sub-things" in some sense, rather than purely syntactic
> >> statements.
> >>>>
> >>>> Can you confirm this or identify parts of the strawman that do not
> >>>> match what you have in mind?
> >>>>
> >>>> I think that if we do not come to a common understanding of what
> >>>> components are "syntax" and what parts are "semantics" we will get
> >>>> hopelessly confused.
> >>>>
> >>>>>
> >>>>> The interval over which an entity exists may be different than the
> >>>>> one for
> >>>> which a complementof relationship is true.
> >>>>
> >>>> I believe this statement holds in the strawman.
> >>>>
> >>>>> If we want an asserted entity that is the fixed content at a
> >>>>> live URL, it
> >>>> should have its own ID and a complementof relationship with the
> >>>> site URL(a ?thing and potentially another entity if we wish to
> >>>> discuss the provenance of the live site itself).
> >>>>
> >>>> In the strawman, the ids denote ?things, but the ?thing denoted by
> >>>> an id can change over time.  Both Luc (offline) and Graham
> >>>> suggested that this may be unnecessarily general.  However, if
> >>>> entities are not first-class semantic things but just statements
> >>>> about them, then your suggestion that "it should have its own ID
> >>>> and a complementOf relationship with the site url" doesn't make
> sense.
> >>>>
> >>>> Do you have a suggestion about how the strawman could change to
> >>>> accommodate your view?  For example, would it be enough to adjust
> >>>> the lookup function to associate an entity-ID with a ?thing and a time
> interval?
> >>>>
> >>>>> The complementOf relationship is true until the site content
> >>>>> updates, but
> >>>> the fixed page entity could still exist. I think this is consistent
> >>>> with the discussion but am not sure.
> >>>>
> >>>> This relates to an important issue which I noticed in doing the
> >>>> writeup: we have introduced a lot of statements that may be time
> >>>> dependent but don't have an explicit time parameter.
> >>>>
> >>>> In the strawman, I suggested several possible ways of handling this
> >>>> for isComplementOf.
> >>>>
> >>>> 1.  One can judge whether a complement relationship holds at a
> >>>> single time instant t by
> >>>> 2.  One can judge whether it holds over an interval (this is what
> >>>> the current conceptual model suggests).
> >>>> 3.  One can treat complementOf as a time-invariant statement, which
> >>>> holds of two id's at all time points or at no time points.  This
> >>>> would make sense, for example, if we treat id's as mapping to
> >>>> ?things (i.e. the id represents the whole history of the Royal Society).
> >>>>
> >>>>>
> >>>>> The interpretation of an entity should be time invariant - one can
> >>>>> choose to
> >>>> assert an entity that is a fixed web page or one for a live
> >>>> website, but one should not have an entity asserted with a content
> >>>> property as part of its definition and then have that property change.
> >>>>
> >>>> In the strawman, an assertion of the form entity(id,[attr=val,...])
> >>>> is either true or false at a given time or across a given time
> >>>> interval.  I believe this is compatible with your suggestion that
> >>>> "an entity should be time invariant", but again, you seem to be
> >>>> talking about
> >> entities as semantic things, not assertions.
> >>>>
> >>>>> For the example.com example, one can define an entity that is 'the
> >>>>> content
> >>>> available from the example2.com site no matter what URL you get it
> from'
> >>>> (retrieval URL can't be an attribute here) or one that is 'the
> >>>> content retrievable from example2.com' that is generated by the
> >>>> site starting operations and ceases to exist when the site URL is
> >>>> blocked from the world/retired (retrieval URL can be an attribute
> here).
> >>>> Either is valid - the point with an entity is that you are picking
> >>>> one definition and sticking with it, using complementof when you
> >>>> need to
> >> switch definitions.
> >>>>>
> >>>>
> >>>> I don't fully understand this comment, in particular I'm not sure
> >>>> what you mean by "one can define an entity that is..." - is this the
> >>>> same as "one can state an entity assertion describing...".
> >>>>
> >>>> Is this an example of something that you believe can be done with
> >>>> the existing conceptual model that isn't handled correctly in the
> >>>> strawman
> >> semantics?
> >>>>
> >>>> --James
> >>>>
> >>>>> Hope those are helpful  in the larger discussion (and consistent
> >>>>> with others'
> >>>> interpretation!)...
> >>>>>
> >>>>> Jim
> >>>>> ________________________________________
> >>>>> From: public-prov-wg-request@w3.org [public-prov-wg-
> >> request@w3.org]
> >>>> on
> >>>>> behalf of James Cheney [jcheney@inf.ed.ac.uk]
> >>>>> Sent: Friday, August 26, 2011 7:17 AM
> >>>>> To: Graham Klyne
> >>>>> Cc: W3C provenance WG
> >>>>> Subject: [Spam:***** SpamScore] Re: formal semantics strawman
> >>>>>
> >>>>> On Aug 25, 2011, at 6:59 PM, Graham Klyne wrote:
> >>>>>
> >>>>>> James,
> >>>>>>
> >>>>>> Thanks.  This helps to clarify for me some things that weren't
> >>>>>> clear to me in
> >>>> the model document.
> >>>>>>
> >>>>>
> >>>>> Note that the strawman is not necessarily capturing the intent of
> >>>>> the model
> >>>> document (it just represents my initial effort to interpret it
> >>>> formally) so might be misleading about what it was trying to say.
> >>>> For example, Luc asked me offline to change what I was calling
> >>>> "entity" to something else because it doesn't match the model.
> >>>>>
> >>>>> I've now updated the document to avoid this potential confusion
> >>>>> between the PIDM assertion "entity" and the semantic "?things".
> >>>>> (The question mark is there to flag it as a term with special
> >>>>> meaning; we should probably find a less generic term)
> >>>>>
> >>>>>> You say at
> >>>>>> http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman#Interpreting_an_entity_assertion:
> >>>>>> [[
> >>>>>> Note that there is a design choice here: do we require that the
> >>>>>> entity
> >>>> associated with id be the same throughout the interval or not? I
> >>>> have chosen to require this, since otherwise the entity assertion
> >>>> doesn't seem to be about a "single entity across a time interval".
> >>>> Of course, if we require that the mapping from URIs to entities be
> >>>> time-invariant then
> >> this problem goes away.
> >>>>>> ]]
> >>>>>>
> >>>>>> As far as I can tell from a quick skim, everything else works as
> >>>>>> intended (at
> >>>> least in sections 1.3, 1.4) if the URI->Entity mapping is invariant.
> >>>> Which I think leads to a model in which the distinction between
> >>>> resource and entity (which I find to be unhelpful) becomes less
> significant.
> >>>>>>
> >>>>>
> >>>>> I was thinking of a situation where a URI is "retired" and
> >>>>> redirected to a
> >>>> different target, e.g. example.com merges with example2.com, and
> >>>> http://www.example2.com is redirected to example.com's website
> from
> >>>> then on.  Perhaps in this example http://www.example2.com is by
> >>>> definition not a URI.
> >>>>>
> >>>>> I think assuming that lookup is time-insensitive would be
> >>>>> reasonable (and
> >>>> would definitely simplify some of the definitions), but wanted to
> >>>> highlight the design choice since it seems related to things that
> >>>> have been debated on the list.  I'd rather keep things more general
> >>>> now until it's clear that there's consensus about a consistent
> >>>> picture with the other related components.  If it is then obvious
> >>>> that time-variance in the interpretation of URIs is superfluous
> >>>> then it'll be
> >> easy to eliminate it.
> >>>>>
> >>>>> --James
> >>>>>
> >>>>>
> >>>>>> #g
> >>>>>> --
> >>>>>>
> >>>>>> On 25/08/2011 18:13, James Cheney wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I've been promising for a while now to write down a short formal
> >>>> semantics strawman to illustrate what I have in mind.  I've put
> >>>> something onto the wiki here:
> >>>>>>>
> >>>>>>> http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman
> >>>>>>>
> >>>>>>> It's definitely not a finished product but I've made an effort
> >>>>>>> to cover entity assertions, ivp/complement, process execution,
> >>>>>>> and events (but NOT derivation :)
> >>>>>>>
> >>>>>>> One thing that's become apparent already is that there is a
> >>>>>>> large potential
> >>>> for confusion since we are talking about assertions about things
> >>>> that may change over time.  The assertions may explicitly mention
> >>>> time points/intervals and they may also implicitly have "assertion time"
> >>>> or "time intended to be valid"  associated with them. Some of the
> >>>> assertions in the Conceptual Model document also have explicit
> >>>> times associated with them (e.g. use, generation and process
> >>>> execution
> >>>> assertions.)  Others such as entity assertions do not have explicit
> >>>> time arguments, but the discussion surrounding them refers to time
> >>>> points
> >> or intervals during which the entity being described exists.
> >>>>>>>
> >>>>>>> So for each kind of assertion p(x,y,z,...), it would be helpful
> >>>>>>> to clarify
> >>>> whether:
> >>>>>>> 1.  p(x,y,z,...) is something that either always holds or never
> >>>>>>> holds; or
> >>>>>>> 2.  p(x,y,z,...) can hold or not at a specific point in time t
> >>>>>>> (there may be a convention that we can make this explicit by
> >>>>>>> adding an argument, e.g. p(x,y,z,...,t)); or
> >>>>>>> 3.  p(x,y,z,...) can hold or not during an interval [t1,t2]
> >>>>>>> (again there may be a convention where we add 2 arguments).
> >>>>>>>
> >>>>>>> Currently, there seems to be a mix of conventions.
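The three conventions listed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the strawman's definitions: time is modelled as discrete years, a world is a dict from year to the set of facts recorded for that year, and the names TIMELINE, holds_at, holds_during and is_time_invariant are hypothetical. The facts reuse the blue Toyota / blue Ford example from earlier in the thread.

```python
# Hypothetical world: year -> set of (attribute, value) facts recorded for "X's car".
TIMELINE = {
    2000: {("color", "blue"), ("make", "Toyota")},
    2001: {("color", "blue"), ("make", "Toyota")},
    2011: {("color", "blue"), ("make", "Ford")},
}


def holds_at(timeline, fact, t):
    """Convention 2: the statement is judged at a single time point t."""
    return fact in timeline.get(t, set())


def holds_during(timeline, fact, t1, t2):
    """Convention 3: the statement is judged over the interval [t1, t2].

    Assumption: only recorded time points inside the interval are checked,
    so an interval with no recorded points is vacuously true.
    """
    return all(fact in facts for t, facts in timeline.items() if t1 <= t <= t2)


def is_time_invariant(timeline, fact):
    """Convention 1: the statement holds at every recorded time point or at none."""
    return len({fact in facts for facts in timeline.values()}) <= 1
```

Under these assumptions ("color", "blue") can be read with any of the three conventions, while ("make", "Toyota") only makes sense with an explicit time point or interval.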
> >>>>>>>
> >>>>>>> Comments are welcome.  I'm not pretending to have read all the
> >>>>>>> relevant
> >>>> background / mailing list discussion carefully and so I may be
> >>>> using terminology incorrectly.  As the name suggests, I expect this
> >>>> to be easy to knock down, but hope that we'll learn something in
> >>>> doing so
> >> anyway.
> >>>>>>>
> >>>>>>> --James
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> The University of Edinburgh is a charitable body, registered in
> >>>>> Scotland, with registration number SC005336.
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> 
> 
Received on Thursday, 15 September 2011 16:24:57 UTC