Re: formal semantics strawman from Myers, Jim on 2011-09-10 (public-prov-wg@w3.org from September 2011)

From: Myers, Jim <MYERSJ4@rpi.edu>
Date: Sat, 10 Sep 2011 15:49:58 +0000
To: James Cheney <jcheney@inf.ed.ac.uk>
CC: Graham Klyne <GK@ninebynine.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <3131E7DF4CD2D94287870F5A931EFC230198B0@EX14MB2.win.rpi.edu>
James,

I think you're getting the difference in concepts right - I think entities are real, not assertions. I strongly believe we need the interpretation that they are real to avoid paradoxes. At this point I don't really know how to explain that better than I have in other notes, but I'll try to put my arguments in the context of the thread below:

The problem with things is that we don't actually agree about their identity as a function of time. What 'a car owned by Luc' (the thing that gets an ID) is right now is fairly obvious - we can point to it. But it is not obvious whether the ID, as a function of time, means 'that physical car', 'the physical material in that car whatever its future shape', 'Luc's car, whatever car he happens to own at the time', 'that car in running condition' (a car that doesn't run isn't a car, it's a giant paperweight), or some other variant ('that car in its current location'). Entities are more real than things (!) because their definition includes information about how to identify them over time. There are common things, like cars and files, that common sense often lets us treat directly as entities ('the physical car that can move Luc around' is the common-sense entity we usually mean when we point at the thing). The confusing part is that the other definitions above are just as valid: they are formally well defined and just as real as the common-sense notion. Because of the processes that usually happen to cars, the common-sense entity is the most useful default, but some things that happen to cars (moving, crashing, being bought and sold) need the other entities if what happened is to be recorded unambiguously.
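
A minimal sketch of this, assuming a hypothetical model in which an entity is an identity criterion that picks out at most one physical car at each discrete time point (the car ids C1 and C3, the locations X and Y, and the helper `pick` are illustrative only): three of the entities above pick out the same car at t=0 and then come apart once the car moves and crashes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CarState:
    """Snapshot of one physical car at one time point (hypothetical model)."""
    car_id: str      # identity of the physical object
    location: str
    running: bool


# Hypothetical timeline: C1 is parked at X, is driven to Y, then crashes.
# Meanwhile another car, C3, takes the spot at X.
WORLD: dict[int, list[CarState]] = {
    0: [CarState("C1", "X", True)],
    1: [CarState("C1", "Y", True), CarState("C3", "X", True)],
    2: [CarState("C1", "Y", False), CarState("C3", "X", True)],
}


def pick(t: int, pred: Callable[[CarState], bool]) -> Optional[str]:
    """Return the unique car satisfying pred at time t, or None if no car
    (or more than one car) satisfies it."""
    matches = [s.car_id for s in WORLD[t] if pred(s)]
    return matches[0] if len(matches) == 1 else None


# Each entity is an identity criterion: at time t it picks out at most one car.
CRITERIA: dict[str, Callable[[int], Optional[str]]] = {
    "that physical car": lambda t: pick(t, lambda s: s.car_id == "C1"),
    "the car at X": lambda t: pick(t, lambda s: s.location == "X"),
    "that car in running condition":
        lambda t: pick(t, lambda s: s.car_id == "C1" and s.running),
}

for name, criterion in CRITERIA.items():
    print(f"{name:30} {[criterion(t) for t in sorted(WORLD)]}")
```

Under these assumptions the three criteria give ['C1', 'C1', 'C1'], ['C1', 'C3', 'C3'] and ['C1', 'C1', None], so pointing at the car at t=0 does not tell you which of them an ID refers to.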

The alternative view - that we are just talking about asserted constraints on a thing - runs into problems when we, for example, create an entity for 'the car owned by Luc' that is constrained only by who the owner is. That car is a Toyota at one point and a Tesla at another (same owner). The same happens if I constrain the car by location (e.g. because putting a quarter in a parking meter allows 'the car at X' to park for 20 minutes). I can make these more subtle - many of the file examples constrain by content, location, set of bits versus set of characters, etc. The problem is that over time, and with a suitable set of processes, the different entities that all map to one thing at a given point in time diverge in their histories. Once they do, the view of entities as constrained things breaks, and I don't see a way to tweak this view to avoid the problem. The only argument for this view that I think holds is that the alternative is more complex, and that while the entities-as-asserted-constraints view doesn't work for as many cases, we can get by without a way to model those. I believe that this is not the case: we already have use cases where entities diverge over time, and the restricted range where this interpretation works is just too small to be useful.
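
To make the Toyota/Tesla case concrete, here is a minimal Python sketch, assuming the strawman's interval reading of an entity assertion (one and the same ?thing must satisfy it throughout the interval) and a hypothetical discrete timeline in which Luc replaces his Toyota with a Tesla. The car ids, the owner "other", and the helpers `witnesses`, `holds_at`, `holds_over` and `makes` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """Snapshot of one physical car at one time point (hypothetical model)."""
    car_id: str
    make: str
    owner: str


# Hypothetical timeline: between t=1 and t=2 Luc sells his Toyota (C1)
# and buys a Tesla (C2).
WORLD: dict[int, list[Car]] = {
    0: [Car("C1", "Toyota", "Luc")],
    1: [Car("C1", "Toyota", "Luc")],
    2: [Car("C1", "Toyota", "other"), Car("C2", "Tesla", "Luc")],
}


def witnesses(constraints: dict[str, str], t: int) -> set[str]:
    """Cars that satisfy every asserted attr=value constraint at time t."""
    return {c.car_id for c in WORLD[t]
            if all(getattr(c, attr) == val for attr, val in constraints.items())}


def holds_at(constraints: dict[str, str], t: int) -> bool:
    """Point-in-time reading: some car satisfies the constraints at t."""
    return bool(witnesses(constraints, t))


def holds_over(constraints: dict[str, str], t1: int, t2: int) -> bool:
    """Interval reading: the same car satisfies them at every t in [t1, t2] (t1 <= t2)."""
    common = set.intersection(*(witnesses(constraints, t) for t in range(t1, t2 + 1)))
    return bool(common)


def makes(constraints: dict[str, str], t: int) -> list[str]:
    """Makes of the cars that satisfy the constraints at time t."""
    ids = witnesses(constraints, t)
    return sorted(c.make for c in WORLD[t] if c.car_id in ids)


lucs_car = {"owner": "Luc"}  # entity(e, [owner="Luc"]), constrained only by owner
print([holds_at(lucs_car, t) for t in sorted(WORLD)])
print(holds_over(lucs_car, 0, 2))
print([makes(lucs_car, t) for t in sorted(WORLD)])
```

Under these assumptions the owner constraint is satisfied at every time point, yet no single car satisfies it over [0, 2], and the make goes from Toyota to Tesla: the constrained-thing reading either has to reject the assertion or let the id change referents.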

I think it would be useful to know where the group agrees/disagrees - do others think that the asserted-constraints approach works for the cases I call divergent, or do they think those are not cases we need to handle? Am I missing something about the asserted-constraints approach that allows divergent cases to be handled in some way?

 -- Jim


> -----Original Message-----
> From: James Cheney [mailto:jcheney@inf.ed.ac.uk]
> Sent: Friday, September 09, 2011 11:18 AM
> To: Myers, Jim
> Cc: Graham Klyne; W3C provenance WG
> Subject: Re: Re: formal semantics strawman
> 
> Hi,
> 
> There's been extensive discussion on other threads/issues concerning the
> nature of entities.  It seems to me that this issue remains unresolved.  I had
> hoped that the formal semantics strawman would contribute to this
> discussion, if we can try to formulate different views precisely.  I see that I did
> not respond to Jim Myers's comments below previously (I was out of town
> from the 28th until Wednesday) so am doing so now.
> 
> On Aug 27, 2011, at 5:43 PM, Myers, Jim wrote:
> 
> > Trying to catch up after travel. I'm not sure I have the big picture but a few
> comments about statements in the thread:
> >
> > We need to avoid saying entities are fixed and ?things are not - entities are
> fixed in some ways - defined by attributes with values, but they are mutable in
> other ways. ?things can be this way as well and we may assert entities that
> have the same ID as an existing ?thing - because that thing is already defined
> as fixed in the ways we need it to be for provenance, or we may assert
> entities that are 'complements' of ?things when we need to fix more
> attributes or different attributes than for the ?thing itself.
> 
> I think one source of confusion - here and in the other discussions of entities -
> is whether an "entity (assertion)" is a syntactic, knowledge-modeling construct
> that merely represents (some subset of) knowledge asserted about a thing in
> the world, or whether it is something that exists in the world, alongside the
> ?things that are being described.
> 
> In the strawman, I initially chose to use the term "entity" for both the
> assertions and the things in the world, which was a bit ambiguous and
> inconsistent with the conceptual model draft; after an offline discussion with
> Luc I changed it so that ?thing is used for the things in the world (what I'd call
> the semantics) and "entity assertion" is used for the syntactic constructs that
> contain information about the ?things.
> 
> Your use of the term "entity" above hints that you consider them to be
> semantic components that exist alongside, but are subordinate to ?things -
> "sub-things" in some sense, rather than purely syntactic statements.
> 
> Can you confirm this or identify parts of the strawman that do not match what
> you have in mind?
> 
> I think that if we do not come to a common understanding of what
> components are "syntax" and what parts are "semantics" we will get
> hopelessly confused.
> 
> >
> > The interval over which an entity exists may be different from the one for
> which a complementOf relationship is true.
> 
> I believe this statement holds in the strawman.
> 
> > If we want an asserted entity that is the fixed content at a live URL, it
> should have its own ID and a complementOf relationship with the site URL (a
> ?thing and potentially another entity if we wish to discuss the provenance of
> the live site itself).
> 
> In the strawman, the ids denote ?things, but the ?thing denoted by an id can
> change over time.  Both Luc (offline) and Graham suggested that this may be
> unnecessarily general.  However, if entities are not first-class semantic things
> but just statements about them, then your suggestion that "it should have its
> own ID and a complementOf relationship with the site url" doesn't make
> sense.
> 
> Do you have a suggestion about how the strawman could change to
> accommodate your view?  For example, would it be enough to adjust the
> lookup function to associate an entity-ID with a ?thing and a time interval?
> 
> > The complementOf relationship is true until the site content updates, but
> the fixed page entity could still exist. I think this is consistent with the
> discussion but am not sure.
> 
> This relates to an important issue which I noticed in doing the writeup: we
> have introduced a lot of statements that may be time dependent but don't
> have an explicit time parameter.
> 
> In the strawman, I suggested several possible ways of handling this for
> isComplementOf.
> 
> 1.  One can judge whether a complement relationship holds at a single time
> instant t by
> 2.  One can judge whether it holds over an interval (this is what
> the current conceptual model suggests).
> 3.  One can treat complementOf as a time-invariant statement, which holds of
> two ids at all time points or at no time points.  This would make sense, for
> example, if we treat ids as mapping to ?things (i.e. the id represents the
> whole history of the Royal Society).
> 
> >
> > The interpretation of an entity should be time invariant - one can choose to
> assert an entity that is a fixed web page or one for a live website, but one
> should not have an entity asserted with a content property as part of its
> definition and then have that property change.
> 
> In the strawman, an assertion of the form entity(id,[attr=val,...]) is either true
> or false at a given time or across a given time interval.  I believe this is
> compatible with your suggestion that "an entity should be time invariant", but
> again, you seem to be talking about entities as semantic things, not assertions.
> 
> > For the example.com example, one can define an entity that is 'the content
> available from the example2.com site no matter what URL you get it from'
> (retrieval URL can't be an attribute here) or one that is 'the content
> retrievable from example2.com' that is generated by the site starting
> operations and ceases to exist when the site URL is blocked from the
> world/retired (retrieval URL can be an attribute here). Either is valid - the
> point with an entity is that you are picking one definition and sticking with it,
> using complementOf when you need to switch definitions.
> >
> 
> I don't fully understand this comment; in particular I'm not sure what you mean
> by "one can define an entity that is..." - is this the same as "one can state an
> entity assertion describing..."?
> 
> Is this an example of something that you believe can be done with the existing
> conceptual model that isn't handled correctly in the strawman semantics?
> 
> --James
> 
> > Hope those are helpful in the larger discussion (and consistent with others'
> interpretation!)...
> >
> > Jim
> > ________________________________________
> > From: public-prov-wg-request@w3.org [public-prov-wg-request@w3.org]
> > on behalf of James Cheney [jcheney@inf.ed.ac.uk]
> > Sent: Friday, August 26, 2011 7:17 AM
> > To: Graham Klyne
> > Cc: W3C provenance WG
> > Subject: Re: formal semantics strawman
> >
> > On Aug 25, 2011, at 6:59 PM, Graham Klyne wrote:
> >
> >> James,
> >>
> >> Thanks.  This helps to clarify for me some things that weren't clear to me in
> the model document.
> >>
> >
> > Note that the strawman is not necessarily capturing the intent of the model
> document (it just represents my initial effort to interpret it formally) so might
> be misleading about what it was trying to say.  For example, Luc asked me
> offline to change what I was calling "entity" to something else because it
> doesn't match the model.
> >
> > I've now updated the document to avoid this potential confusion
> > between the PIDM assertion "entity" and the semantic "?things".  (The
> > question mark is there to flag it as a term with special meaning; we
> > should probably find a less generic term)
> >
> >> You say at http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman#Interpreting_an_entity_assertion:
> >> [[
> >> Note that there is a design choice here: do we require that the entity
> associated with id be the same throughout the interval or not? I have chosen
> to require this, since otherwise the entity assertion doesn't seem to be about
> a "single entity across a time interval". Of course, if we require that the
> mapping from URIs to entities be time-invariant then this problem goes away.
> >> ]]
> >>
> >> As far as I can tell from a quick skim, everything else works as intended (at
> least in sections 1.3, 1.4) if the URI->Entity mapping is invariant.  Which I think
> leads to a model in which the distinction between resource and entity (which I
> find to be unhelpful) becomes less significant.
> >>
> >
> > I was thinking of a situation where a URI is "retired" and redirected to a
> different target, e.g. example.com merges with example2.com, and
> http://www.example2.com is redirected to example.com's website from then
> on.  Perhaps in this example http://www.example2.com is by definition not a
> URI.
> >
> > I think assuming that lookup is time-insensitive would be reasonable (and
> would definitely simplify some of the definitions), but wanted to highlight the
> design choice since it seems related to things that have been debated on the
> list.  I'd rather keep things more general now until it's clear that there's
> consensus about a consistent picture with the other related components.  If it
> is then obvious that time-variance in the interpretation of URIs is superfluous
> then it'll be easy to eliminate it.
> >
> > --James
> >
> >
> >> #g
> >> --
> >>
> >> On 25/08/2011 18:13, James Cheney wrote:
> >>> Hi,
> >>>
> >>> I've been promising for a while now to write down a short formal
> semantics strawman to illustrate what I have in mind.  I've put something onto
> the wiki here:
> >>>
> >>> http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman
> >>>
> >>> It's definitely not a finished product but I've made an effort to
> >>> cover entity assertions, ivp/complement, process execution, and
> >>> events (but NOT derivation :)
> >>>
> >>> One thing that's become apparent already is that there is a large potential
> for confusion since we are talking about assertions about things that may
> change over time.  The assertions may explicitly mention time points/intervals
> and they may also implicitly have "assertion time" or "time intended to be
> valid"  associated with them. Some of the assertions in the Conceptual Model
> document also have explicit times associated with them (e.g. use, generation
> and process execution assertions.)  Others such as entity assertions do not
> have explicit time arguments, but the discussion surrounding them refers to
> time points or intervals during which the entity being described exists.
> >>>
> >>> So for each kind of assertion p(x,y,z,...), it would be helpful to clarify
> whether:
> >>> 1.  p(x,y,z,...) is something that either always holds or never holds; or
> >>> 2.  p(x,y,z,...) can hold or not at a specific point in time t (there may be
> >>>     a convention that we can make this explicit by adding an argument, e.g.
> >>>     p(x,y,z,...,t)); or
> >>> 3.  p(x,y,z,...) can hold or not during an interval [t1,t2] (again there may
> >>>     be a convention where we add 2 arguments).
> >>>
> >>> Currently, there seem to be a mix of conventions.
> >>>
> >>> Comments are welcome.  I'm not pretending to have read all the relevant
> background / mailing list discussion carefully and so I may be using
> terminology incorrectly.  As the name suggests, I expect this to be easy to
> knock down, but hope that we'll learn something in doing so anyway.
> >>>
> >>> --James
> >>
> >>
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> >
> >
> >
> 
> 
Received on Saturday, 10 September 2011 15:50:39 UTC