Re: formal semantics strawman from James Cheney on 2011-09-23 (public-prov-wg@w3.org from September 2011)

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Fri, 23 Sep 2011 16:38:48 +0100
To: "Myers, Jim" <MYERSJ4@rpi.edu>
Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, Graham Klyne <GK@ninebynine.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-Id: <FC3E7259-12D3-4BA7-BCDB-34E1CE494BC2@inf.ed.ac.uk>

Hi Jim and Stian,

Thanks Stian for your detailed comments.  You seem to agree with my view on most of the questions where I thought Jim and I might disagree (e.g. that entities are statements about "real things").  So I am a little surprised that Jim agrees.  I am not sure whether your and Jim's comments reflect a different view from mine, one that means we need to change the strawman, or are simply saying the same things in different ways.

Another thing that you (Stian) said makes me want to clarify the purpose of the strawman.  You asked whether I was suggesting adding a "prov:realThing" property.  This is a separate issue, and I am definitely not suggesting it.  The "real things" in the strawman are not necessarily supposed to be encoded in the data model; they do not have to be Web resources with URIs or anything else in particular.

I was also not suggesting that the class name (or data model assertion name) should be "EntityAssertion", any more than I think we should write "1 + 1 EqualityAssertion 2" in mathematics instead of "1+1 = 2".  If we're clear that the word "entity(...)" is the name of a type of assertion, then it's redundant (and distracting) to say so explicitly. 

Instead of discussing the rest point by point, can I ask whether the following statements are controversial:

1.  Entity assertions (when written down as instances of the data model) describe facts about things that are true (or at least the asserter believes to be true).

2.  Things have attributes that can change over time.

3.  Entity assertions describe attribute values that are fixed (and may be construed as identifying the thing) during the associated time interval.

4.  Entity assertions have identities that allow us to refer to / link different assertions within the data model, but may or may not be related to globally meaningful URIs.
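
One way to read statements 1-4 is the minimal Python sketch below. It is only an illustration under stated assumptions, not the strawman's formal semantics: the names Thing, EntityAssertion and holds, the use of integer years as time points, and the car data are all hypothetical and appear nowhere in the data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Thing:
    """A 'real thing' whose attribute values may change over time (statement 2)."""
    # attribute name -> list of (start, end, value), each a half-open interval [start, end)
    history: Dict[str, List[Tuple[int, int, Any]]] = field(default_factory=dict)

    def value_at(self, attr: str, t: int) -> Any:
        """Return the value of attr at time t, or None if nothing is recorded."""
        for start, end, value in self.history.get(attr, []):
            if start <= t < end:
                return value
        return None


@dataclass(frozen=True)
class EntityAssertion:
    """A statement about a thing (statement 1); id is only used for linking (statement 4)."""
    id: str
    attrs: Tuple[Tuple[str, Any], ...]
    start: int
    end: int  # the assertion covers the half-open interval [start, end)


def holds(assertion: EntityAssertion, thing: Thing) -> bool:
    """True if every asserted attribute keeps its asserted value
    at every time point of the assertion's interval (statement 3)."""
    return all(
        thing.value_at(attr, t) == value
        for attr, value in assertion.attrs
        for t in range(assertion.start, assertion.end)
    )


# Hypothetical data: a car that stays blue but changes owner in 2006.
car = Thing(history={
    "colour": [(2001, 2012, "blue")],
    "owner": [(2001, 2006, "X"), (2006, 2012, "Y")],
})

blue_car = EntityAssertion("e1", (("colour", "blue"),), 2001, 2012)
xs_car = EntityAssertion("e2", (("owner", "X"),), 2001, 2006)
xs_car_too_long = EntityAssertion("e3", (("owner", "X"),), 2001, 2012)

print(holds(blue_car, car))         # colour is fixed over the whole interval
print(holds(xs_car, car))           # owner is fixed while X owns the car
print(holds(xs_car_too_long, car))  # owner changes in 2006, so this fails
```

In this reading the ids "e1", "e2" and "e3" only distinguish assertions from one another; nothing requires them to be URIs or to identify the car uniquely.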

If we agree on the above things, then I think the formal semantics strawman and data model reflect this common viewpoint as-is, and just need to be updated to reflect the current data model draft and include illustrating examples.  If not, please suggest alternative statements that you do agree with (or changes to the semantics).

--James

On Sep 21, 2011, at 3:31 PM, Myers, Jim wrote:

> +1 to your whole set of comments!
> 
> The only minor difference I see is that I still think Entity is the right word (versus EntityState). If I push on the car example and really force someone to define when the car came into being (when an order was placed, when the parts existed, when they were assembled, only after QA testing, when bought for the first time), I would have an entity. We're using examples that are more extreme - like car-owned-by-X, because it is easier to see how they differ from an ill-defined 'car' thing, but I don't think those are really any different from car-a-set-of-parts, car-assembled, car-that-is-finished-and-accepted type entities that are closer to the common sense thing. In all cases, it is the idea that entities force you to disambiguate definitions that would otherwise lead to confusion over things such as creation date.
> 
> Jim
> 
>> -----Original Message-----
>> From: stian@mygrid.org.uk [mailto:stian@mygrid.org.uk] On Behalf Of Stian
>> Soiland-Reyes
>> Sent: Wednesday, September 21, 2011 7:28 AM
>> To: James Cheney
>> Cc: Myers, Jim; Graham Klyne; W3C provenance WG
>> Subject: Re: formal semantics strawman
>> 
>> On Tue, Sep 20, 2011 at 22:15, James Cheney <jcheney@inf.ed.ac.uk> wrote:
>> 
>> 
>>>  The thing it denotes *is* real (if X has a car in whatever situation we're
>>> talking about, then "X's car" denotes that car; otherwise there is ambiguity or
>>> vacuity).
>>> You seem not to be distinguishing between a statement one might make
>>> about a thing, and the thing itself (but perhaps I am just getting confused).  See
>>> below.
>> 
>> Yes, but "the car" is also such a concept or statement describing some idea of
>> what is "the thing". If I remove the wheels and change the engine, you and I
>> might disagree about it still being the same car.
>> That is why we simply describe them all as entity, because we can't really
>> grasp "the real thing" because you will simply end up with yet another
>> characterisation/idea/concept (which attributes might or might not be easy to
>> express).
>> 
>> 
>>>  I certainly do mean that "X's Toyota" - the expression - is not a thing in the
>>> same sense as the car is.
>> 
>> What about "A Toyota, owned by X" - compared to "A Toyota, blue" and "A
>> Toyota, license plate #232323"? All of these are attributes that might or might
>> not change, depending on the time duration and purpose of using that
>> particular characterisation.
>> 
>> 
>> 
>>> Agreed, the point is just that time is not the only context that might be
>>> needed to make sense of an expression like "X's car".
>> 
>> A very valid point, in particular as we start talking about more abstract entities
>> for information that easily can exist "two places at the same time" or be
>> ambiguous about things like a file path or content across multiple
>> dimensions like time, location, perspective.
>> 
>> I believe this also allows one entity to have a certain attribute "fixed" within a
>> time-span, while another entity has equivalent attributes varying over the
>> same time-span, even though they are both wasComplementOf a common
>> entity. The entities consider the attribute with a different granularity,
>> precision, etc, but they can be "equivalent" or "corresponding" as briefly
>> described in the model document.
>> 
>> For instance the mass-of-a-boxer attribute is seen as constant for the purpose
>> of a boxing match, varying for the biologist (drinking water, processing energy
>> from food) and uncertain for the physicist (the boxer keeps moving around
>> and experiences acceleration) - but they can all agree that it is Muhammad Ali
>> in the boxing ring for the duration of the match.
>> 
>> 
>>> I am just saying that if "X's blue car in 2011" denotes a different (real) thing-
>>> over-time than "X's blue car in 2001", then it makes no sense to use the same
>>> identifier for both.  The example may have hidden this point.
>> 
>> Ah, this is a good example of where the "real thing" seems like it's
>> mismatching because the entity describes something closer to a concept or
>> role than a certain arrangement of atom chains. In everyday life we normally
>> give names and classes to such arrangements because it is more convenient
>> than talking about the actual atom arrangements.
>> 
>> But if the physical presence of the car is not important, then it can easily be
>> the same entity, for instance because we are talking about "The thing Luc owns
>> for the purpose of commuting". It would of course be strange to have a
>> physical property like colour on such an abstract entity, but it could be OK in
>> some circumstances, say Luc was sponsored by W3C and always had to drive a
>> blue car to work. Then it does not matter that much which "physical car" it was
>> that was "the blue car", we can still talk about "Luc's blue car" spanning all
>> those years - but now we can't lock down attributes like license plate number
>> without doing a narrower entity with prov:wasComplementOf :lucsBlueCar .
>> 
>> Similarly
>> http://en.wikipedia.org/wiki/Back_to_the_Future:_The_Ride#Memorabilia
>> shows how "The De Lorean from Back to the Future" is on display - but
>> actually several De Loreans were used for stunts, etc. In the view of film fans
>> these physical things, with or without various additions like Mr Fusion, are all
>> "The time-travelling De Lorean".
>> 
>> 
>>> All I was saying is that I have been interpreting the "id" component of an
>>> entity assertion as purely syntactic, as a placeholder for "the thing I'm talking
>>> about in this assertion", so that I can make statements about (what I believe to
>>> be) the same thing at different instants in time or different properties of it at
>>> overlapping times.  The id could also happen to be a URI with some useful
>>> Web meaning, or not, and it could happen to be enough to uniquely identify
>>> the thing it denotes, or not.
>> 
>> I think we need to distinguish between identifiers for the purpose of
>> separating entities in the provenance (which need to pinpoint exactly which
>> entity description we're talking about) and various "other"
>> identifiers, which for provenance purposes are really just different kinds of
>> attributes, exactly because it depends on who assigned the identifier what
>> scope and view of the entity is implied.
>> 
>> The danger here is that asserters would re-use existing (semantic web) URIs
>> for entities, although from a provenance perspective they have a narrower
>> view of the entity than whoever assigned the identifier.
>> 
>> One way around this is to always narrow down with a new, local entity with its
>> own URI, having :wasComplementOf <commonURI> (implicitly saying that
>> <commonURI> is an entity, but not describing it) - while you seem to want a
>> variation of this, with some kind of prov:realThing <commonURI> property on
>> a fresh local entity.
>> 
>> Another approach would be for the asserter to actually use <commonURI>
>> directly as a prov:Entity, but be explicit about which attributes of
>> <commonURI> they consider characterising ("locked
>> down"/immutable/invariant/important) within this provenance assertion
>> graph. This pushes the requirement for named graphs and/or indirections
>> through a prov:Asserter because different asserters can have different views
>> on what characterises <commonURI>. This can make it tricky in situations
>> where you don't know or want to specify these attributes (or their values) at
>> assertion time.
>> 
>> I believe that whether we want to encourage this reuse-existing-URI-as-entity or the
>> always-local-entity-URI approach is more of a practical matter (how difficult to
>> write/query/reason) than a big discussion of "what is the real thing" which as
>> pointed out we would probably never conclude.
>> 
>> 
>>> I am still confused whether you regard an entity as an assertion (= syntactic
>>> statement ABOUT some state of affairs) or a real thing.  You seem to be saying
>>> the latter, but I can't see how to adjust the semantics to reflect this.
>> 
>> I believe it is the former, because whatever our language, that is the only way
>> we know to talk about "the real thing" - and so the difference is or is not there
>> depending on what you classify as "the real thing". (You could include
>> "identifiers" in here as another way to talk about "the real thing" - but even [
>> owl:sameAs :X ] is a statement).
>> 
>> 
>>> If we do want to talk about attributes that (uniquely) characterize the things
>>> they describe, we should make these assumptions explicit (e.g. for cars, state
>>> whether we consider VINs mutable or not).  Perhaps the discussion happening
>>> in parallel about the use of owl:key is the way to address this.
>> 
>> I agree - but we have already said that entities are descriptions about things
>> within a certain perspective/time-frame/view, so would it not be implied that
>> all attributes given to them are immutable for the purpose of that entity's use
>> within the provenance statements?
>> 
>> If a property is mutable, then why is it stated for that entity instead of on a
>> narrower entity? (We would not know when that attribute applies or not). In a
>> regular old-style RDF document, if someone says <http://soiland-
>> reyes.com/stian#me> foaf:name "Stian Soiland-Reyes"  - then that value is
>> assumed to be true throughout that particular graph - although it has not been
>> explicitly said who asserts this, over what time, based on which observations
>> or assertions, etc.  Similarly I believe that within a single PROV assertion graph,
>> if the asserter says :lucsCar :colour :blue - then that's true for wherever we
>> see :lucsCar within this graph.
>> 
>> The distinction comes when we are doing multiple asserters - would they
>> reuse identifiers for entities or not - which in a way is the same discussion as
>> with the "real thing" above.
>> 
>> Note, an attribute might be immutable/invariant, but not an
>> (important) part of our characterisation. It could very much still be useful to
>> include such attributes, in particular when specialising for different domains. I
>> believe this distinction should be made on individual entities rather than using OWL
>> keys on some (often artificial) class, because it depends on that particular
>> entity what is characterising or not.
>> 
>> 
>> 
>>> As noted above, it is not just that it is not intuitive, it is that I (at least) do not
>>> understand what you mean by entities being real (and this seems to be a basic
>>> point of cognitive dissonance among others too).  One possibility would be to
>>> just rename "things" back to "entities" and say that "entity assertions" are
>>> statements about aspects of things/entities that are fixed over a period of
>>> time.  This is what I did initially and Luc asked me to rename to thing to make a
>>> clearer distinction.
>> 
>> Yes, our "Entity" is closer to an "EntityStatement" or "EntityState"
>> than an entity itself, but we did in the end vote for prov:Entity for simplicity.
>> Similarly our Agent is not the real agent, it is a description/identifier for the
>> Agent - but instead of tacking "Statement" or "Description" behind every class
>> name, we just admit that our assertion language in its nature is describing
>> other things, and so this is made implicit.
>> 
>> 
>> 
>> --
>> Stian Soiland-Reyes, myGrid team
>> School of Computer Science
>> The University of Manchester


Received on Friday, 23 September 2011 15:39:47 UTC