Re: formal semantics strawman from Stian Soiland-Reyes on 2011-09-21 (public-prov-wg@w3.org from September 2011)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Wed, 21 Sep 2011 12:27:42 +0100
To: James Cheney <jcheney@inf.ed.ac.uk>
Cc: "Myers, Jim" <MYERSJ4@rpi.edu>, Graham Klyne <GK@ninebynine.org>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <CAPRnXtnEuZr3vLvQPveUBesxikO_2qbFiPn3k3UqEwfO5x7X4A@mail.gmail.com>

On Tue, Sep 20, 2011 at 22:15, James Cheney <jcheney@inf.ed.ac.uk> wrote:

> The thing it denotes *is* real (if X has a car in whatever situation we're talking about, then "X's car" denotes that car; otherwise there is ambiguity or vacuity).
> You seem not to be distinguishing between a statement one might make about a thing, and the thing itself (but perhaps I am just getting confused). See below.

Yes, but "the car" is also such a concept or statement describing some
idea of what is "the thing". If I remove the wheels and change the
engine, you and I might disagree about it still being the same car.
That is why we simply describe them all as entities: we can't
really grasp "the real thing", because you will simply end up with yet
another characterisation/idea/concept (whose attributes might or might
not be easy to express).

> I certainly do mean that "X's Toyota" - the expression - is not a thing in the same sense as the car is.

What about "A Toyota, owned by X" - compared to "A Toyota, blue" and
"A Toyota, license plate #232323"? All of these are attributes that
might or might not change, depending on the time duration and purpose
of using that particular characterisation.

> Agreed, the point is just that time is not the only context that might be needed to make sense of an expression like "X's car".

A very valid point, in particular as we start talking about more
abstract entities for information, which can easily exist in "two places
at the same time" or be ambiguous about things like a file path or
content across multiple dimensions like time, location and perspective.

I believe this also allows one entity to have a certain attribute
"fixed" within a time-span, while another entity has equivalent
attributes varying over the same time-span, even though both are
wasComplementOf a common entity. The entities consider the attribute
with a different granularity, precision, etc., but they can be
"equivalent" or "corresponding" as briefly described in the model
document.

For instance the mass-of-a-boxer attribute is seen as constant for the
purpose of a boxing match, varying for the biologist (drinking water,
processing energy from food) and uncertain for the physicist (the
boxer keeps moving around and experiences acceleration) - but they can
all agree that it is Muhammad Ali in the boxing ring for the duration
of the match.
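
As a rough sketch of the boxer example (Python, purely illustrative):
the Entity class, its "fixed" and "complement_of" fields and the
identifiers below are hypothetical names of mine, not anything defined
by the model document, and the mass value is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entity:
    """One characterisation of a thing (hypothetical structure, not PROV-defined)."""
    id: str
    # Attributes that this particular view treats as locked down.
    fixed: Dict[str, object] = field(default_factory=dict)
    # Identifier of the broader, common entity this view complements.
    complement_of: Optional[str] = None


def share_common_entity(a: Entity, b: Entity) -> bool:
    """True if both entities are declared complements of the same common entity."""
    return a.complement_of is not None and a.complement_of == b.complement_of


ali = Entity(":muhammadAli")
# 100.0 is an arbitrary placeholder, not a measured value.
match_view = Entity(":aliInTheRing", {"mass_kg": 100.0}, complement_of=ali.id)
# The biologist's view deliberately leaves the mass unfixed.
biologist_view = Entity(":aliAsOrganism", complement_of=ali.id)
```

The two views disagree on whether the mass is fixed, yet
share_common_entity(match_view, biologist_view) still holds.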

> I am just saying that if "X's blue car in 2011" denotes a different (real) thing-over-time than "X's blue car in 2001", then it makes no sense to use the same identifier for both. The example may have hidden this point.

Ah, this is a good example of where the "real thing" seems to be
mismatched, because the entity describes something closer to a concept
or role than a certain arrangement of atoms. In everyday life
we normally give names and classes to such arrangements because it is
more convenient than talking about the actual atom arrangements.

But if the physical presence of the car is not important, then it can
easily be the same entity, for instance because we are talking about
"The thing Luc owns for the purpose of commuting". It would of course
be strange to have a physical property like colour on such an abstract
entity, but it could be OK in some circumstances, say Luc was
sponsored by W3C and always had to drive a blue car to work. Then it
does not matter that much which "physical car" it was that was "the
blue car", we can still talk about "Luc's blue car" spanning all those
years - but now we can't lock down attributes like license plate
number without introducing a narrower entity with prov:wasComplementOf
:lucsBlueCar .

Similarly http://en.wikipedia.org/wiki/Back_to_the_Future:_The_Ride#Memorabilia
shows how "The De Lorean from Back to the Future" is on display - but
actually several De Loreans were used for stunts, etc. In the view of
film fans these physical things, with or without various additions
like Mr Fusion, are all "The time-travelling De Lorean".

> All I was saying is that I have been interpreting the "id" component of an entity assertion as purely syntactic, as a placeholder for "the thing I'm talking about in this assertion", so that I can make statements about (what I believe to be) the same thing at different instants in time or different properties of it at overlapping times. The id could also happen to be a URI with some useful Web meaning, or not, and it could happen to be enough to uniquely identify the thing it denotes, or not.

I think we need to distinguish between identifiers for the purpose of
separating entities in the provenance (which need to pinpoint exactly
which entity description we're talking about) and various "other"
identifiers, which for provenance purposes are really just different
kinds of attributes, precisely because the scope and view of the entity
that is implied depend on who assigned the identifier.

The danger here is that asserters would re-use existing (semantic web)
URIs for entities, although from a provenance perspective they have a
narrower view of the entity than whoever assigned the identifier.

One way around this is to always narrow down with a new, local entity
with its own URI, having :wasComplementOf <commonURI> (implicitly
saying that <commonURI> is an entity, but not describing it) - while
you seem to want a variation of this, with some kind of prov:realThing
<commonURI> property on a fresh local entity.
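
A minimal sketch of the always-local-entity approach (Python, for
illustration only): the narrow() helper, the urn:uuid naming scheme and
the example.org URI are assumptions of mine; only :wasComplementOf comes
from the discussion above.

```python
import uuid
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def narrow(common_uri: str, characterising: Dict[str, str]) -> Tuple[str, List[Triple]]:
    """Mint a fresh local entity that narrows common_uri (hypothetical helper).

    The common URI only ever appears as an object, so it is implied to be
    an entity without being described itself.
    """
    local = "urn:uuid:" + str(uuid.uuid4())
    triples: List[Triple] = [(local, "prov:wasComplementOf", common_uri)]
    # Characterising attributes go on the local entity, never on the common one.
    triples.extend((local, attr, value) for attr, value in characterising.items())
    return local, triples


local, triples = narrow("http://example.org/commonURI", {":colour": ":blue"})
```

Every triple produced has the local entity as its subject, so nothing
is asserted about <commonURI> beyond it being complemented.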

Another approach would be for the asserter to actually use <commonURI>
directly as a prov:Entity, but be explicit about which attributes of
<commonURI> they consider characterising ("locked
down"/immutable/invariant/important) within this provenance assertion
graph. This pushes the requirement for named graphs and/or
indirections through a prov:Asserter because different asserters can
have different views on what characterises <commonURI>. This can make
it tricky in situations where you don't know or want to specify these
attributes (or their values) at assertion time.
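
To illustrate why this needs some per-asserter indirection, here is a
small sketch (Python; CharacterisationStore, lock_down and view are
hypothetical names, standing in for whatever named graphs or a
prov:Asserter would provide):

```python
from collections import defaultdict
from typing import Dict, Tuple


class CharacterisationStore:
    """Keeps, per (asserter, entity), the attributes that asserter locks down."""

    def __init__(self) -> None:
        self._views: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)

    def lock_down(self, asserter: str, entity: str, attribute: str, value: str) -> None:
        """Record that this asserter treats the attribute as invariant for the entity."""
        self._views[(asserter, entity)][attribute] = value

    def view(self, asserter: str, entity: str) -> Dict[str, str]:
        """Return a copy of what this asserter considers characterising."""
        return dict(self._views.get((asserter, entity), {}))


store = CharacterisationStore()
# Two asserters reuse the same URI but characterise it differently.
store.lock_down(":asserterA", "<commonURI>", ":colour", ":blue")
store.lock_down(":asserterB", "<commonURI>", ":licensePlate", "232323")
```

Here store.view(":asserterA", "<commonURI>") and
store.view(":asserterB", "<commonURI>") differ even though both use
<commonURI> directly as the entity.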

I believe that whether we want to encourage the
reuse-existing-URI-as-entity or the always-local-entity-URI approach
is more of a practical matter (how difficult it is to write/query/reason)
than a big discussion of "what is the real thing", which, as pointed out,
we would probably never conclude.

> I am still confused whether you regard an entity as an assertion (= syntactic statement ABOUT some state of affairs) or a real thing. You seem to be saying the latter, but I can't see how to adjust the semantics to reflect this.

I believe it is the former, because whatever our language, that is the
only way we know to talk about "the real thing" - and so the
difference is or is not there depending on what you classify as "the
real thing". (You could include "identifiers" in here as another way
to talk about "the real thing" - but even [ owl:sameAs :X ] is a
statement).

> If we do want to talk about attributes that (uniquely) characterize the things they describe, we should make these assumptions explicit (e.g. for cars, state whether we consider VINs mutable or not). Perhaps the discussion happening in parallel about the use of owl:key is the way to address this.

I agree - but we have already said that entities are descriptions
about things within a certain perspective/time-frame/view, so would it
not be implied that all attributes given to them are immutable for the
purpose of that entity's use within the provenance statements?

If a property is mutable, then why is it stated for that entity
instead of on a narrower entity? (We would not know when that
attribute applies or not). In a regular old-style RDF document, if
someone says <http://soiland-reyes.com/stian#me> foaf:name "Stian
Soiland-Reyes" - then that value is assumed to be true throughout
that particular graph - although it has not been explicitly said who
asserts this, over what time, based on which observations or
assertions, etc. Similarly I believe that within a single PROV
assertion graph, if the asserter says :lucsCar :colour :blue - then
that's true wherever we see :lucsCar within this graph.
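
One way to picture this graph-wide reading is the sketch below (Python).
It assumes each attribute is single-valued per entity within a graph,
which is my simplification and not something RDF itself requires;
conflicting_attributes() is a hypothetical helper.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def conflicting_attributes(graph: Iterable[Triple]) -> Set[Tuple[str, str]]:
    """Return (entity, attribute) pairs given more than one value in the graph."""
    seen: Dict[Tuple[str, str], str] = {}
    conflicts: Set[Tuple[str, str]] = set()
    for subject, attribute, value in graph:
        # setdefault stores the first value seen and returns it on later visits.
        if seen.setdefault((subject, attribute), value) != value:
            conflicts.add((subject, attribute))
    return conflicts


consistent = [(":lucsCar", ":colour", ":blue"), (":lucsCar", ":owner", ":luc")]
inconsistent = consistent + [(":lucsCar", ":colour", ":red")]
```

Under that assumption the first graph has no conflicts, while the second
reports (":lucsCar", ":colour"), which would suggest that a narrower
entity is needed for one of the colours.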

The distinction comes when we have multiple asserters (would they
reuse identifiers for entities or not?), which in a way is the
same discussion as with the "real thing" above.

Note, an attribute might be immutable/invariant, but not an
(important) part of our characterisation. It could very much still be
useful to include such attributes, in particular when specialising for
different domains. I believe this distinction should be made on
individual entities rather than by using OWL keys on some (often
artificial) class, because what is characterising or not depends on
the particular entity.

> As noted above, it is not just that it is not intuitive, it is that I (at least) do not understand what you mean by entities being real (and this seems to be a basic point of cognitive dissonance among others too). One possibility would be to just rename "things" back to "entities" and say that "entity assertions" are statements about aspects of things/entities that are fixed over a period of time. This is what I did initially and Luc asked me to rename to thing to make a clearer distinction.

Yes, our "Entity" is closer to an "EntityStatement" or "EntityState"
than an entity itself, but we did in the end vote for prov:Entity for
simplicity. Similarly our Agent is not the real agent, it is a
description/identifier for the Agent - but instead of tacking
"Statement" or "Description" behind every class name, we just admit
that our assertion language in its nature is describing other things,
and so this is made implicit.

--
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Wednesday, 21 September 2011 11:28:41 UTC