Re: simon:entity (or Identifiable) from Reza B'Far on 2011-07-16 (public-prov-wg@w3.org from July 2011)

From: Reza B'Far <reza.bfar@oracle.com>
Date: Fri, 15 Jul 2011 23:37:15 -0700
CC: public-prov-wg@w3.org
Message-ID: <4E21319B.50002@oracle.com>
Jim -

I don't disagree with anything you're saying.  I think I didn't state my
point well.  Let me see if I can clarify, and then you can tell me whether you
still think this is something that has been deemed out of scope.  To align with
your email, I'll use your statement:

We're debating:
  how to define this relationship
  whether the document and its versions are the same type/class in the model


What I'm suggesting is to augment Ryan's model so that:

 1. The very first version of a document is defined by an Identifier and is
    Identifiable.
 2. The DAG that I mentioned is a graph of "state" relationships where each
    state is a node.  It's directed because time is different from any other
    dimension: you can only move forward -- well, for practical purposes.  And
    it can't loop on itself -- once you modify something, it's modified; you
    can't undo it with respect to time.  It's a graph because multiple versions
    can be made of the same source without needing to merge, but merging is also
    possible.
 3. It's a DAG that has only one root because there is an inception point for
    any Bob/simon:entity/whatever.  It's created at some point and that very
    first version at the creation time is different than all the other future
    versions.  The atoms that made that thing didn't have the semantic meaning
    as a collection before it was made.

Based on this, I'm proposing that a document and its versions are the same AFTER
the inception point, but that there is a unique concept at the root node, which
is the thing at the inception point.  So, if you take Ryan's example,
Identifiable and Identifier define the entity at the inception point.  But I
don't see the graph itself as a concept anywhere, nor the nodes in the graph,
which are the states of the entity expressed as delta changes to the previous
state and lined up in time.  I can't tell whether IVPof represents the edges in
the graph; from everything I've read on the wiki so far I think it does, but I
am unsure.

There is no hierarchy in what I'm outlining above.  I'm only capturing temporal
behavior and saying that temporal behavior is different from all the other
dimensions, since it gives rise to the notion of state, and that it should be
captured uniquely.

Example
-----------

(Legal Contract [Identifiable] at Inception Time)
  ---> (Modification 1 [State])
         |
         |--> (Modification 2 [State]) ---> ....
         |
         |--> (Modification 3 [State]) ---> (Modification 4 [State]) ---> ....
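
A minimal sketch of the example above, in Python, may make the structure
concrete.  It is only an illustration under stated assumptions: the class
VersionGraph and its methods are hypothetical names, not part of any proposal
on the wiki; the branch to Modification 3 is assumed to start at Modification 1;
and the "Merged" state is invented to show that merging is possible.  Because a
new state may only point at states that already exist, the graph stays acyclic
and keeps exactly one root by construction.

```python
class VersionGraph:
    """Single-rooted DAG of entity states (hypothetical helper for illustration)."""

    def __init__(self, root):
        # The root is the entity at its inception point.
        self.root = root
        self.parents = {root: []}  # state -> states it was derived from

    def add_state(self, name, parents):
        """Add a new state derived from one or more existing states."""
        if name in self.parents:
            raise ValueError(f"{name!r} already exists; states are never rewritten")
        if not parents:
            raise ValueError("only the inception node may have no parent")
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise ValueError(f"unknown parent state(s): {missing}")
        self.parents[name] = list(parents)

    def ancestors(self, name):
        """Return every earlier state that the given state descends from."""
        seen = set()
        stack = list(self.parents[name])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen


contract = VersionGraph("Legal Contract")
contract.add_state("Modification 1", ["Legal Contract"])
contract.add_state("Modification 2", ["Modification 1"])
contract.add_state("Modification 3", ["Modification 1"])  # assumed branch point
contract.add_state("Modification 4", ["Modification 3"])
# Hypothetical merge of the two branches (not in the original example).
contract.add_state("Merged", ["Modification 2", "Modification 4"])

print(sorted(contract.ancestors("Modification 4")))
```

Every state other than the root traces back to "Legal Contract", which is the
sense in which the inception node differs from all later versions.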

Regards.

On 7/15/11 4:50 PM, Myers, Jim wrote:
> This is going in the direction of a hierarchy of 'states' of an identifier? If so - I don't think we have a hierarchy. If not, then I'm not sure what the DAG represents.
>
> I remember Graham making a comment at one point about trying to write a page that talked more about the purpose of the model (as he wrote for access) - I wonder if that would help. Here's my attempt to describe the requirements and where we agree/disagree in this style. (My take could be wrong but perhaps we would make progress by identifying if we disagree on requirements or where some are debating something that others consider resolved. If so, perhaps trying to modify the text below would help before we dive back to specific points).
>
> In the following, I intend only the English meanings of words unless otherwise noted.
>
> There's a set of things we've agreed to/ignored for a while related to the basic 'inputs - process execution -outputs' where the purpose of the model is to describe the history in cases where inputs and outputs are clear and the effects of a process execution are  captured by the set of inputs and outputs (i.e. the process execution can't just change an input).
>
> We also want to be able to model cases where the process execution does change something versus just using input and generating outputs. A document with versions is one example. In that case we're making the choice to model both the document and its versions and are adding a relationship (IVPof) between them to signify that the object we consider to be changing could instead be thought of as distinct objects (document with content1 and document with content2) that can be handled by the base input-process execution-output model.
>
> We're debating:
>   how to define this relationship
>   whether the document and its versions are the same type/class in the model
>
> We also expect the model to cover a third case - where we have two different things - e.g. a document and a file - that may both have provenance, but at some point have a correspondence - the file bytes represent the document's content. This case causes problems for IVPof definitions that involve hierarchy since one can't really consider either a document or a file to be more stateful versions of the other.
>
> This again leads to debate about the definition of IVPof. So far formulation of this concept has been attempted in terms of properties and dimensions as well as in terms of 'perspective relative to processes'. Some of the debate here has been when these definitions start to include hierarchy (thus not fitting the third use case), but it may be possible to formulate all three in ways that don't require hierarchy.
>
> This last use case also makes it harder to see a difference in the types of thing like document and version. In particular, if we can imagine more than one level for the second use case (e.g. document-version-encodedVersion), or think about the third case with no hierarchy, a two class system of thing and thing-state does not appear workable.
>
> Another issue that has arisen in the discussions is how to refer to things outside the model. We have several reasons we want to do this -
>    to allow discovery of things with provenance using descriptive metadata/behavior/other context outside the model
>    to aid in the definition of IVPof, where multiple hierarchies ala TBL and the third, non-hierarchical use case make it hard not to talk about something 'real' that both things involved in an IVPof relationship are describing/representing.
>
> Throughout we have trouble with nomenclature thing/entity/stuff/etc., describe/represent/view of/etc. which helps obscure when we do/don't agree.
>
> We (I anyway) may be confusing what the model contains versus how the model will be implemented (in RDF or in other languages we think in).
>
> I don't know that this is complete, but perhaps I can stop and ask whether this is already controversial or if it captures some of the nature of our debates?
>
>   Jim
>
> -----Original Message-----
> From: public-prov-wg-request@w3.org on behalf of Reza B'Far
> Sent: Fri 7/15/2011 2:22 PM
> To: public-prov-wg@w3.org
> Subject: Re: simon:entity (or Identifiable)
>
> Folks -
>
> I realize that the "R" word has been banned and am fine with that.  Here is a
> suggestion for reconciliation of proposals/suggestions by Ryan, Jim(s), and Luc -
>
>   1. That we specify that Identifier is some "base-line" temporally identified as
>      zero point (there exists no entity to be identified before this point).
>   2. That we have a new concept that encapsulates a single "state" (sorry, I know
>      that's another dangerous word) of identifier from that point on.  I don't
>      want to give it a name so I'll call it set S{}.
>   3. An Identifier can have a DAG (Directed Acyclic Graph) of S{} nodes where the
>      DAG has a single root node and that root node has equivalence with the
>      identifier itself.
>
> Just trying to reconcile at this point.
>
>
> On 7/15/11 10:46 AM, Jim McCusker wrote:
>> On Fri, Jul 15, 2011 at 12:06 PM, Myers, Jim<MYERSJ4@rpi.edu>   wrote:
>>>> Being able to describe what the entity "looks like" at the time the
>>>> provenance was recorded.
>>>>
>>>> My understanding was that a BOB was something like a named graph, graph
>>>> literal
>>>> (http://webr3.org/blog/semantic-web/rdf-named-graphs-vs-graph-literals/),
>>>> or information artifact similar to iao:Dataset. The Bob would then have
>>>> content that described, in some way, the entity in question.
>>>> Hence the Bob being a description of an entity's state.
>>> Do you distinguish 'description of an entity' from 'description of an
>>> entity's state'? I get the sense that you are not using state in the
>>> same sense of 'a more stateful view of' that is driving the discussion
>>> of entity versus entity-state in the IVPof debates.
>> Any description of an entity will occur with an entity in a particular
>> state, and so the two are the same.
>>
>>>> If it is possible to know, there should be assertions on the BOB itself
>>>> that say which entity the BOB is describing. Ideally, this is a URI of
>>>> something that's referenced within the BOB.
>>> I'm hoping someone will chime in on this - I agree we need to connect
>>> the idea of a bob with the entity, but I could see implementing that as
>>> a link (as you say) or by saying that my entity's class is a subtype of
>>> Bob (hence there's only one URL for the Bob and the entity).
>> But that's clearly wrong, since Bobs only describe the state of an
>> entity at one point/span of time and context. If the same entity is
>> observed again, and a new Bob is created that describes the state
>> differently, then there's nothing to tie it down. I'm guessing that by
>> saying there is no referable entity outside of the Bob, then you can
>> just make Bobs all the way down. But there would be no grounding to
>> non-provenance resources in this case.
>>
>> The Bob is the description of something based on its state, the Entity
>> is that something. A description of a thing is not the thing itself.
>> Within the context of information systems, one can say that
>> http://tw.rpi.edu/instances/JamesMcCusker is me. If you were to
>> download the RDF from that URL that would contain a description of me
>> within the context of RPI. The graph literal behind
>> http://tw.rpi.edu/instances/JamesMcCusker is one description (that can
>> change over time), and can be given an identifier using a graph digest
>> [1], guaranteeing that we always talk about the same graph. But that
>> graph is not me, even though the URI that returns it stands in for me
>> in the semantic web.
>>
>> [1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.2187&rep=rep1&type=pdf
>>
>> Jim
>> --
>> Jim McCusker
>> Programmer Analyst
>> Krauthammer Lab, Pathology Informatics
>> Yale School of Medicine
>> james.mccusker@yale.edu | (203) 785-6330
>> http://krauthammerlab.med.yale.edu
>>
>> PhD Student
>> Tetherless World Constellation
>> Rensselaer Polytechnic Institute
>> mccusj@cs.rpi.edu
>> http://tw.rpi.edu
>>
Received on Saturday, 16 July 2011 06:37:54 UTC