RE: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology]

This issue is not just aggregation. I have an iPass and the state considers any vehicle with that iPass in it to be mine and expects me to pay tolls when it engages in drive-on-tollway events. There are times when that legal vehicle is my physical car and times when it's a rental. (I.e., I would not say my car had all its parts replaced when I use a rental car.)
I think we have exactly this issue in our scenarios - the legal/logical data from the government corresponds to different physical file objects at different times and we want to track provenance across the legal/logical and physical processes that occur. We are asking in queries whether the logical result depends on the government data even when all of the physical bits of the input used are completely different (on a different disk, perhaps in a different format) than the government data file. If we don't track the logical government data separately from physical files that at times are manifestations of it, we get paradoxes from our limited model that don't exist in the real world. (File copying preserves the logical-to-physical correspondence, editing a file to be all zeros does not, so if you have a derived result from any copy created before an edit occurred, your result is logically dependent on the government data...).
We really have these types of logical to physical relationships throughout science as well - we assume the reading from the sensor is the logical temperature but would question that relationship if we had provenance of a 'smashed' event for the sensor. The logical temperature may at times have separate provenance from the sensor reading and we may want to track both. 
In a practical sense, I think modeling this way involves very little change to OPM-style provenance. In addition to artifact-process execution-artifact type chains, you have the occasional links that connect resources - physical file is a manifestation of the gov data - that allow you to cross from thinking about legal/logical/intellectual provenance to physical/computational provenance, etc. That 'minor' addition would avoid further discussion of how/when to categorize specific things as mutable/immutable, etc. and probably remove the need for special case opm:agent and pml:source style types as well.
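To make the idea concrete, here is a minimal sketch of such a graph and of the query "does this result logically depend on the government data?". The edge labels (derivedFrom, manifestationOf, replaces), the class name and the node names are all hypothetical and are not taken from OPM or any other vocabulary. It also assumes that zeroing a file is recorded as a new version linked by a "replaces" edge, which the dependency query does not follow.

```python
# Hypothetical edge labels, chosen only for this illustration.
DERIVED_FROM = "derivedFrom"          # artifact -> artifact it was copied/computed from
MANIFESTATION_OF = "manifestationOf"  # physical artifact -> logical resource
REPLACES = "replaces"                 # new version -> old version (content not carried over)


class ProvGraph:
    def __init__(self):
        self.edges = {}  # (source, label) -> set of targets

    def add(self, source, label, target):
        self.edges.setdefault((source, label), set()).add(target)

    def targets(self, source, label):
        return self.edges.get((source, label), set())

    def physical_ancestors(self, node):
        """Return the node plus everything reachable over derivedFrom edges."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.targets(current, DERIVED_FROM))
        return seen

    def logical_dependencies(self, node):
        """Logical resources manifested by the node or any physical ancestor."""
        found = set()
        for ancestor in self.physical_ancestors(node):
            found |= self.targets(ancestor, MANIFESTATION_OF)
        return found


g = ProvGraph()
g.add("gov_file_v1", MANIFESTATION_OF, "gov_data")
g.add("copy_on_other_disk", DERIVED_FROM, "gov_file_v1")
g.add("copy_on_other_disk", MANIFESTATION_OF, "gov_data")  # copying preserves the link
g.add("gov_file_v2_zeroed", REPLACES, "gov_file_v1")       # editing to zeros does not
g.add("result_a", DERIVED_FROM, "copy_on_other_disk")
g.add("result_b", DERIVED_FROM, "gov_file_v2_zeroed")

print(g.logical_dependencies("result_a"))  # {'gov_data'}
print(g.logical_dependencies("result_b"))  # set()
```

The only addition to the usual artifact-process-artifact chain is the manifestationOf link; everything else is an ordinary reachability query.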


From: on behalf of martin
Sent: Thu 6/2/2011 3:50 PM
Subject: Re: PROV-ISSUE-1 (define-resource): Definition for concept 'Resource' [Provenance Terminology]

I agree with this view of Paolo's. It makes no sense to distinguish a "legal ship" from a "physical ship". Any physical object and any part of
it can be seen as a compound of changing parts or as a single rigid thing at the same time, at any time. I believe it is impossible to
address here the philosophical questions of substance, but we have to address which behaviour of things is relevant in our technical and
scientific world.

In the CIDOC CRM, we have therefore introduced Part Removal and Part Addition events, which are subclasses of Modification Event. So,
independently of the level at which we regard an object, the car or the engine, the set of Modification Events partitions its states. In our
engineering environments, we may completely observe all such modifications, as in the logbook mentioned below. Then we know all states, and
can continue with the parts to do the same. In the CRM, we do not model states, because you can infer them from the events at any time, but
from incomplete records you infer the wrong states, which makes the KR of states non-monotonic, whereas collecting event information does not
invalidate partial knowledge. Further, it is application dependent which details are registered and regarded as relevant. Modelling states
imposes the view of one particular application, which makes integration with another view with more or other events non-monotonic without
good reason.
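As a small illustration of this point, the sketch below infers the parts of an object at a given time from a set of recorded events. The class and function names, the integer timestamps and the part names are hypothetical and are not CIDOC CRM definitions. Adding a later-discovered event never retracts an earlier event record, but it does change the state that was inferred from the incomplete record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModificationEvent:
    time: int   # hypothetical ordinal timestamp
    kind: str   # "add" (Part Addition) or "remove" (Part Removal)
    part: str


def parts_at(events, time):
    """Infer the set of parts present at `time` by replaying events in time order."""
    parts = set()
    for event in sorted(events, key=lambda e: e.time):
        if event.time > time:
            break
        if event.kind == "add":
            parts.add(event.part)
        elif event.kind == "remove":
            parts.discard(event.part)
    return parts


# Incomplete record: only the original assembly is known.
known = {
    ModificationEvent(0, "add", "chassis"),
    ModificationEvent(0, "add", "engine A"),
}
# Events learned later, e.g. from the garage logbook.
late = {
    ModificationEvent(5, "remove", "engine A"),
    ModificationEvent(5, "add", "engine B"),
}

print(parts_at(known, 10))         # state inferred from the incomplete record
print(parts_at(known | late, 10))  # state inferred once the later events are known
print(known <= (known | late))     # True: the event records only grow
```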

As I and others said before, if we do not restrict our domain, we shall discuss the paradoxes that the real world contains for many years
to come. Clearly, such a model of discrete states under complete observation, which is what we need for technical artefacts and museum
objects etc., does not apply to all applications for the same kinds of things, and not to living organisms, geological plates and so forth,
which are still pretty identifiable.

For any ontology, it is useful to have a general functional scope. An a priori restriction of our model to applications in which the
relevant factors can be captured by discrete events would be a good move. This is by the way the premise of the CIDOC CRM and the OPM
as I understand.



On 6/2/2011 5:22 PM, Paolo Missier wrote:
> James, Graham (will address Luc's comments on the wiki again separately)
> I would argue that the provenance of the car includes all the engine replacements that took place, so if the engine is now B, I would like
> to ask the question "why is B here?" and receive an answer like "B has replaced A [at time t] [because A failed...]". I believe James hinted
> at this. And if you are interested, you go back and unfold the history of A. So yes, the provenance of A is still part of the car's
> provenance, in the car's current state -- the logbook of car repairs that you get from your garage is a simple example.
> All I meant to say is that history is cumulative and immutable. That is not to say it's linear. Someone else (sorry, mail chaos at this
> point) commented that it is a DAG, and I would agree without having thought too hard (which I never do :-)).
>    The issue of scoping / avoiding the big bang problem is addressed separately: you may decide to prune the early episodes in history for
> convenience, engineering issues, etc., and for most resources (whatever your definition), there is some kind of origin. It's often relative
> to the observer (as is all provenance): consumers generally don't need to investigate where the engine's materials come from, whereas a
> forensic expert investigating an engine failure may.
> In my view, Theseus's ship is the result of all the actions that were ever taken on it, including the destructive ones. Too radical?
> --Paolo
> On 6/2/11 12:41 PM, James Cheney wrote:
>> Yes, these issues seem intuitive only as long as you don't stop to think about them too hard :)
>> I would say that the provenance has to be scoped by (say) a start and end time, or some other criterion, to prevent the "big bang" problem
>> (see e.g. [Miles IPAW 2006]).
>> If we want the provenance of the car from "now" until it was made, then the provenance of A needs to be included (e.g., maybe A caused
>> damage to the car when it failed, so we need to know that to understand how the car's current state was obtained from its initial state).
>> If we want the provenance of the car from "now" until I bought it, which happened after the engine was replaced, then maybe I don't need
>> to know about A.  (If I want to buy the car, I'd probably value the knowledge of the earlier history so that I can understand its current
>> state, but the seller isn't always obligated to provide this.)
>> This reminds me of another good story:
>> /The ship wherein Theseus and the youth of Athens returned
>> [from Crete] had thirty oars, and was preserved by the Athenians down even to the time of Demetrius
>> Phalereus, for they took away the old planks as they decayed, putting in new and
>> stronger timber in their place, insomuch that this ship became a standing example among the philosophers,
>> for the logical question of things that grow; one side holding that the ship remained the
>> same, and the other contending that it was not the same./
>> -Plutarch, /Theseus/
>> What is the provenance of the ship?  Was the ship really "preserved"?
>> --James
>> On Jun 2, 2011, at 12:05 PM, Graham Klyne wrote:
>>> I think Paolo has usefully threaded a path through our discussions.  Thanks!  At first reading, I would consent (in the sense of
>>> "consensus") to definitions framed on the basis of what he has written here.
>>> ...
>>> The issue of monotonicity (of provenance of a stateful resource) is interesting. Intuitively, it seems appropriate, but I'd need to let
>>> it stew awhile before accepting it unconditionally.  My immediate concern is how do we account for correction of previous errors in
>>> provenance claims?  But this question goes to the heart of what is, IMO, one of the key purposes of provenance on the Web (i.e. to help
>>> deal with conflicting information in the Web, and the Semantic Web in particular), so maybe that point gets addressed separately in any case.
>>> Aha!  I just thought of another example:  suppose we're talking about provenance of a car (e.g. for QA purposes).  Initially, suppose it
>>> has engine A, made by a particular factory.  The provenance of the car includes the provenance of engine A.  Sometime in its life, the
>>> engine fails and is replaced by engine B, and provenance of engine B becomes part of the car's provenance.  At this point, does it make
>>> sense to claim that the provenance of A is still part of the car's provenance?  A similar example could be constructed for, say, a photo
>>> album where images are added and removed.
>>> #g
>>> --


  Dr. Martin Doerr              |  Vox:+30(2810)391625        |
  Research Director             |  Fax:+30(2810)391638        |
                                |  Email: |
                Center for Cultural Informatics               |
                Information Systems Laboratory                |
                 Institute of Computer Science                |
    Foundation for Research and Technology - Hellas (FORTH)   |
  Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
          Web-site:               |

Received on Thursday, 2 June 2011 22:16:20 UTC