RE: updates to PAQ doc for discussion from Myers, Jim on 2011-08-14 (public-prov-wg@w3.org from August 2011)

From: Myers, Jim <MYERSJ4@rpi.edu>
Date: Sun, 14 Aug 2011 15:39:33 +0000
To: Simon Miles <simon.miles@kcl.ac.uk>, Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <3131E7DF4CD2D94287870F5A931EFC23D416@EX14MB2.win.rpi.edu>
Simon,

I probably should have said every resource is an entity or PE... :-)

My statement about everything being an Entity in this thread was in the context of the thing versus state-of-things debate, which I think is the distinction PAQ was making. Resources on the web that look mutable (live web pages) can be entities, resources that look frozen (a version of that page) can be entities, and entities made up for provenance purposes are clearly entities and could be resources (they are not a different type just because they are only tracked for provenance purposes).

I would still argue that it will be clearer if we make Entities and PEs distinct, rather than PE being a subtype of entity, and would just extend my statement to say every resource is an entity or PE to stay consistent. By analogy with the idea above, I would argue that if there's a web resource that is an event, it qualifies as a PE and we shouldn't need a target-like indirection mechanism to get to something in the pil model. We may still want to describe sub-events, or entities that participate in the event, etc., that are not currently existing resources, but the original resource fits the model and the pil relationships are sufficient to relate everything.

The discussion in [1] is from a different angle, but I think it is still consistent. That one was about avoiding things like functions being required to determine identity, e.g. for Java object instances. The point there was to avoid situations in which PIL/PAQ had to be aware of functions or other mechanisms to allow discovery, and I think I was basically arguing that one should do something like what the resource mechanism does: in the semantic web sense, a resource with a URI represents something in the world and limits what you can retrieve about that thing to content and metadata. For example, content size is really a function of the content, but is presented as metadata. pil:entity should work like that. If it does, I don't see any technical reason why a resource can't have the metadata you want for an entity or, vice versa, how you could create an entity that would not be a viable resource. As before, this doesn't say that there aren't social/practical reasons that someone owning a URL might refuse to serve metadata, or that all entities will be things one wants to serve as resources. So indirection to/from existing resources in the world is useful, but, if you agree that the resources you want to map to pil:entities are also valid entities, a target-URI type mechanism now has a domain and range of pil:entity, and we should be asking whether target is a sub-type of IVPof, derivation, etc. rather than a new concept coming in through PAQ. (If we're talking about things without a URI, I think I'm just arguing to use the semantic web notion of representing it with a resource and using that as the pil:entity, versus having a PIL-specific mechanism.)
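
To make the modelling argument concrete, here is a rough sketch in Python rather than in any real PIL serialization. The names `Entity`, `ProcessExecution`, `ProvenanceStore`, the relation strings "ivp_of" and "generated_by", and the example.org URIs are all made up for illustration; none of them are defined by PIL or the PAQ document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Anything with a URI: a live page and a frozen version are the same type."""
    uri: str


@dataclass(frozen=True)
class ProcessExecution:
    """Kept distinct from Entity, as argued above (hypothetical class)."""
    uri: str


@dataclass
class ProvenanceStore:
    # Each statement is (subject, relation, object); relation names are illustrative.
    statements: set = field(default_factory=set)

    def add(self, subject, relation, obj):
        self.statements.add((subject, relation, obj))

    def views_of(self, thing):
        """Entities asserted to be IVPs of `thing`, sorted by URI."""
        found = [s for (s, rel, o) in self.statements if rel == "ivp_of" and o == thing]
        return sorted(found, key=lambda e: e.uri)

    def direct(self, entity):
        """(relation, object URI) pairs asserted directly about `entity`."""
        return sorted((rel, o.uri) for (s, rel, o) in self.statements if s == entity)


# A living page and one of its versions are both plain entities.
live = Entity("http://example.org/page")
v1 = Entity("http://example.org/page?version=1")
edit = ProcessExecution("http://example.org/process/edit-1")

store = ProvenanceStore()
store.add(v1, "ivp_of", live)        # relates the version to the living page
store.add(v1, "generated_by", edit)  # provenance is attached to the version

# The living page has no direct provenance here, yet it is still an Entity,
# and its known provenance is reachable through the ivp_of relation.
print(store.direct(live))
print([v.uri for v in store.views_of(live)])
```

In this sketch a "target" would simply be another statement whose subject and object are both `Entity` instances, which is why it looks like a candidate sub-type of an existing relation rather than a separate concept.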

I hope that makes sense - I still feel like things are self-consistent...

 Jim


________________________________________
From: public-prov-wg-request@w3.org [public-prov-wg-request@w3.org] on behalf of Simon Miles [simon.miles@kcl.ac.uk]
Sent: Sunday, August 14, 2011 6:59 AM
To: Provenance Working Group WG
Subject: Re: updates to PAQ doc for discussion

Hello Paul, Jim, all,

Jim:
> I think everything is a pil:Entity!

I am in favour of this view, but I didn't think you were... If it is
true, doesn't it mean that, in the formal model, any distinction
between pil:Entity and owl:Thing is misleading, and so one should be
mapped to the other? I had the (possibly mistaken) impression from
other discussions [1, 2] that you disagreed with the consequences of
this and so a pil:Entity had something special about it. In which
case, Paul may argue that some resources may not have these special
characteristics and we need separate "targets". I'm happy to go with
what the group agrees, but I don't think the definition of entity in
the model expresses how general a class Entity is, which appears to
have consequences for the PAQ.

Paul:
> I still think there's a case for allowing a target-uri to be specified when you don't want to put the URL of the resource in the provenance. For example, many sites have long urls for implementation purposes but may want to describe provenance in terms of a "better" URL e.g. A permalink.

I agree this is an important reason for having the target URI
(somewhere) - even if a resource itself is an entity and ivpOf
relations can be expressed in the provenance data, the identifiers
that clients have for it may not be the ones used in provenance for
whatever reasons, or the client may not have an identifier at all, as
discussed in issue 46 [3].
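
As a minimal sketch of that case, assuming the provenance is a list of (subject, relation, object) tuples and that a target URI may or may not accompany it. The function names, the tuple layout, and the example.org URLs are hypothetical and not taken from the PAQ document.

```python
from typing import List, Optional, Tuple

Statement = Tuple[str, str, str]  # (subject URI, relation, object URI)


def provenance_subject(requested_url: str, target_uri: Optional[str] = None) -> str:
    """Pick the identifier to look for within the provenance information.

    If a target URI was supplied, use it; otherwise fall back to the URL
    the client actually requested.
    """
    return target_uri if target_uri else requested_url


def find_statements(
    provenance: List[Statement],
    requested_url: str,
    target_uri: Optional[str] = None,
) -> List[Statement]:
    """Return the statements whose subject is the chosen identifier."""
    subject = provenance_subject(requested_url, target_uri)
    return [s for s in provenance if s[0] == subject]


# The site serves the page from a long implementation URL, but describes
# its provenance in terms of a permalink.
long_url = "http://example.org/cms/render?id=8841&session=abc"
permalink = "http://example.org/articles/provenance"

provenance = [
    (permalink, "generated_by", "http://example.org/process/publish"),
]

# Without a target URI the client searches for the wrong identifier.
print(find_statements(provenance, long_url))
# With the target URI it finds the statements about the permalink.
print(find_statements(provenance, long_url, permalink))
```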

thanks,
Simon

[1] http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/0111.html
[2] http://lists.w3.org/Archives/Public/public-prov-wg/2011Aug/0017.html
[3] http://www.w3.org/2011/prov/track/issues/46


> At any rate, I think this is a better way to describe the PAQ without getting involved in the model.
>
> We'll see what Graham and others think.
>
> Cheers
> Paul
>
> On Aug 12, 2011, at 15:32, "Myers, Jim" <MYERSJ4@rpi.edu> wrote:
>
>> Paul,
>>
>> I think everything is a pil:Entity! Nominally a living page could have
>> direct provenance - when did it first appear, who approved it getting
>> added to the overall site, when did it get downloaded, used in a backup
>> process, etc. Just because we have an open world and we (some asserter)
>> may not have provenance to directly associate with it doesn't mean it is
>> not/can't be a pil:Entity. To look at it backwards, if IVPOf fits the
>> need, why would you not want to consider the living page to be a
>> pil:Entity?
>>
>> With everything being able to be a pil:Entity, I think in the following
>> way: For resource X, if I want to talk about aspects of it that are
>> immutable, I directly associate provenance statements with it via used,
>> generatedby, derived. If I want to talk about its mutable aspects, I
>> create additional characterizations (e.g. versions for content) -
>> additional pil:Entities that may also already be resources themselves
>> or may just be being invented/defined for provenance purposes (e.g. if I
>> am not already tracking versions of my live page as part of my site
>> operations, I identify them just for provenance purposes so I can talk
>> about when each version was created, read, etc.) and associate them with
>> the original via IVPof relationships and then use used/generatedby on
>> the characterizations. If X is really just the context or is controlling
>> some other process we have agent and participation.
>>
>> Jim
>>
>>> -----Original Message-----
>>> From: Paul Groth [mailto:pgroth@gmail.com] On Behalf Of Paul Groth
>>> Sent: Friday, August 12, 2011 2:13 AM
>>> To: Myers, Jim
>>> Cc: Khalid Belhajjame; public-prov-wg@w3.org
>>> Subject: Re: updates to PAQ doc for discussion
>>>
>>> Hi Jim,
>>>
>>> "the targetURI discussion is about relating the living page to its
>> versions which
>>> then have provenance"
>>>
>>> that's a fairly good summary.
>>>
>>> Can you clarify that Complement Of (was IVPof) works on things that
>> are not
>>> pil:Entities? I thought it only applies to pil:Entity?
>>>
>>> thanks,
>>> Paul
>>>
>>>
>>>
>>>
>>> Myers, Jim wrote:
>>>>> Now, if one says that every resource is  a pil:Entity, we may not
>>>>> need
>>>> this
>>>>
>>>> That, or that every pil:Entity can be a resource (or both). As
>> before
>>>> if I have a living web page with some URL, it may have different
>>>> versions that have different (but related) provenance. If I
>> understand
>>>> correctly, the targetURI discussion is about relating the living
>> page
>>>> to its versions which then have provenance (it also makes the
>>>> assumption that there are resources that don't have any direct
>>>> provenance - all the provenance is associated with versions or other
>>>> things that are IVPsOf the resource). I'm pointing out that each
>>>> version is a valid web resource as well (could be given its own URI)
>>>> so we don't have to treat it as a different class of thing, and that
>>>> just because we don't have direct provenance for a resource doesn't
>>>> mean it isn't a valid pil:entity.
>>>>
>>>> With the IVPof relation, we still have the mechanism to relate the
>>>> version resources with the living webpage resource, so we don't lose
>>>> any expressivity from what's in the PAQ doc. I think it just shifts
>>>> the discussion from targets as a separate type to PIL describing the
>>>> provenance of resources and having the capability to capture the
>>>> situation where some/all of the known provenance is associated with
>>>> specific version resources or other types of resources that
>> partially
>>>> characterize the resource.
>>>>
>>>>  Jim
>>>>
>>>>> -----Original Message-----
>>>>> From: Paul Groth [mailto:pgroth@gmail.com] On Behalf Of Paul Groth
>>>>> Sent: Thursday, August 11, 2011 2:01 PM
>>>>> To: Myers, Jim
>>>>> Cc: Khalid Belhajjame; public-prov-wg@w3.org
>>>>> Subject: Re: updates to PAQ doc for discussion
>>>>>
>>>>> Hi Jim, Khalid:
>>>>>
>>>>> In the model, provenance is described with respect to pil:Entities.
>>>>> In
>>>> the PAQ
>>>>> document, we describe access primarily with respect to the Web
>>>> Architecture.
>>>>> It may be the case that the resource (e.g. a web page) is a
>>>> pil:Entity. If so, then
>>>>> the access approach says go ahead and use the url of that resource
>> to
>>>> find the
>>>>> provenance of it within an identified set of provenance
>> information.
>>>>>
>>>>> However, it may be the case that the resource is not a pil:Entity.
>> In
>>>> that case,
>>>>> we provide a mechanism (Target-URIs) that let you associate the
>>>> resource to a
>>>>> pil:Entity (the target) such that you can identify a
>> characterization
>>>> of the
>>>>> resource and thus find it in some provenance
>> information.
>>>>>
>>>>> This approach also lets you have multiple pil:Entities associated
>>>>> with
>>>> a
>>>>> particular resource.
>>>>>
>>>>> We are just trying to find a simple way to let the accessor know
>> when
>>>> they get
>>>>> some provenance information what they should be looking for within
>>>> that
>>>>> provenance information.
>>>>>
>>>>> Now, if one says that every resource is  a pil:Entity, we may not
>>>>> need
>>>> this. Is
>>>>> that what you're saying? and can you explain how this is the case?
>>>>>
>>>>> I hope this clarifies what we are trying to enable.
>>>>>
>>>>> Paul
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Myers, Jim wrote:
>>>>>> I think the gist of the discussion on the modeling side lately and
>>>> the
>>>>>> decision to have 'only Bobs' would shift this towards just talking
>>>>>> about the link between provenance and resources with the model
>> then
>>>>>> having a mechanism to indicate when some resources are views of
>>>>>> others, i.e. one URI is the page content on a given date and the
>>>> other
>>>>>> URI is the live page, but both are resources that can have
>>>> provenance,
>>>>>> and their provenance can contain links that indicate their
>>>> relationship.
>>>>>> Jim
>>>>>>
>>>>>> *From:*public-prov-wg-request@w3.org
>>>>>> [mailto:public-prov-wg-request@w3.org] *On Behalf Of *Khalid
>>>>>> Belhajjame
>>>>>> *Sent:* Thursday, August 11, 2011 10:13 AM
>>>>>> *To:* Paul Groth
>>>>>> *Cc:* public-prov-wg@w3.org
>>>>>> *Subject:* Re: updates to PAQ doc for discussion
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> My main concern reading sections 1 and 3, is the use of both
>>>> resource
>>>>>> and target entity. I understand that the idea is that a web
>>>> resource
>>>>>> may be associated with multiple target entities, and that there is
>> a
>>>>>> need to identify which target the provenance describes. However,
>>>>>> having to go through the two levels resource then entity is a bit
>>>>>> confusing, especially for a reader who is not aware of the discussions
>>>> that
>>>>>> we had about the two concepts.
>>>>>>
>>>>>> Suggestion: Would it be really bad if we confine ourselves to the
>>>>>> provenance vocabulary and describe how the provenance of an
>> Entity,
>>>> as
>>>>>> opposed to a resource, can be accessed?
>>>>>>
>>>>>> Other comments:
>>>>>>
>>>>>> - In the definition of a resource, it is said that "a resource may be
>>>>>> associated with multiple targets". It would be good if we could
>>>>>> clarify this relationship a bit more.
>>>>>>
>>>>>> - I find the definition of provenance information a bit vague, the
>>>>>> body of the definition says pretty much the same thing as the
>> title
>>>> of
>>>>>> the definition. If we don't have a better idea of what can be
>> said,
>>>> it
>>>>>> is probably better to remove it.
>>>>>>
>>>>>> In Section 3, Second paragraph, "Once provenance information
>>>>>> information" ->  "once provenance information"
>>>>>>
>>>>>> In the same paragraph: "one needs how to identify" ->  "one needs
>> to
>>>>>> know how to identify".
>>>>>>
>>>>>> Khalid
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/08/2011 20:37, Paul Groth wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Graham and I have been making some changes to the PAQ document [1]
>>>>>> that we would like to request feedback on at tomorrow's telecon.
>>>>>>
>>>>>> In particular, we have updated Sections 1 and 3. We've added a
>>>> section
>>>>>> on core concepts and made section 3 reflect these concepts. We
>> think
>>>>>> this may address PROV-ISSUE-46 [2].
>>>>>>
>>>>>> Please take a look and let us know what you think.
>>>>>>
>>>>>> Thanks,
>>>>>> Paul
>>>>>>
>>>>>> Note: Section 4 Provenance discovery service is still under heavy
>>>>>> editing
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>>
>>>>
>> http://dvcs.w3.org/hg/prov/raw-file/default/paq/provenance-access.html
>>>>>> [2] http://www.w3.org/2011/prov/track/issues/46
>>>>>>
>>>>> --
>>>>> Dr. Paul Groth (p.t.groth@vu.nl)
>>>>> http://www.few.vu.nl/~pgroth/
>>>>> Assistant Professor
>>>>> Knowledge Representation & Reasoning Group Artificial Intelligence
>>>>> Section Department of Computer Science VU University Amsterdam
>>>>
>>>
>>> --
>>> Dr. Paul Groth (p.t.groth@vu.nl)
>>> http://www.few.vu.nl/~pgroth/
>>> Assistant Professor
>>> Knowledge Representation & Reasoning Group Artificial Intelligence
>> Section
>>> Department of Computer Science VU University Amsterdam
>>
>
>



--
Dr Simon Miles
Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Sunday, 14 August 2011 15:40:04 UTC