RE: Resources and state from Myers, Jim on 2011-06-01 (public-prov-wg@w3.org from June 2011)

From: Myers, Jim <MYERSJ4@rpi.edu>
Date: Wed, 1 Jun 2011 12:38:38 -0400
To: W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <B7376F3FB29F7E42A510EB5026D99EF2051E433D@troy-be-ex2.win.rpi.edu>
All,

I've been trying to keep up with the discussion on resources and would
like to throw in a few comments related to the mutable/immutable,
resource vs state/copy/version, FRBR issue(s). Probably too condensed to
make sense as is, but hopefully useful nonetheless.

I submitted a poster last year to the IPAW meeting arguing that all of
the above notions are really relative to the process(es) one is
considering and that it is usually possible to think in terms of other
processes that would shift a given thing (resource) from one category to
another. Some examples below, but the bottom line is that I think any
attempt to categorize a given thing along the lines above in absolute
terms is going to cause problems, particularly for use cases where we
attempt to tie together provenance from multiple witnesses who may be
interested/aware of different processes/levels of detail. Opinions will
differ as to whether specific aspects of state result in new resources
or are simply part of the lifecycle of a given resource and those
opinions will depend on the processes one is most interested in.

The alternative that I think could work better would be to think of one
or more of these ontologies (FRBR, etc.) as labeling the roles of
things/resources relative to the events/process types being reported or
relative to accounts. (I think this is similar to the idea of trying to
annotate a resource with aspects of state that are considered relevant
to identity (it's the state being changed by the processes being
discussed) but I think it is more natural to think in terms of roles.)
Some examples:

1) Two queries against a database that produce the same result are
technically the same thing but legally different (the first person doing
it gets the patent/wins the discovery prize) - do we always want to
require use of multiple resources if we can invent a use case where we
want to distinguish the results? Or force the lawyer to use one resource
annotated with two states? Further, if we say this is one resource with
two resource states (as queried by A and B), what do we do about results
that do differ technically - aren't they also two resource states of a
broader resource (the answer as a function of time/database mods)?

2) A logical file/URI looks to the end user as though it corresponds
directly to a specific byte stream (assuming we're talking about a
read-only case) but to the system (think content delivery network) that
logical file could refer to multiple physical file copies which are
really what the user downloads. Do we need to care whether a URI is
content-network-managed or not to record that I downloaded a URI?

3) To an author, the work is the text, to the publisher the work may be
a particular typesetting of it (expressed as hardcover, softcover, and
electronic versions manifested in book copies), to a computer, the
resulting PDF file for the e-version might be a work, expressed at
different resolutions and manifested in various byte-level copies
everywhere. FRBR originated for the first purpose (authors/intellectual
works) - the point is that the same 4 levels won't suffice to cover what the
publisher and OS are reporting. Manifestations that the author would
like to consider as immutable manifestations are resources that other
witnesses know are mutable/versionable. How do we deal with the more
than 4 levels of abstraction going from author to publisher to
computational workflows? How would we get an absolute anchor?

Etc.

You may or may not like these examples, but I think many of the
discussions of use cases in the prov-wg focus on one witness talking
about resources relative to a few processes and the things that look
like paradoxes/challenges to us arise when we start to broaden the # of
witnesses or process types (computational and publishing and
intellectual processes). Further, a lot of the discussions we
collectively have had in the past about opm:agents (are they just a type
of resource?) and pml:sources (are they resources?) hit this too - I can
think of processes (e.g. birth/death) where agents have provenance and
are created and destroyed like a resource but we've really used agent to
represent something that, relative to the process it participates in,
may change aspects of its state but does not change its identity. Source
is the same type of 'thunk' - a source could be talked about as a
resource (incorporated, bankrupted, ...) but its role in publishing is
as a mutable entity. In both cases, I think the relevant point is that
we've created a type to capture a role relative to a process and the
better answer might be to define the role relationships directly. My
sense is that this is going to be a cleaner way to go that will also
help avoid issues as provenance becomes ubiquitous and the number of
witnesses reporting it and the number of perspectives grows.
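
A minimal sketch of what "defining the role relationships directly" might
look like, in Python. All names here (RoleAssertion, roles_of, the ex:
identifiers, and the process and witness labels) are hypothetical and
chosen only for illustration; this is not an agreed WG model. The idea is
that a role is recorded relative to a process and a witness rather than
as an absolute type of the thing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssertion:
    """One witness's statement about the role a thing plays in a process."""
    thing: str    # identifier of the thing/resource (hypothetical ex: names)
    role: str     # role label, e.g. an FRBR level, "agent", or "resource"
    process: str  # process type the role is relative to
    witness: str  # who is reporting (the account)


# Illustrative assertions only, loosely following the examples above.
assertions = [
    # The author sees the PDF as a fixed manifestation of the text...
    RoleAssertion("ex:novel.pdf", "manifestation", "authoring", "author"),
    # ...while to the computer the same file may itself be a work.
    RoleAssertion("ex:novel.pdf", "work", "file-rendering", "computer"),
    # A person is an agent relative to authoring...
    RoleAssertion("ex:alice", "agent", "authoring", "author"),
    # ...but is created, like a resource, relative to a birth process.
    RoleAssertion("ex:alice", "resource", "birth", "registrar"),
]


def roles_of(thing, all_assertions):
    """Return {(witness, process): role} for a single thing."""
    return {
        (a.witness, a.process): a.role
        for a in all_assertions
        if a.thing == thing
    }


if __name__ == "__main__":
    for key, role in roles_of("ex:novel.pdf", assertions).items():
        print(key, "->", role)
```

Under these assumptions, neither ex:novel.pdf nor ex:alice needs a single
absolute category; accounts from different witnesses can be merged
without forcing them to agree on one.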

Cheers,
 Jim

-----Original Message-----
From: public-prov-wg-request@w3.org
[mailto:public-prov-wg-request@w3.org] On Behalf Of Graham Klyne
Sent: Wednesday, June 01, 2011 6:02 AM
To: Paul Groth
Cc: Luc Moreau; W3C provenance WG
Subject: Re: Resources and state

Paul,

That's a fair summary, particularly about advocating the second option
you list, and I agree about the need for debate.  If someone can show me
a clear example where the second option doesn't work, I'll pipe down :)

I can't argue with capturing the notions (though to some extent I
thought that was coming from the XG), but I don't want it to be assumed
that *all* of these notions have to be represented explicitly in a core
model.  I guess what I am most concerned about is that we don't "sleep
walk" into using a more complex model without at least considering why
we need it.

#g
--


Paul Groth wrote:
> Hi Graham,
> 
> I'm absolutely with you here.
> 
> I think we want to be able to describe the provenance of resources
> as simply and in as developer-friendly a way as possible.
> 
> One should be able to simply point to a resource and put some
> provenance properties on it. i.e. dc:source should fit fine into the
> model.
> 
> On the other hand, it's clear that many models of provenance take 
> advantage of the notion of state to allow for clear provenance 
> semantics and the possibility of inference.
> 
> The question is, whether
> 1) we build up from talking about resource state and have conventions
> for dealing with resources that are dynamic; or
> 2) the other way around. We take resources and then add additional 
> properties to discuss things like states.
> 
> I think you're advocating for 2).
> 
> But in the end the question is really we should capture all notions. 
> Would you agree that resource, resource state, and resource
> representation are all important notions we need to be able to
> discuss?
> 
> Is that correct?
> 
> Thanks,
> Paul
> 
> 
> 
> 
> Graham Klyne wrote:
>> Luc Moreau wrote:
>>> Hi Graham,
>>>
>>> Isn't it that you used the duri scheme to name the two resource 
>>> states that exist in this scenario?
>>
>> That is a possible way of describing it, but the essence of what I 
>> suggest is that the "states" (or "snapshots") are themselves 
>> resources.
>>
>>> In your view of the web, is there a notion of stateful resource? 
>>> Does it apply here?
>>
>> Resource state is an architectural concept in the web. Indeed, it's 
>> so fundamental that the notion is used throughout 
>> http://www.w3.org/TR/webarch/ without actually being defined, as far 
>> as I can tell. (Other than indirectly by reference to Fielding's 
>> thesis.)
>>
>> But not all resources have dynamic state. Indeed, probably most 
>> resources on the web are static.
>>
>> What I'm trying to do is avoid a layer of modelling complexity that I
>> don't believe is needed: I think we can say all we need to say by
>> just talking about resources. And in some cases, I think that talking
>> about non-resources can lead to inconsistencies or awkwardness
>> (sorry, no example to hand.)
>>
>> To some extent, this pushes the static/dynamic discussion to a 
>> different place, because in some cases the type of a resource may be 
>> significant. But I see that as a useful extension point in any case, 
>> when discussing possibly conflicting models of provenance and 
>> associated inference. What I'd really like to do is simplify the 
>> *core* model until there's nothing there for the different provenance
>> models to disagree about.
>>
>> ...
>>
>> Tangentially related, I just read through Yolanda's slides 
>> (http://www.w3.org/2005/Incubator/prov/wiki/images/0/02/Provenance-XG-Overview.pdf),
>> and followed up on some of the associated reports, and my feeling is
>> that there's serious potential here for scope creep.
>>
>> One point in which I am in very strong agreement, indeed, I think 
>> it's probably the most important thing for us as a working group, is:
>>
>> Slide 36:
>> "The exchange language should have a low entry point to facilitate 
>> widespread adoption, therefore it should be easy to do simple things"
>>
>> I think this is crucial to the eventual success of this group's work
>> (by which I don't mean successfully advancing to REC status, but
>> creating specifications that will actually be used in the web at
>> large). The provenance problem is too important to mess it up by
>> making it too complicated for developers to get started.
>>
>> So, I think that at a basic level of provenance on the web, we want 
>> to avoid talking about states, and snapshots, and other constructs 
>> that are not directly relevant to a web developer creating an 
>> application to record or use provenance information. I think the 
>> notion of "resource", interpreted broadly per AWWW, etc., allows us to
>> do that. Building upon a very simple core model, the subtleties can
>> be added through refinement of the core concepts. But if the core
>> concepts are not so simple that developers can easily generate 
>> provenance information, this group's work could end up like a 
>> magnificently engineered car with no roads to drive on.
>>
>> #g
>> --
>>
>>> On 31/05/11 23:57, Graham Klyne wrote:
>>>> Luc Moreau wrote:
>>>>> Graham,
>>>>>
>>>>> In my example, I really mean for the two versions of the chart to 
>>>>> be available at the same URI. (So, definitely, an uncool URI!)
>>>>>
>>>>> In that case, there is a *single* resource, but it is stateful.
>>>>> Hence, there are two *resource states*, one generated using
>>>>> (stats2), and the other using (stats3).
>>>>
>>>> Luc,
>>>>
>>>> I had interpreted your scenario as using a common URI as you
>>>> explain.
>>>>
>>>> But there are still several resources here, though they are not all
>>>> exposed on the web or assigned URIs. I'm appealing here to anything
>>>> that *might* be identified as opposed to things that actually are
>>>> assigned URIs. (For example, the proposed duri: scheme might be
>>>> used - http://tools.ietf.org/id/draft-masinter-dated-uri-07.html)
>>>>
>>>> (And the URI is perfectly "cool" if it is specifically intended to
>>>> denote a dynamic resource. A URI used to access the current weather
>>>> in London can be stable if properly managed.)
>>>>
>>>> (I think this is all entirely consistent with my earlier stated
>>>> positions.)
>>>>
>>>> #g
>>>> --
>>>>
>>>>> Of course, if blogger had used cool uris, then, c2s2 and c2s3 
>>>>> would be different resources.
>>>>>
>>>>> Luc
>>>>>
>>>>> On 05/31/2011 02:25 PM, Graham Klyne wrote:
>>>>>> I see (at least) two resources associated with (c2): one
>>>>>> generated using (stats2), and the other using (stats3). We might
>>>>>> call these (c2s2) and (c2s3).
>>>>>
>>>
>>
>>
>>
>
Received on Wednesday, 1 June 2011 16:39:07 UTC