- From: Myers, Jim <MYERSJ4@rpi.edu>
- Date: Wed, 1 Jun 2011 12:38:38 -0400
- To: W3C provenance WG <public-prov-wg@w3.org>
All,

I've been trying to keep up with the discussion on resources and would like to throw in a few comments related to the mutable/immutable, resource vs state/copy/version, FRBR issue(s). Probably too condensed to make sense as is, but hopefully useful nonetheless.

I submitted a poster last year to the IPAW meeting arguing that all of the above notions are really relative to the process(es) one is considering and that it is usually possible to think in terms of other processes that would shift a given thing (resource) from one category to another. Some examples below, but the bottom line is that I think any attempt to categorize a given thing along the lines above in absolute terms is going to cause problems, particularly for use cases where we attempt to tie together provenance from multiple witnesses who may be interested in/aware of different processes/levels of detail. Opinions will differ as to whether specific aspects of state result in new resources or are simply part of the lifecycle of a given resource, and those opinions will depend on the processes one is most interested in.

The alternative that I think could work better would be to think of one or more of these ontologies (FRBR, etc.) as labeling the roles of things/resources relative to the events/process types being reported or relative to accounts. (I think this is similar to the idea of trying to annotate a resource with aspects of state that are considered relevant to identity (it's the state being changed by the processes being discussed), but I think it is more natural to think in terms of roles.)

Some examples:

1) Two queries against a database that produce the same result are technically the same thing but legally different (the first person doing it gets the patent/wins the discovery prize) - do we always want to require use of multiple resources if we can invent a use case where we want to distinguish the results? Or force the lawyer to use one resource annotated with two states?
Further, if we say this is one resource with two resource states (as queried by A and B), what do we do about results that do differ technically - aren't they also two resource states of a broader resource (the answer as a function of time/database mods)?

2) A logical file/URI looks to the end user as though it corresponds directly to a specific byte stream (assuming we're talking about a read-only case), but to the system (think content delivery network) that logical file could refer to multiple physical file copies, which are really what the user downloads. Do we need to care whether a URI is content-network-managed or not to record that I downloaded a URI?

3) To an author, the work is the text; to the publisher, the work may be a particular typesetting of it (expressed as hardcover, softcover, and electronic versions manifested in book copies); to a computer, the resulting PDF file for the e-version might be a work, expressed at different resolutions and manifested in various byte-level copies everywhere. FRBR originated for the first purpose (authors/intellectual works) - the point is that the same 4 levels won't suffice to cover what the publisher and OS are reporting. Manifestations that the author would like to consider as immutable are resources that other witnesses know are mutable/versionable. How do we deal with the more than 4 levels of abstraction going from author to publisher to computational workflows? How would we get an absolute anchor?

Etc.

You may or may not like these examples, but I think many of the discussions of use cases in the prov-wg focus on one witness talking about resources relative to a few processes, and the things that look like paradoxes/challenges to us arise when we start to broaden the number of witnesses or process types (computational and publishing and intellectual processes).

Further, a lot of the discussions we collectively have had in the past about opm:agents (are they just a type of resource?)
and pml:sources (are they resources?) hit this too - I can think of processes (e.g. birth/death) where agents have provenance and are created and destroyed like a resource, but we've really used agent to represent something that, relative to the process it participates in, may change aspects of its state but does not change its identity. Source is the same type of 'thunk' - a source could be talked about as a resource (incorporated, bankrupted, ...) but its role in publishing is as a mutable entity. In both cases, I think the relevant point is that we've created a type to capture a role relative to a process, and the better answer might be to define the role relationships directly.

My sense is that this is going to be a cleaner way to go that will also help avoid issues as provenance becomes ubiquitous and the number of witnesses reporting it and the number of perspectives grows.

Cheers,
Jim

-----Original Message-----
From: public-prov-wg-request@w3.org [mailto:public-prov-wg-request@w3.org] On Behalf Of Graham Klyne
Sent: Wednesday, June 01, 2011 6:02 AM
To: Paul Groth
Cc: Luc Moreau; W3C provenance WG
Subject: Re: Resources and state

Paul,

That's a fair summary, particularly about advocating the second option you list, and I agree about the need for debate. If someone can show me a clear example where the second option doesn't work, I'll pipe down :)

I can't argue with capturing the notions (though to some extent I thought that was coming from the XG), but I don't want it to be assumed that *all* of these notions have to be represented explicitly in a core model.

I guess what I am most concerned about is that we don't "sleep walk" into using a more complex model without at least considering why we need it.

#g
--

Paul Groth wrote:
> Hi Graham,
>
> I'm absolutely with you here.
>
> I think we want to be able to describe the provenance of resources
> as simply and in as developer-friendly a way as possible.
>
> One should be able to simply point to a resource and put some
> provenance properties on it. i.e. dc:source should fit fine into the model.
>
> On the other hand, it's clear that many models of provenance take
> advantage of the notion of state to allow for clear provenance
> semantics and the possibility of inference.
>
> The question is, whether
> 1) we build up from talking about resource state and have conventions
> for dealing with resources that are dynamic; or
> 2) the other way around. We take resources and then add additional
> properties to discuss things like states.
>
> I think you're advocating for 2).
>
> But in the end the question is really whether we should capture all notions.
> Would you agree that resource, resource state and resource
> representation are all important notions we need to be able to discuss?
>
> Is that correct?
>
> Thanks,
> Paul
>
>
> Graham Klyne wrote:
>> Luc Moreau wrote:
>>> Hi Graham,
>>>
>>> Isn't it that you used the duri scheme to name the two resource
>>> states that exist in this scenario?
>>
>> That is a possible way of describing it, but the essence of what I
>> suggest is that the "states" (or "snapshots") are themselves
>> resources.
>>
>>> In your view of the web, is there a notion of stateful resource?
>>> Does it apply here?
>>
>> Resource state is an architectural concept in the web. Indeed, it's
>> so fundamental that the notion is used throughout
>> http://www.w3.org/TR/webarch/ without actually being defined, as far
>> as I can tell. (Other than indirectly by reference to Fielding's
>> thesis.)
>>
>> But not all resources have dynamic state. Indeed, probably most
>> resources on the web are static.
>>
>> What I'm trying to do is avoid a layer of modelling complexity that I
>> don't believe is needed: I think we can say all we need to say by
>> just talking about resources.
>> And in some cases, I think that talking
>> about non-resources can lead to inconsistencies or awkwardness
>> (sorry, no example to hand.)
>>
>> To some extent, this pushes the static/dynamic discussion to a
>> different place, because in some cases the type of a resource may be
>> significant. But I see that as a useful extension point in any case,
>> when discussing possibly conflicting models of provenance and
>> associated inference. What I'd really like to do is simplify the
>> *core* model until there's nothing there for the different provenance
>> models to disagree about.
>>
>> ...
>>
>> Tangentially related, I just read through Yolanda's slides
>> (http://www.w3.org/2005/Incubator/prov/wiki/images/0/02/Provenance-XG-Overview.pdf),
>> and followed up on some of the associated reports, and my feeling is
>> that there's serious potential here for scope creep.
>>
>> One point in which I am in very strong agreement, indeed, I think
>> it's probably the most important thing for us as a working group, is:
>>
>> Slide 36:
>> "The exchange language should have a low entry point to facilitate
>> widespread adoption, therefore it should be easy to do simple things"
>>
>> I think this is crucial to the eventual success of this group's work
>> (by which I don't mean successfully advancing to REC status, but
>> creating specifications that will actually be used in the web at large).
>> The provenance problem is too important to mess it up by making it
>> too complicated for developers to get started.
>>
>> So, I think that at a basic level of provenance on the web, we want
>> to avoid talking about states, and snapshots, and other constructs
>> that are not directly relevant to a web developer creating an
>> application to record or use provenance information. I think the
>> notion of "resource", interpreted broadly per AWWW, etc., allows us to
>> do that.
>> Building upon a very simple core model, the subtleties can
>> be added through refinement of the core concepts. But if the core
>> concepts are not so simple that developers can easily generate
>> provenance information, this group's work could end up like a
>> magnificently engineered car with no roads to drive on.
>>
>> #g
>> --
>>
>>> On 31/05/11 23:57, Graham Klyne wrote:
>>>> Luc Moreau wrote:
>>>>> Graham,
>>>>>
>>>>> In my example, I really mean for the two versions of the chart to
>>>>> be available at the same URI. (So, definitely, an uncool URI!)
>>>>>
>>>>> In that case, there is a *single* resource, but it is stateful.
>>>>> Hence, there
>>>>> are two *resource states*, one generated using (stats2), and the
>>>>> other using (stats3).
>>>>
>>>> Luc,
>>>>
>>>> I had interpreted your scenario as using a common URI as you explain.
>>>>
>>>> But there are still several resources here, but they are not all
>>>> exposed on the web or assigned URIs. I'm appealing here to anything
>>>> that *might* be identified as opposed to things that actually are
>>>> assigned URIs. (For example, the proposed duri: scheme might be used
>>>> - http://tools.ietf.org/id/draft-masinter-dated-uri-07.html)
>>>>
>>>> (And the URI is perfectly "cool" if it is specifically intended to
>>>> denote a dynamic resource. A URI used to access the current weather
>>>> in London can be stable if properly managed.)
>>>>
>>>> (I think this is all entirely consistent with my earlier stated
>>>> positions.)
>>>>
>>>> #g
>>>> --
>>>>
>>>>> Of course, if blogger had used cool uris, then, c2s2 and c2s3
>>>>> would be different resources.
>>>>>
>>>>> Luc
>>>>>
>>>>> On 05/31/2011 02:25 PM, Graham Klyne wrote:
>>>>>> I see (at least) two resources associated with (c2): one
>>>>>> generated using (stats2), and the other using (stats3). We might
>>>>>> call these (c2s2) and (c2s3).
>>>>>
>>>
>>
>
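
The chart scenario in the quoted thread (one URI, two versions generated from stats2 and stats3) can be illustrated with a minimal sketch of the "everything is a resource" reading, where the snapshots c2s2 and c2s3 are resources in their own right and provenance is attached to them as plain properties. This is only an illustration under stated assumptions: the `Resource` class, the `snapshot_of` and `generated_using` property names, the `snapshots_of` helper, and the example.org URI are all hypothetical and are not taken from any provenance vocabulary discussed in the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    """Anything that *might* be identified; having a URI is optional."""
    name: str
    uri: Optional[str] = None
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, prop: str, value: str) -> None:
        # Attach a provenance property directly to the resource.
        self.properties.setdefault(prop, []).append(value)


# The dynamic chart (c2), exposed at a single (hypothetical) URI.
chart = Resource("c2", uri="http://example.org/blog/chart")

# Each snapshot is itself a resource, even though it has no URI of its own.
c2s2 = Resource("c2s2")
c2s2.add("snapshot_of", chart.name)       # hypothetical property name
c2s2.add("generated_using", "stats2")     # hypothetical property name

c2s3 = Resource("c2s3")
c2s3.add("snapshot_of", chart.name)
c2s3.add("generated_using", "stats3")


def snapshots_of(target: Resource, candidates: List[Resource]) -> List[str]:
    """Return the names of candidates recorded as snapshots of the target."""
    return [
        r.name
        for r in candidates
        if target.name in r.properties.get("snapshot_of", [])
    ]


print(snapshots_of(chart, [c2s2, c2s3]))
```

In this reading no separate "resource state" type is needed: whether c2s2 counts as a state of c2 or as a resource of its own is expressed only by the relationship recorded between them, which is also roughly where a role-based labeling could be attached.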
Received on Wednesday, 1 June 2011 16:39:07 UTC