views, complements and invariants (was: updates to PAQ doc for discussion) from Graham Klyne on 2011-08-25 (public-prov-wg@w3.org from August 2011)

From: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
Date: Thu, 25 Aug 2011 11:56:13 +0100
To: "Myers, Jim" <MYERSJ4@rpi.edu>
CC: Khalid Belhajjame <Khalid.Belhajjame@cs.man.ac.uk>, Paul Groth <p.t.groth@vu.nl>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <4E562A4D.2020404@zoo.ox.ac.uk>
On 20/08/2011 20:26, Myers, Jim wrote:
> Graham, I'd like to have something like you describe, but I don't think you can really make the inferences you want to for the general case, and don't necessarily see why version helps versus view. I'm not opposed to having a subtype of view for version, but I'm not sure how to make it rigorous...
>
> Taking those parts in order:
>
> 1) The problems I have with inferring authorship/editorship have to do with the fact that not all edits are equal. Someone who just fixes grammar might not be an author/editor in the final doc. Changes that are put in in one version and removed in another may or may not indicate that an intellectual contribution has been made (my text doesn't make the final version but other text is added in some other part of the doc because of what I contributed...am I an author/editor or not?). To me these types of issues really indicate that a document is not just a more flexible version of a file/file-like version, that edit operations aren't really occurring on the same type of thing as editorial/intellectual contributions are made on, etc. So we really have the IVPof/view case although we try to pretend that it is really hierarchical and just a matter of more/less constrained versions of the same thing.

I think this comment is mostly to do with the specifics (i.e. weakness) of my 
example used to illustrate the desideratum.

It would be easier to talk about this simply in terms of time-varying resources.
Let's try the weather report example again:

(Weather in London at 12:00 on 1-Jan-2000)
    isViewOf (Weather in London on 1-Jan-2000)
(Weather in London on 1-Jan-2000)
    isViewOf (Weather in London)

I think it can be useful to infer from this that:

(Weather in London at 12:00 on 1-Jan-2000)
    isViewOf (Weather in London)

From this, I would expect any provenance statements that are generally true for
(Weather in London) to also be true for (Weather in London at 12:00 on
1-Jan-2000).  That is, invariants are preserved forward across isViewOf relations
(and others may be introduced).
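A minimal sketch of the inference described above, assuming isViewOf is treated as transitive and that "invariants" are simply sets of statements attached to each resource. The data layout and the function names `view_closure` and `propagate_invariants` are hypothetical, used only to make the direction of inference concrete; they are not part of any model under discussion.

```python
from collections import defaultdict

# Asserted isViewOf statements as (view, viewed_resource) pairs.
IS_VIEW_OF = [
    ("Weather in London at 12:00 on 1-Jan-2000", "Weather in London on 1-Jan-2000"),
    ("Weather in London on 1-Jan-2000", "Weather in London"),
]

def view_closure(pairs):
    """Return the transitive closure of isViewOf as a set of (view, viewed) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # a isViewOf b and b isViewOf d  =>  a isViewOf d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def propagate_invariants(pairs, invariants):
    """Copy each resource's invariants onto every (transitive) view of it.

    Statements attached to a view are never pushed back to the resource it
    is a view of: invariants only flow forward along isViewOf.
    """
    result = defaultdict(set)
    for resource, facts in invariants.items():
        result[resource] |= set(facts)
    for view, viewed in view_closure(pairs):
        result[view] |= set(invariants.get(viewed, ()))
    return result
```

With these two assertions the closure gains the third pair (the 12:00 report isViewOf Weather in London), so anything stated as generally true of (Weather in London) also reaches the 12:00 report, while something stated only of the 12:00 report does not flow back up.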

> 2) Regarding the question of why version does something better than view - if X is a view of A and Y is another view of A, why wouldn't I think inferring creatorship/editorship is OK? (I'm claiming above that inferring is probably not valid in some cases - here I'm asking whether version does a better job of cutting down on those cases versus view.) I.e. if you wrote the bits to a section of a disk that is an IVPof/view of a file, why wouldn't it be just as valid or invalid as trying to make that inference between a doc and a version of it? How does a hierarchical meaning help? (I guess I'm assuming that IVPof is one-way like version, and for my disk versus file use case here I would actually assert IVPof in both directions so I could infer the file creator also wrote the bits to the disk and vice versa, whereas with your doc/version case the IVPof relationship would go one way. So, rephrasing the question here - I'd agree that inference should only go in the direction of the relationship, but if there are relationships in both directions, wouldn't inferencing be just as valid for that case?)

Maybe I'm misunderstanding the intent, but what I interpret from

   if X is a view of A and Y is another view of A

is that (X complementOf Y) and (X complementOf Y) in the previous terminology. 
From this, I can't see how my knowledge of X alone allows me to infer anything
about Y.  (I already accept the relevance of this kind of relation for dealing 
with accounts.)

> 3) I would tie the use cases together and rather than looking to infer authorship/editorship from view or version relationships, I would see any differences in who's listed for the doc and the aggregate list from each version as an indication that there's been an error, a lie, or the provenance is just not complete (intellectual contributions haven't been separated from text/file-level edits, one version isn't really 'derivedfrom' another when I look at more granularity in the files or processes, etc.)
>
> A version relationship may still be useful, particularly if we agree that it allows inferencing as you want (i.e. you only use version instead of view when you want people to infer authorship/editorship/(what else can I infer?) - view shouldn't work that way, version could, though there would be cases where the English language meaning and this technical definition would be at odds (the examples I've given)).
>
> If we do that, I think version would have to only be valid within an account - i.e. the notion of version is an indication that, for the set of processes being reported, the asserter believes one can consider the view relationships hierarchical/transitive/version-like and inferencing is OK. If I take two accounts that use version and merge them, I may find that the set of processes they describe will break versioning - versions might have to be interpreted as views because of the additional info. (Perhaps this example works: if you use version to indicate text changes in a doc and I use version to describe multiple copies (file versions) of one logical file (one of your versions of a doc), I think both might be internally consistent, but together they'd imply that every person who copied a version of your doc was an author/editor, which is not what you intended.) Perhaps version being account-limited is still OK - PIL is an assertion language and so an asserter may be wrong, and it may be possible that they are wrong about a version relationship while still being right about there being a view relationship...

Hmmm... I'm not sure I go with this.  You seem to be saying that the truth of 
provenance assertions about a resource (as opposed to secondary properties 
concerning the available inferences) is contextualized by the account in which 
they occur.

Examining the account example in the OPM model document
(http://eprints.ecs.soton.ac.uk/21449/1/opm.pdf, p6), I see two accounts
(looking at figure 4):

   (3,7) is derived from (2,6) by application of "add1"

   3 is derived from 2 by application of "add1"
   7 is derived from 6 by application of "add1"

These statements come from two different accounts, but I see their truth is 
independent of the account to which they belong (or of which they are a part). 
What *does* depend on the account context being considered are assertions that 
can be made about the overall structure of the provenance graph (e.g. the 
absence of loops).
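A minimal sketch of that distinction, assuming a representation that is not taken from OPM itself: each derivation assertion is stored once, an account is just a named subset of those assertions, and a graph-level property such as absence of loops is evaluated over whichever accounts are selected. The node labels follow the figure 4 example above; the account names `coarse` and `fine` and the helper `is_acyclic` are hypothetical.

```python
# Each derivation is (derived, source, process); its truth does not depend
# on which account mentions it.
DERIVATIONS = {
    ("(3,7)", "(2,6)", "add1"),
    ("3", "2", "add1"),
    ("7", "6", "add1"),
}

# Accounts are named selections of those same assertions.
ACCOUNTS = {
    "coarse": {("(3,7)", "(2,6)", "add1")},
    "fine": {("3", "2", "add1"), ("7", "6", "add1")},
}

def is_acyclic(edges):
    """Return True if the derivation graph formed by `edges` has no cycle.

    Uses a three-colour depth-first search: reaching a GREY node again
    means we have followed a path back to a node still on the stack.
    """
    graph = {}
    for derived, source, _process in edges:
        graph.setdefault(derived, []).append(source)
        graph.setdefault(source, [])
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return False
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    return all(visit(n) for n in graph if colour[n] == WHITE)
```

Here each account, and their union, passes the check; adding a hypothetical assertion that (2,6) was derived from (3,7) would make the check fail for any selection containing both edges, without changing whether either individual derivation statement is true.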

I find the idea that a provenance assertion may be valid (i.e. True) only for a 
given account to be surprising.  And if it's part of the resulting provenance 
model, I think developers will get it wrong.

#g
--
Received on Thursday, 25 August 2011 13:51:59 UTC