Re: Towards PROV-O Accounts from Luc Moreau on 2012-01-16 (public-prov-wg@w3.org from January 2012)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Mon, 16 Jan 2012 14:57:30 +0000
To: Timothy Lebo <lebot@rpi.edu>
CC: public-prov-wg@w3.org
Message-ID: <EMEW3|d29935998bb33ef0d23a2c742fc9334ao0FEvZ08L.Moreau|ecs.soton.ac.uk|4F143ADA>
Hi Tim,
Responses interleaved.

On 01/15/2012 06:25 PM, Timothy Lebo wrote:
> Hi, Luc.
>
> Thanks for so much feedback and sorry for taking so long to respond.
>
> responses within.
>
> On Jan 5, 2012, at 8:55 AM, Luc Moreau wrote:
>
>    
>> Hi Tim,
>>
>> Thanks for your document on using graphs to model accounts.
>>
>> There are a few issues that I think we should discuss since they
>> potentially represent substantial differences with prov-dm.
>>
>> 1. I have a problem with your statement:
>>
>>    An Account is an Entity that was generated by an asserter during an
>>    assertion activity.
>>
>> This is not what prov-dm states at all.
>>
>> Indeed, an account is a "thing in the world". We can then take
>> multiple perspectives about that thing, which we can represent as
>> entities for provenance purposes.  Having done that, then we can talk about
>> the provenance of an account.
>>
>> There may be multiple ways of looking at a given account acc:
>>   - what account acc tells us about entity e
>>   - acc with suitable anonymization
>>   - acc with a cryptographic signature
>>   - etc
>>
>>   It is up to the provenance asserter to decide how to assert entities
>>   and we can't do that for them.
>>   That's why it is not right to say that an account *is* an entity.
>>
>>   However, an account is a thing in the world and we can define perspectives
>>   on it as entities.
>>
>>      
> I appreciate your attempt to clarify this point, but I must admit I don't know how to make the distinction useful.
> I see "Things" vs. "Entities" as semiotic referents and symbols, respectively.
> I'm going to hope that I can recruit those in the group that have a better understanding than I do and seem to share my confusion.
>
>
>    
>>
>> 2. Currently an asserter is not modelled as an agent.  There is a note
>>    to that effect. Nobody has come back to this point.  Until we firm
>>    up this issue, we won't be able to decide whether your modelling is
>>    correct or not.
>>      
> I find it difficult to conceive of an asserter that does not have any responsibility for what it stated.
> And since responsibility is agency in DM, I would think the asserter must be an agent.
>
>    
>> 3. I agree with you that if we have an entity for an account, we can
>>    also explain how it is generated, etc.
>>
>>    Maybe it's easier to use attribution rather than introduce activity types.
>>
>>     wasAttributedTo ( eIdentifier , agIdentifier optional-attribute-values )
>>      
> I very much like this idea. Thank you for the suggestion.
> As we discussed briefly a few telecons ago, would the DM be able to have qualified wasAttributedTo relations?
> I think that it would be a natural question for a consumer, upon hearing that "account x was from agent y", to want to ask about how, when, or in what situation agent y stated those things (e.g., under oath in a courtroom, on twitter 2am on a Friday night, etc).
> I added https://www.w3.org/2011/prov/track/issues/216 so we can track this idea.
>
>    
>> 4. I can't decide whether your graph hash is central to your encoding
>>    or not.
>>      
> The fact that I'm using a hash to name is not important, but _what_ I am naming _is_ important.
> As long as one knows they are naming a set of abstract triples, they can choose any non-hash name they desire.
> But they need to know that they are denoting the abstract triples.
> I am using the hash to be clear about what I'm naming.
> Adding or changing a triple would result in a new abstract graph and would thus need a new name.
> Hashes exhibit this same characteristic, thus their employment.
>
>    

Then, it looks like the hashing was a distraction for a reader like me.

>> If it is part of the design, it doesn't match my view of accounts.
>>
>>    Using prov-aq, I may retrieve the provenance of entity e1, and obtain:
>>
>>    acc(ex:a1,
>>        http://ex/asserter1,
>>        entity(e1,[...])
>>        ...)
>>
>>    Again, using prov-aq, I may retrieve the provenance of entity e2, and obtain:
>>
>>
>>    acc(ex:a1,
>>        http://ex/asserter1,
>>        entity(e2,[...])
>>        ...)
>>
>>    Same account in the sense that it is generated by
>>    http://ex/asserter1 and named ex:a1, but different subset of
>>    records.
>>      
> That is fine, because the consumer is not naming the account.
> The producer already has named it, and knew the full graph when they named it.
> (again, hashes need not be used, but the properties that they exhibit should be followed)
>
>    

Conceptually, a producer may know the full graph when they name it.
In practice, it may not be the case. The account is named, and then records,
as they are generated, are added to the account.
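
To illustrate the point, here is a minimal sketch, assuming a hypothetical Account container (not part of prov-dm or prov-aq) and placeholder record strings. The account is named first and records are added afterwards: the name stays the same, whereas a name derived from a hash of the content would change with every added record.

```python
import hashlib


class Account:
    """Hypothetical container: a fixed name plus a growing set of records."""

    def __init__(self, name, asserter):
        self.name = name          # chosen before any record exists
        self.asserter = asserter
        self.records = set()

    def add(self, record):
        self.records.add(record)

    def content_hash(self):
        # Hash over the sorted records; it changes whenever a record is added.
        h = hashlib.sha256()
        for r in sorted(self.records):
            h.update(r.encode("utf-8"))
            h.update(b"\n")
        return h.hexdigest()


acc = Account("ex:a1", "http://ex/asserter1")
acc.add("entity(e1)")             # placeholder record, illustration only
hash_before = acc.content_hash()
acc.add("entity(e2)")             # added after the account was named
hash_after = acc.content_hash()

name_is_stable = acc.name == "ex:a1"
hash_changed = hash_before != hash_after
```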


>>    It's important to support that use case, since in that case, those
>>    two account instances are telling us that they are the same, coming
>>    from the same asserter, and can be merged without conflict (if the
>>    original full account was without conflict).
>>      
>
> This use case is not only supported, but also motivates modeling accounts as abstract graphs.
> If instead we used the graph name when returning portions of an account, we wouldn't know which graphs we should merge.
> I've added an example at http://www.w3.org/2011/prov/wiki/Using_graphs_to_model_Accounts#Piece-wise_accounts
>
>    
>>    I don't know how these hashes work, given that these account examples
>>    contain different records.
>>      
> The account had already been named by the time the consumer got the two portions of it.
> The URI of the account wouldn't change after the fact.
>
>
>    
>> 5. Your nested account example:
>>
>>   In acc4_claims, you write:
>>    :e1
>>       prov:wasComplementOf :e1;
>>
>>   Shouldn't it be e0?
>>    :e1
>>       prov:wasComplementOf :e0;
>>      
> Yes, I mis-transcribed. I changed it.
>
>
>    
OK
>>     How do you find which entity record this actually is?
>>      
>
> :e1 or :e0?
>    
> I could resolve their URIs to get some descriptions.
> I could query the current Dataset (0 or 1 default graphs and 0+ named graphs) which could be a trig file or a triple store, among other things.
> In RDF, if :e1 or :e0 is ever mentioned or described, your "record" grew.
>
> I must admit, I do not understand your continuing need to "find a record".
> I'm starting to think that your "records" are bounded by an RDF Graph.
> The semantic web is designed to let the subgraph around an entity transcend the boundaries of particular files, graphs, stores, etc.
>
>    
I suggest you look at my recent post
http://www.w3.org/2011/prov/wiki/ProvenanceOfW3CReport

The example here does not exhibit all the complexity we have to consider.
But you can see that the activity a0 is described differently in acc3 
and acc4.


If, instead of different activity records, we had different entity
records in these accounts, then just referring to the entity id would
not be enough (just as, in this example, a0 could mean a copyFile or a
createFile activity).
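
A rough sketch of the ambiguity, assuming a hypothetical in-memory store of records keyed by account and identifier (the store and function names are illustrative only): looking up a0 by identifier alone returns two different descriptions, while the pair (account, identifier) picks out exactly one.

```python
# Hypothetical store: each record is keyed by (account, identifier).
records = {
    ("acc3", "a0"): {"prov:type": "createFile"},
    ("acc4", "a0"): {"prov:type": "copyFile"},
}


def lookup_by_id(identifier):
    """Return every (account, attributes) pair recorded for this identifier."""
    return [
        (account, attrs)
        for (account, ident), attrs in sorted(records.items())
        if ident == identifier
    ]


def lookup(account, identifier):
    """Return the single record for this identifier within one account."""
    return records[(account, identifier)]


matches = lookup_by_id("a0")          # two records: the id alone is ambiguous
in_acc3 = lookup("acc3", "a0")        # one record: createFile
in_acc4 = lookup("acc4", "a0")        # one record: copyFile
```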


>    
>>    To be compliant with prov-dm, you should probably encode the example as
>>
>>    :e1, ex:acc4
>>       prov:wasComplementOf :e0 , ex:acc3;
>>      
> This isn't valid Turtle or Trig.
>
>    
Yes, I know, but the point is that you may want to be precise about
which "entity record" you refer to.
You may have lots of accounts talking in very different ways about e0.
What do you assert with :e1 wasComplementOf e0?



>>    meaning what is asserted about e1 in account4 wasComplementOf what is asserted
>>    about e0 in account3.
>>      
> That is already stated ( I abbreviate here). Remember the :e0 is an abbreviation for a full URI, so both occurrences of :e0 are the same URI (and thus referring to the same resource or "Thing"). Though, if anyone else asserts something about :e1 in their account, we wouldn't know if ex:acc3_claims was complementing that, too.
>
> ex:acc4_claims {
>     :e1 prov:wasComplementOf :e0;
> }
> ex:acc3_claims {
>     :e0 ?p ?o
> }
>
>
>    

I don't understand this.
>    
>> 6. In your example, is it problematic or not, to have two different
>>    entity records containing the same entity identifier?
>>      
> You named them the same in ASN, so I named them the same in URI. If they should not be the same, then I can change one of the URIs.
>    

They have the same URI, but can't we say different things *about* them
in different accounts?
In particular, they may have different attributes in different accounts.


> The serialization doesn't matter here. It's RDF triples.
> Just as discussed in our last telecon, "record" doesn't mean anything in RDF.
> I think you agreed that if anything, an RDF "record" was a subgraph. The union of these 5 triples below is a graph. One subject with five "predicate-object" pairs.
> If the two accounts are not describing the same activity (or are offering different characterized perspectives), then one should be renamed and we could go as far as asserting that they are distinct.
>    
Yes, the two accounts are describing the same activity, but from
different viewpoints. Different attributes, potentially different
provenance.

> The URI that :a0 is abbreviating is awww:identifying an Activity, a "Thing in the world", a awww:Resource. The URI is a symbol that denotes the Activity, and the Activity is being described (characterized) with attributes (triples). Resources can be awww:represented with a variety of concrete serializations; any serialization we receive when requesting a URI awww:represents the awww:Resource that the URI awww:identifies (and, "denotes").
>
> see http://www.w3.org/TR/webarch/
>
> The URI that ":a0" is abbreviating is NOT denoting the 3 (or 5) triples that use it as its subject. It is denoting the Activity.
>    
Agreed. But we say that a0 is createFile in acc3 and copyFile in acc4!
> There is no notion of "shape" or "container" (e.g., struct, array, row) that you keep wanting to impose on these triples.
>    

I don't impose a container, I read different assertions about a0:

:a0
       dcterms:description "activity(a0,t,,[prov:type='createFile'])";
       a prov:Activity;
       a :CreateFile;
:a0
       dcterms:description "activity(a0,t,,[prov:type='copyFile'])";
       a prov:Activity;
       a :CopyFile;
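
As a small sketch of why the account matters here, assume each account is modelled as a plain set of triples (tuples of strings, with only the type triples kept; this is an illustration, not an RDF library). Taking the union of the two accounts makes a0 both a :CreateFile and a :CopyFile, whereas keeping the account name as a fourth component preserves who asserted what.

```python
# Each account modelled as a set of (subject, predicate, object) triples.
acc3 = {
    (":a0", "rdf:type", "prov:Activity"),
    (":a0", "rdf:type", ":CreateFile"),
}
acc4 = {
    (":a0", "rdf:type", "prov:Activity"),
    (":a0", "rdf:type", ":CopyFile"),
}

# Plain union: the account boundary is lost and a0 carries both types.
merged = acc3 | acc4
types_in_merge = sorted(
    o for s, p, o in merged if s == ":a0" and p == "rdf:type"
)

# Quads: the fourth component records which account made the assertion.
quads = {(s, p, o, "ex:acc3") for s, p, o in acc3} | {
    (s, p, o, "ex:acc4") for s, p, o in acc4
}


def types_in(account, subject):
    """Types asserted for a subject within one named account."""
    return sorted(
        o
        for s, p, o, g in quads
        if g == account and s == subject and p == "rdf:type"
    )
```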


> One _could_ impose a shape by circumscribing a subgraph by way of a named graph -- AND defining some graph traversal operations seeded at some set of resources. But this is not common.
>
>
>    
Luc
>>
>>      :a0
>>       dcterms:description "activity(a0,t,,[prov:type='createFile'])";
>>       a prov:Activity;
>>       a :CreateFile;
>>
>>
>>      :a0
>>       dcterms:description "activity(a0,t,,[prov:type='copyFile'])";
>>       a prov:Activity;
>>
>>
>>     If no, then that's great.
>>      
> It's not a problem if the URIs abbreviated by :a0 are awww:identifying (and "denoting") the same Activity - the "Thing in the world".
>
>    
>> What is your concern with scope then?
>>      
> There is no scope for naming/denoting in RDF. The user's choice of URI establishes scope or ignores scope; that's up to them. URI naming is orthogonal to any use within Accounts (and thus, Graphs). If the same "Thing in the world" is mentioned by way of URI in two different accounts, then the two accounts are describing the same Thing in the world.
>
>
>
>    
>>     If yes, then what is the exact problem?
>>      
>
> If you want two "string match" URIs mentioned in two different accounts to be denoting DIFFERENT "things in the world", I have concerns. It breaks the fundamental principles of the web.
>
>
> -Tim
>
>    
>>
>> Luc
>>
>>
>>
>>
>> On 01/05/2012 03:35 AM, Timothy Lebo wrote:
>>      
>>> prov-wg,
>>>
>>> I have been working on some discussion [1] that is relevant to modeling Accounts in PROV-O.
>>>
>>> It is incomplete, but I think ready for some initial feedback.
>>>
>>> Modeling accounts is on the agenda for tomorrow's telecon [2], so I hope this can provide some discussion material.
>>>
>>> Regards,
>>> Tim
>>>
>>> [1] http://www.w3.org/2011/prov/wiki/Using_graphs_to_model_Accounts
>>> [2] http://www.w3.org/2011/prov/wiki/Meetings:Telecon2012.01.05
>>>
>>>
>>>
>>>        
>> -- 
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>
>>
>>      
>    

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Monday, 16 January 2012 14:58:03 UTC