Re: PROV-ISSUE-15 (define-views-or-account): Definition for Concept 'Views or accounts' [Provenance Terminology] from Luc Moreau on 2011-05-31 (public-prov-wg@w3.org from May 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Tue, 31 May 2011 21:50:38 +0100
To: public-prov-wg@w3.org
Message-ID: <EMEW3|b92d23afbe13baad2b8440ae7c90261en4ULoj08L.Moreau|ecs.soton.ac.uk|4DE5549E>
Hi Simon,

URIs may well be the solution we will adopt in the end.

However, I am wary of a data model that mandates the creation of URIs for
every thing, and potentially places an unnecessary burden on implementations
to satisfy strong properties (e.g. uniqueness, persistence, etc.) for little
benefit.

For instance, I imagine that once a provenance model is standardized, we
could create a library that offers graph operations, e.g. intersection,
union, etc. These operations could usefully apply to accounts. A given
"provenance service" may well apply many of these operations before
returning the provenance of a thing. Do we have to create a URI for each
"intermediary account"?

Is there actually a value in forcing a URI to be minted for an account C
obtained by creating the intersection of two accounts A and B?

Furthermore, if I compute again the intersection of two accounts A and B,
and obtain an account with the same statements as in C, do I need to call it
C or should it be given a new name, say D?
The former will be computationally inefficient. With the latter, I will have
two accounts C and D with the same content, but with different names.
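
A minimal sketch of the naming question above, assuming an account is
modelled as a set of statements. The Account type, the two registry classes,
and the urn:example: names are all hypothetical and used only for
illustration; nothing here is part of a proposed model.

```python
from itertools import count

# Hypothetical model: an account is an immutable set of
# (subject, predicate, object) statements.
Account = frozenset


def intersect(a, b):
    """Return the statements common to both accounts, without naming the result."""
    return a & b


class MintingRegistry:
    """Option 'D': mint a fresh name for every result, ignoring its content."""

    def __init__(self):
        self._ids = count(1)
        self.names = {}  # name -> account

    def register(self, account):
        name = f"urn:example:account:{next(self._ids)}"
        self.names[name] = account
        return name


class ContentRegistry:
    """Option 'C': reuse the existing name when the content is already known."""

    def __init__(self):
        self._ids = count(1)
        self._by_content = {}  # account -> name

    def register(self, account):
        # Hashing a freshly built frozenset touches every statement, so each
        # lookup costs time proportional to the size of the account.
        if account not in self._by_content:
            self._by_content[account] = f"urn:example:account:{next(self._ids)}"
        return self._by_content[account]


A = Account({("doc", "derivedFrom", "draft"), ("doc", "editedBy", "alice")})
B = Account({("doc", "derivedFrom", "draft"), ("doc", "reviewedBy", "bob")})

minting = MintingRegistry()
c = minting.register(intersect(A, B))
d = minting.register(intersect(A, B))  # same content, different name

content = ContentRegistry()
c2 = content.register(intersect(A, B))
d2 = content.register(intersect(A, B))  # same content, same name
```

With the minting registry, c and d differ although they name identical
content; with the content registry, the second call returns the first name,
at the price of comparing content on every registration.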

Note that:
- I have not mentioned access and query, it's just about naming
- I am not talking about a specific serialization, but an abstract model

So, my view is that it may be preferable to
- be able to distinguish accounts within a given provenance container
- optionally, assign them global names

Cheers,
Luc

On 31/05/11 17:24, Simon Miles wrote:
> Hi Luc,
>
> This discussion may be more about access and query than accounts, but
> I'll keep it on this thread for now.
>
> I'd be inclined to say that as we're proposing something for the web
> and for provenance, identifiers should be universal in space and time,
> respectively, unless there's a strong reason not to be. I don't know
> of such a reason for accounts.
>
> Surely an identifier local to a server becomes universal (in space)
> when combined with the server's identifier (just as web resources are
> identified by combining a path from the document root with the server's
> DNS name). We do not need to pretend to a client that the universal
> identifier does not exist.
>
> And if we consider provenance data may be encoded in RDF, I'm unclear
> how you could have an identifier local to a provenance container
> (noting that the latter is not yet defined). URIs are universally
> scoped in space. Blank nodes have non-universal scope, but their
> actual scope depends on the set in which they are contained and is not
> controllable, e.g. in one triple store, a blank node is uniquely
> identified within the contents of that store, but if someone copies
> the triples into a larger store, the blank node is now unique within a
> larger set of assertions. As far as I can see, named graphs do not
> affect this issue either - no identifiers are local to named graphs.
>
> Thanks,
> Simon
>
> On 31 May 2011 16:45, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>    
>> Hi Simon and Paul,
>>
>> What's the scope of these identifiers?
>>
>> Let's say I retrieve provenance of thing1, which includes an account
>> identified by account-x.y.z.
>> Then, I retrieve provenance of thing2, which includes an account
>> identified by account-x.y.z
>>
>> What is the scope of the identifier account-x.y.z:
>> - universal in space and time?
>> - the server which returned the provenance?
>> - the provenance container in which the account was declared?
>> - something else?
>>
>> Luc
>>
>> On 05/31/2011 04:23 PM, Paul Groth wrote:
>>      
>>> Hi Simon,
>>>
>>> I agree with you. I think the key thing is to realize that provenance
>>> is asserted by one or more entities (sources?) and thus is an account
>>> of the state of the world.
>>>
>>> I don't think we should be forced to identify these entities. However,
>>> each account should have an identifier.
>>>
>>> Paul
>>>
>>> Simon Miles wrote:
>>>        
>>>> Hi Paul,
>>>>
>>>> Thanks for the comments. Answers are interleaved.
>>>>
>>>>          
>>>>> I was wondering why an account must be from one source.
>>>>>            
>>>> Just because it seemed most intuitive, but maybe an account could have
>>>> multiple sources, as long as we are clear what that would mean.
>>>>
>>>> If we meant multiple actors may agree with an account and wouldn't
>>>> describe what occurred any differently from the same perspective, then
>>>> that's true but there would still be one actor which originally
>>>> provided the account.
>>>>
>>>> If we meant that multiple actors may be "co-authors" of an account,
>>>> that would be more reasonable. I guess I was considering such a group
>>>> as a single source, but I agree this may not be the clearest way to
>>>> define things. Of course, an account can have its own provenance
>>>> where it can be specified in detail who contributed what and how.
>>>>
>>>>          
>>>>> I think a source may be an annotation on an account.
>>>>>            
>>>> That's an issue separate from concept definition, surely. "Annotation"
>>>> applies to some data (a serialised account), and "annotating with a
>>>> source" requires having an identifier for the source, which my
>>>> definition of account does not require.
>>>>
>>>>          
>>>>> I think a more general definition would be:
>>>>> - An account is a record of something that has occurred from a
>>>>> particular perspective.
>>>>>            
>>>> I'm fine with that definition. It still feels like the definition
>>>> implies rather than makes explicit something significant, i.e. that
>>>> the account comes from one or a group of sources, but I don't have a
>>>> strong argument why it needs to be explicit.
>>>>
>>>>          
>>>>> I agree with the notion that every description of some occurrence must
>>>>> be part of an account but I don't think that needs to be identified.
>>>>>            
>>>> Again, I think this goes beyond the concept definition to design
>>>> decisions, but maybe we can't separate the two. It depends what you
>>>> mean by "identified" as to whether I agree with you :-).
>>>>
>>>> If you mean that there doesn't need to be any metadata about the
>>>> account(s) each occurrence is referred to in, such as the source of
>>>> the account, then I agree it may be too much to require.
>>>>
>>>> But if you mean that we may not be able to distinguish whether two
>>>> assertions about what has occurred are from the same source and
>>>> perspective or not (i.e. same accounts or not), then I'm not convinced
>>>> - it seems to go against the purpose of providing provenance to aid
>>>> trust and interpretation to lose such distinctions.
>>>>
>>>> Further, if you provide no identifier for an account, then don't you
>>>> lose (or make much harder) the possibility of providing metadata about
>>>> it in the future? So, I would argue that all occurrences, assertions,
>>>> or whatever parts comprise provenance information, should be part of
>>>> at least one account, and that those accounts should be given
>>>> identifiers, even if no other information about the account is
>>>> provided.
>>>>
>>>> Thanks,
>>>> Simon
>>>>
>>>>          
>>>>> thoughts?
>>>>> Paul
>>>>>
>>>>> Simon Miles wrote:
>>>>>            
>>>>>> Hello,
>>>>>>
>>>>>> My proposed starting definition:
>>>>>>     - An account is a record of something that has occurred provided by
>>>>>> one source and taking one perspective in describing what occurred.
>>>>>>
>>>>>> Notes:
>>>>>>     - I would expect the provenance of a resource (or whatever
>>>>>> provenance is of) to comprise a set of accounts or parts of accounts,
>>>>>> as all the information within that provenance has to come from
>>>>>> somewhere and take some perspective.
>>>>>>     - The definition does not require that the source be identified -
>>>>>> whether we require it to be seems a design decision not part of
>>>>>> concept definition.
>>>>>>     - The same occurrence (e.g. a "resource" or "process execution")
>>>>>> could be referred to in multiple accounts. I would expect it to be the
>>>>>> decision of the account sources whether they are referring to the same
>>>>>> thing in their assertions.
>>>>>>     - "Perspective" could be rephrased as something more concrete. An
>>>>>> example of perspective (from OPM) is the granularity of description:
>>>>>> whether what has occurred is described coarsely or in detail. However,
>>>>>> there may be other useful distinctions in perspective.
>>>>>>     - Every occurrence included in some provenance data would be part
>>>>>> of at least one account (if it had not been documented, it could not be
>>>>>> included). This may be a distinction from OPM, where I believe
>>>>>> entities can be included in provenance without being in an account.
>>>>>>
>>>>>> Thanks,
>>>>>> Simon
>>>>>>
>>>>>> On 20 May 2011 08:38, Provenance Working Group Issue Tracker
>>>>>> <sysbot+tracker@w3.org> wrote:
>>>>>>              
>>>>>>> PROV-ISSUE-15 (define-views-or-account): Definition for Concept
>>>>>>> 'Views or accounts'   [Provenance Terminology]
>>>>>>>
>>>>>>> http://www.w3.org/2011/prov/track/issues/15
>>>>>>>
>>>>>>> Raised by: Luc Moreau
>>>>>>> On product: Provenance Terminology
>>>>>>>
>>>>>>> The Provenance WG charter identifies the concept 'Views or
>>>>>>> accounts' as a core concept of the provenance interchange language
>>>>>>> to be standardized (see http://www.w3.org/2011/01/prov-wg-charter).
>>>>>>>
>>>>>>> What term do we adopt for the concept 'Views or accounts'?
>>>>>>> How do we define the concept 'Views or accounts'?
>>>>>>> Where does concept 'Views or accounts' appear in ProvenanceExample?
>>>>>>> Which provenance query requires the concept 'Views or accounts'?
>>>>>>>
>>>>>>> Wiki page: http://www.w3.org/2011/prov/wiki/ConceptViewsOrAccounts
>>>>>>>
>>>>>>> ______________________________________________________________________
>>>>>>>
>>>>>>> This email has been scanned by the MessageLabs Email Security System.
>>>>>>> For more information please visit http://www.messagelabs.com/email
>>>>>>> ______________________________________________________________________
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>
>>>>>>              
>>>>>
>>>>>            
>>>>
>>>>
>>>>          
>>>        
>> --
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>
>>
>>
>>      
>
>
>
Received on Tuesday, 31 May 2011 20:51:13 UTC