Re: PROV-ISSUE-15 (define-views-or-account): Definition for Concept 'Views or accounts' [Provenance Terminology] from Simon Miles on 2011-06-02 (public-prov-wg@w3.org from June 2011)

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Thu, 2 Jun 2011 18:25:31 +0100
To: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <BANLkTi=QFOLhqOMoyiAt3SvAnAxKCiOoeg@mail.gmail.com>

Hi Luc,

Answering your points in no particular order...

The union of two accounts would not be an account under the definition
proposed, as it would not be from a single perspective. Therefore,
whatever rules applied to identifying accounts may not apply to the
results of the graph operations you describe.
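
To make that concrete, here is a minimal sketch in Python. The Account
class, its fields, and the example statements are all hypothetical
illustrations, not part of any proposed model. The union of two accounts'
statements comes back as a bare set of statements, with no single source
or perspective attached, so it is not itself an Account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical account: statements from one source and perspective."""
    identifier: str
    source: str
    statements: frozenset


def union(a: Account, b: Account) -> frozenset:
    # The result is deliberately a plain statement set, not an Account:
    # it mixes two perspectives, so account rules need not apply to it.
    return a.statements | b.statements


# Made-up statements, written as (subject, relation, object) tuples.
a = Account("acc-a", "A", frozenset({("doc", "wasDerivedFrom", "draft")}))
b = Account("acc-b", "B", frozenset({("doc", "wasEditedBy", "bob")}))
merged = union(a, b)
```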

I agree entirely that there is no value (maybe no possibility) in
giving an identifier to every arbitrary subset of provenance or any
other data.

I'm not sure what use cases motivate obtaining unions or
intersections of accounts, so I'm not clear how strong a case that is
against using URIs.

The question of whether two accounts with the same content have one or
two identifiers is, I assume, separate from the scope of the
identifiers. The implication of the proposed definition is that they
should have separate identifiers if they are from different
perspectives, which I would take to include being from different
sources.
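
As a rough illustration (with a hypothetical Account class and made-up
identifiers, not anything the group has agreed), two accounts can carry
identical statements and still be distinct accounts because they come
from different sources:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical account: statements from one source and perspective."""
    identifier: str
    source: str
    statements: frozenset


def same_content(x: Account, y: Account) -> bool:
    # Compares only the statements, ignoring who provided them.
    return x.statements == y.statements


stmts = frozenset({("report", "wasGeneratedBy", "analysis")})
from_a = Account("account-1", "A", stmts)
from_b = Account("account-2", "B", stmts)
```

Here same_content(from_a, from_b) holds, yet the two accounts keep
separate identifiers, which is the distinction I would want preserved.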

I'm wary about what it would mean to have only provenance
container-specific identifiers by default. As an example, if A
provides an account and B wishes to assert that they believe it to be
inaccurate, would this only be possible if the owner of the provenance
container (possibly A) had chosen to give the account a global
identifier?

Regarding the cost of assigning URIs, I first note that if a
provenance container has a global identifier, then the combination of
that and an identifier local to that container would be globally
unique, so guaranteeing uniqueness seems no more of a problem than for
the container. I can't see why persistence would be any harder than for
the container either - why would an account identifier need to be
changed? As an account has a well-defined data value (its contents),
it also seems clear what it would dereference to, if we required that
property to hold of our URIs.
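
A minimal sketch of that combination, in Python. It assumes (my
assumption, not something we have decided) that the local identifier is
appended to the container's URI as a fragment; the example.org URIs, the
function name, and the "consideredInaccurateBy" relation are all
hypothetical.

```python
from urllib.parse import quote


def global_account_id(container_uri: str, local_id: str) -> str:
    """Combine a container's global URI with a container-local account id."""
    # Assumes the container URI does not already carry a fragment.
    if "#" in container_uri:
        raise ValueError("container URI must not already contain a fragment")
    # Percent-encode the local id so the result stays a valid URI.
    return f"{container_uri}#{quote(local_id, safe='')}"


# Hypothetical container URI and local account identifier.
container = "http://example.org/provenance/container1"
account_uri = global_account_id(container, "account-x.y.z")

# With a global identifier, B can refer to A's account from anywhere.
claim = (account_uri, "consideredInaccurateBy", "http://example.org/agents/B")
```

The result is as unique and as persistent as the container URI itself,
and B can make the assertion without A having minted anything extra.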

Thanks,
Simon

On 31 May 2011 21:53, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
> Hi Simon,
>
> URIs may well be the solution we will adopt in the end.
>
> However, I am wary of a data model that mandates the creation of URIs
> for everything, and potentially creates an unnecessary burden on
> implementations to satisfy strong properties (e.g. uniqueness,
> persistence, etc.), for little benefit.
>
> For instance, I imagine that once a provenance model is standardized, we
> could create a library that offers graph operations, e.g.
> intersection, union, etc. These operations could usefully apply to
> accounts. A given "provenance service" may well apply many of these
> operations before returning the provenance of a thing. Do we have to
> create all those URIs for each "intermediary account"?
>
> Is there actually a value in forcing a URI to be minted for an account C
> obtained by creating the intersection of two accounts A and B?
>
> Furthermore, if I compute again the intersection of two accounts A and
> B, and obtain an account with the same statements as in C, do I need to
> call it C or should it be a new name, say D?
> The former will be computationally inefficient. With the latter, I will
> have two accounts C and D with the same content, but with different names.
>
> Note that:
> - I have not mentioned access and query, it's just about naming
> - I am not talking about a specific serialization, but an abstract model
>
> So, my view is that it may be preferable to
> - be able to distinguish accounts within a given provenance container
> - optionally, assign them global names
>
> Cheers,
> Luc
>
> On 31/05/11 17:24, Simon Miles wrote:
>> Hi Luc,
>>
>> This discussion may be more about access and query than accounts, but
>> I'll keep it on this thread for now.
>>
>> I'd be inclined to say that as we're proposing something for the web
>> and for provenance, identifiers should be universal in space and time,
>> respectively, unless there's a strong reason not to be. I don't know
>> of such a reason for accounts.
>>
>> Surely an identifier local to a server becomes universal (in space)
>> when combined with the server's identifier (just as web resources are
>> identified by combining a path from the document root with the server's
>> DNS name). We do not need to pretend to a client that the universal
>> identifier does not exist.
>>
>> And if we consider provenance data may be encoded in RDF, I'm unclear
>> how you could have an identifier local to a provenance container
>> (noting that the latter is not yet defined). URIs are universally
>> scoped in space. Blank nodes have non-universal scope, but their
>> actual scope depends on the set in which they are contained and is not
>> controllable, e.g. in one triple store, a blank node is uniquely
>> identified within the contents of that store, but if someone copies
>> the triples into a larger store, the blank node is now unique within a
>> larger set of assertions. As far as I can see, named graphs do not
>> affect this issue either - no identifiers are local to named graphs.
>>
>> Thanks,
>> Simon
>>
>> On 31 May 2011 16:45, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>
>>> Hi Simon and Paul,
>>>
>>> What's the scope of these identifiers?
>>>
>>> Let's say I retrieve provenance of thing1, which includes an account
>>> identified by account-x.y.z.
>>> Then, I retrieve provenance of thing2, which includes an account
>>> identified by account-x.y.z
>>>
>>> What is the scope of identifier account-x.y.z :
>>> - universal in space and time?
>>> - the server which returned the provenance?
>>> - the provenance container in which the account was declared?
>>> - something else
>>>
>>> Luc
>>>
>>> On 05/31/2011 04:23 PM, Paul Groth wrote:
>>>
>>>> Hi Simon,
>>>>
>>>> I agree with you. I think the key thing is to realize that provenance
>>>> is asserted by one or more entities (sources?) and thus is an account
>>>> of the state of the world.
>>>>
>>>> I don't think we should be forced to identify these entities. However,
>>>> each account should have an identifier.
>>>>
>>>> Paul
>>>>
>>>> Simon Miles wrote:
>>>>
>>>>> Hi Paul,
>>>>>
>>>>> Thanks for the comments. Answers are interleaved.
>>>>>
>>>>>
>>>>>> I was wondering why an account must be from one source.
>>>>>>
>>>>> Just because it seemed most intuitive, but maybe an account could have
>>>>> multiple sources, as long as we are clear what that would mean.
>>>>>
>>>>> If we meant multiple actors may agree with an account and wouldn't
>>>>> describe what occurred any differently from the same perspective, then
>>>>> that's true but there would still be one actor which originally
>>>>> provided the account.
>>>>>
>>>>> If we meant that multiple actors may be "co-authors" of an account,
>>>>> that would be more reasonable. I guess I was considering such a group
>>>>> as a single source, but I agree this may not be the clearest way to
>>>>> define things. Of course, an account can have its own provenance
>>>>> where it can be specified in detail who contributed what and how.
>>>>>
>>>>>
>>>>>> I think a source may be an annotation on an account.
>>>>>>
>>>>> That's an issue separate from concept definition, surely. "Annotation"
>>>>> applies to some data (a serialised account), and "annotating with a
>>>>> source" requires having an identifier for the source, which my
>>>>> definition of account does not require.
>>>>>
>>>>>
>>>>>> I think a more general definition would be.
>>>>>> - An account is a record of something that has occurred from a
>>>>>> particular perspective.
>>>>>>
>>>>> I'm fine with that definition. It still feels like the definition
>>>>> implies rather than makes explicit something significant, i.e. that
>>>>> the account comes from one or a group of sources, but I don't have a
>>>>> strong argument why it needs to be explicit.
>>>>>
>>>>>
>>>>>> I agree with the notion that every description of some occurrence must
>>>>>> be part of an account but I don't think that needs to be identified.
>>>>>>
>>>>> Again, I think this goes beyond the concept definition to design
>>>>> decisions, but maybe we can't separate the two. It depends what you
>>>>> mean by "identified" as to whether I agree with you :-).
>>>>>
>>>>> If you mean that there doesn't need to be any metadata about the
>>>>> account(s) each occurrence is referred to in, such as the source of
>>>>> the account, then I agree it may be too much to require.
>>>>>
>>>>> But if you mean that we may not be able to distinguish whether two
>>>>> assertions about what has occurred are from the same source and
>>>>> perspective or not (i.e. same accounts or not), then I'm not convinced
>>>>> - it seems to go against the purpose of providing provenance to aid
>>>>> trust and interpretation to lose such distinctions.
>>>>>
>>>>> Further, if you provide no identifier for an account, then don't you
>>>>> lose (or make much harder) the possibility of providing metadata about
>>>>> it in the future? So, I would argue that all occurrences, assertions,
>>>>> or whatever parts comprise provenance information, should be part of
>>>>> at least one account, and that those accounts should be given
>>>>> identifiers, even if no other information about the account is
>>>>> provided.
>>>>>
>>>>> Thanks,
>>>>> Simon
>>>>>
>>>>>
>>>>>> thoughts?
>>>>>> Paul
>>>>>>
>>>>>> Simon Miles wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> My proposed starting definition:
>>>>>>>     - An account is a record of something that has occurred provided by
>>>>>>> one source and taking one perspective in describing what occurred.
>>>>>>>
>>>>>>> Notes:
>>>>>>>     - I would expect the provenance of a resource (or whatever
>>>>>>> provenance is of) to comprise a set of accounts or parts of accounts,
>>>>>>> as all the information within that provenance has to come from
>>>>>>> somewhere and take some perspective.
>>>>>>>     - The definition does not require that the source be identified -
>>>>>>> whether we require it to be seems a design decision not part of
>>>>>>> concept definition.
>>>>>>>     - The same occurrence (e.g. a "resource" or "process execution")
>>>>>>> could be referred to in multiple accounts. I would expect it to be
>>>>>>> the decision of the account sources whether they are referring to the same
>>>>>>> thing in their assertions.
>>>>>>>     - "Perspective" could be rephrased as something more concrete. An
>>>>>>> example of perspective (from OPM) is the granularity of description:
>>>>>>> whether what has occurred is described coarsely or in detail. However,
>>>>>>> there may be other useful distinctions in perspective.
>>>>>>>     - Every occurrence included in some provenance data would be
>>>>>>> part of at least one account (if it had not been documented, it
>>>>>>> could not be included). This may be a distinction from OPM, where I
>>>>>>> believe entities can be included in provenance without being in an
>>>>>>> account.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Simon
>>>>>>>
>>>>>>> On 20 May 2011 08:38, Provenance Working Group Issue Tracker
>>>>>>> <sysbot+tracker@w3.org>      wrote:
>>>>>>>
>>>>>>>> PROV-ISSUE-15 (define-views-or-account): Definition for Concept
>>>>>>>> 'Views or accounts'   [Provenance Terminology]
>>>>>>>>
>>>>>>>> http://www.w3.org/2011/prov/track/issues/15
>>>>>>>>
>>>>>>>> Raised by: Luc Moreau
>>>>>>>> On product: Provenance Terminology
>>>>>>>>
>>>>>>>> The Provenance WG charter identifies the concept 'Views or
>>>>>>>> accounts' as a core concept of the provenance interchange language
>>>>>>>> to be standardized (see http://www.w3.org/2011/01/prov-wg-charter).
>>>>>>>>
>>>>>>>> What term do we adopt for the concept 'Views or accounts'?
>>>>>>>> How do we define the concept 'Views or accounts'?
>>>>>>>> Where does concept 'Views or accounts' appear in ProvenanceExample?
>>>>>>>> Which provenance query requires the concept 'Views or accounts'?
>>>>>>>>
>>>>>>>> Wiki page: http://www.w3.org/2011/prov/wiki/ConceptViewsOrAccounts
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ______________________________________________________________________
>>>>>>>>
>>>>>>>> This email has been scanned by the MessageLabs Email Security System.
>>>>>>>> For more information please visit http://www.messagelabs.com/email
>>>>>>>> ______________________________________________________________________
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>> --
>>> Professor Luc Moreau
>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>> University of Southampton          fax:   +44 23 8059 2865
>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Thursday, 2 June 2011 17:25:59 UTC