Re: modeling macted's example from Luc Moreau on 2012-01-28 (public-prov-wg@w3.org from January 2012)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Sat, 28 Jan 2012 16:29:14 +0000
To: Simon Miles <simon.miles@kcl.ac.uk>
CC: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <EMEW3|efc204d0e05f1bdd0cda97cc7d4b5b7fo0RGYd08L.Moreau|ecs.soton.ac.uk|1517CEAE>
Hi Simon

Responses below.

Professor Luc Moreau
Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ
United Kingdom

On 28 Jan 2012, at 11:49, "Simon Miles" <simon.miles@kcl.ac.uk> wrote:

Hi Luc,

I don't think you have characterised the situation of this record in the world in your suggestion. I may want to distinguish a copy of this record that I obtained from a provenance server from one I found elsewhere.

I don't see a contradiction between what I was suggesting and the
above. Adapting the example slightly:

wasGeneratedBy (acc1, recording)
wasStartedBy (recording, asserter1)
wasGeneratedBy (acc1receivedToday, emailing)
wasGeneratedBy (acc1fromServer, downloading)
specializationOf (acc1receivedToday, acc1)
specializationOf (acc1fromServer, acc1)
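
As a rough illustration of the records above, they can be held as plain tuples and the copies of acc1 looked up by following specializationOf. This is only a sketch in plain Python, not PROV-DM tooling; the tuple layout and the helper names specializations_of and generated_by are assumptions made for the example.

```python
# Each record is (relation, subject, object), mirroring the assertions above.
records = [
    ("wasGeneratedBy", "acc1", "recording"),
    ("wasStartedBy", "recording", "asserter1"),
    ("wasGeneratedBy", "acc1receivedToday", "emailing"),
    ("wasGeneratedBy", "acc1fromServer", "downloading"),
    ("specializationOf", "acc1receivedToday", "acc1"),
    ("specializationOf", "acc1fromServer", "acc1"),
]


def specializations_of(general, records):
    """Entities asserted to be specialisations of `general` (hypothetical helper)."""
    return [s for (rel, s, o) in records
            if rel == "specializationOf" and o == general]


def generated_by(entity, records):
    """Activities asserted to have generated `entity` (hypothetical helper)."""
    return [o for (rel, s, o) in records
            if rel == "wasGeneratedBy" and s == entity]


# Each copy of acc1 keeps its own generation record, so the emailed copy
# and the downloaded copy remain distinguishable.
for copy_id in specializations_of("acc1", records):
    print(copy_id, "was generated by", generated_by(copy_id, records))
```

The point of the sketch is only that the two copies stay separate entities with their own provenance while both remain linked to acc1.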


Assuming we go for this encoding, what is the rationale for choosing the name acc1 in the first generation record? Could we have used it in the second instead, and get:

specializationOf (acc1, acc1Recorded)
specializationOf (acc1fromServer, acc1Recorded)

Would it be valid? If yes, how do we find the records contained in acc1fromServer?

Furthermore, I may want to characterise it in different ways: e.g. a record using core constructs only, or following some specific pattern, etc. How do you go about this?

Sorry, I don't really understand what you're intending. In your
example, if you mean a subset of the records in the account, wouldn't
that be a different entity rather than just a different
characterisation? It seems equivalent to comparing a report with a
section of that report. But maybe I'm misunderstanding your intent.

I struggled to formulate what I have in mind. Is the content of an account *the only way* to characterise it? Can I just, for instance, characterise it by the fact that the PrIMe methodology was used for it? Can't I just characterise it by its size? Or its date of initial creation? This is particularly important for accounts still being generated. Their content may still vary but other characteristics may be fixed: e.g. their name, acc1.
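
One way to picture this is an account whose content can still grow while some of its characteristics stay fixed. The sketch below is hypothetical: the Account class, its field names, and the creation date are assumptions chosen for illustration, not anything defined by PROV-DM.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Account:
    # Characteristics that may stay fixed while the account is being generated.
    name: str
    created: date
    methodology: str
    # Content that may still vary.
    records: list = field(default_factory=list)

    def fixed_characterisation(self):
        """Characterise the account without looking at its content."""
        return {
            "name": self.name,
            "created": self.created,
            "methodology": self.methodology,
        }


# The date and methodology values here are illustrative assumptions.
acc1 = Account(name="acc1", created=date(2012, 1, 28), methodology="PrIMe")

before = acc1.fixed_characterisation()
acc1.records.append("wasGeneratedBy(table, built, 1727)")
after = acc1.fixed_characterisation()

# Adding content does not change the fixed characteristics.
print(before == after)
```

Size is left out of the fixed characteristics on purpose, since it would change as records are added.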


Luc



Thanks,
Simon


We should maybe try to introduce other characterisations in the example.


On 27 Jan 2012, at 18:03, "Simon Miles" <simon.miles@kcl.ac.uk> wrote:

Hello Paul,

It makes sense to me, and I could accept it. It matches what I
understand to be the three-level view discussed.

On the other hand, it may not matter that the account can change as
long as the provenance assertions about it stay true. So I could also
be happy with a two-level view in which the following is valid:
 wasGeneratedBy(acc1, emailing)

The PROV-DM spec says "an entity [is] an identifiable characterized
thing. An entity fixes some aspects of a thing and its situation in
the world, so that it becomes possible to express its provenance, and
what causes these specific aspects to be as such."

We can say that acc1 is identifiable, it is characterised, it is a
thing, and it is possible to express its provenance, so it is an
entity. It is only distinct in that it is not a specialisation of some
other entity, and is characterised merely by being that account. It is
in its nature as an entity that we can express its provenance using
PROV-DM.

Both the two- and three-level views seem OK to me, but the two-level view
might be less confusing to explain. Following MacTed's terms in the
telecon, we could say: data is something you can express the
provenance of, provenance is metadata, but metadata is also itself
data.

Thanks,
Simon

On 27 January 2012 17:28, Paul Groth <p.t.groth@vu.nl> wrote:
Hi all,

I thought I would have a go at modeling part of MacTed's provenance of
provenance example.

Here's the description "i have a table, built in 1727 by joe smith ..."
I would model this in PROV-DM as:

entity(table)
wasGeneratedBy(table, built, 1727)
activity(built)
wasAssociatedWith(built,joe smith)

Now to talk about the provenance of that provenance (generated by an
email activity), I think I would do the following:

account(acc1,
    entity(table)
    wasGeneratedBy(table, built, 1727)
    activity(built)
    wasAssociatedWith(built,joe smith)
)

entity(acc_entity_id, [perspectiveOn=acc1])
wasGeneratedBy(acc_entity_id, emailing)

To me we can't just say

wasGeneratedBy(acc1, emailing) because the account may change and also
different people may take different perspectives on the account. So we
need to do a "freezing" operation thus making it into an entity. Then we
can talk about its provenance.
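
The "freezing" step could be sketched roughly as follows. This is plain Python rather than PROV-DM; the freeze helper, the dictionary layout, and the extra entity(chair) record are hypothetical and only serve to show that later changes to the account do not affect the frozen entity.

```python
# The account as a mutable collection of records, taken from the example above.
acc1 = [
    "entity(table)",
    "wasGeneratedBy(table, built, 1727)",
    "activity(built)",
    "wasAssociatedWith(built,joe smith)",
]


def freeze(account_id, account_records, entity_id):
    """Snapshot an account so it can be treated as an entity (hypothetical helper)."""
    return {
        "id": entity_id,
        "perspectiveOn": account_id,
        # A tuple copy: later changes to the account do not leak into the entity.
        "records": tuple(account_records),
    }


acc_entity = freeze("acc1", acc1, "acc_entity_id")

# Provenance is then asserted about the frozen entity, not the account itself.
provenance = [("wasGeneratedBy", acc_entity["id"], "emailing")]

# The account keeps changing (hypothetical new record); the entity does not.
acc1.append("entity(chair)")
print(len(acc1), len(acc_entity["records"]))
```

Under the two-level view discussed elsewhere in the thread, the snapshot step would simply be skipped and the generation record would name acc1 directly.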

Thoughts?
Paul




--
Dr Simon Miles
Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166

Provenance-based Validation of E-Science Experiments:
http://eprints.dcs.kcl.ac.uk/1268/




Received on Saturday, 28 January 2012 16:30:21 UTC