F2F2 Meeting Summary from Paul Groth on 2012-02-05 (public-prov-wg@w3.org from February 2012)

From: Paul Groth <p.t.groth@vu.nl>
Date: Sun, 05 Feb 2012 14:49:49 +0100
To: "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <4F2E88FD.8050606@vu.nl>
Hi All,

I wanted to provide a set of highlights of the F2F meeting. First, let 
me thank all the people who came. It was good to see all of you. Also, 
thanks to those who dialed in to join the meeting, sometimes heroically 
(Tim!). The major take-home points of the meeting from my perspective 
were as follows:


*Scheduling*
The group is a couple of months behind. To get us back on track, the 
chairs have developed a revised timetable, available here:
http://www.w3.org/2011/prov/wiki/F2F2Intro

To meet the schedule, the group will have to be more disciplined. We 
have to avoid talking about corner cases (e.g. eggs, ice melting, and 
molecule discussions) and in general be sure that we make concrete 
proposals. The chairs will become better at using the W3C process to 
push things forward. Finally, for all drafts that need to be reviewed by 
the group we will be assigning specific reviewers to do the job.

*Simplification of the Model*
We were encouraged to simplify by Ivan. We must make sure that PROV is 
easy to use by our key communities including the Linked Data community.
I think the whole group agreed with this concept and has been working on 
it. But we need to do better. If we can't get consensus around a 
construct, the construct will be dropped. We can't support everything.

*Presenting the Work*
A key notion was that we need to present our work going from Primer --> 
PROV-O --> PROV-DM --> PROV-SEM. It's critical that communities don't 
have to touch the PROV-DM or PROV-SEM unless they want to.  Thus, from 
now on, we should release our documents in parallel in order to give a 
complete set to potential users. We can talk about how to do this at 
upcoming phone calls.

*Accounts*
To help with simplification, account records in their current form (as 
an explicit construct of the data model) will be dropped in favor of a 
lightweight "bundling" approach needed to meet the provenance of 
provenance requirement.

*Entities and Identifiers*
There has been a long discussion on identifiers (what are we 
identifying?) and what is an entity. Through the lens of the semantics, 
we discussed three layers: Things (so-called Objective layer), Entities 
(so-called Social layer) and Representations. We agreed that the data 
model should only mention two layers and should not talk about Things, 
but the semantics can introduce Things to explain some of the constructs 
of the data model. (See 
http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman)


At the meeting, it became clear that a further problem arises from trying 
to cope with two different forms of use cases:

1) "Scruffy Provenance". The ability to use the PROV vocabulary to make 
statements about existing things on the web. Think, for example, of adding 
simple provenance metadata in a web page.
2) "Proper Provenance". The ability to exchange PROV information between 
provenance systems where a "fixed" view of data is key. This is common 
in current provenance tracking systems. Think of exchanging information 
between version control systems or two workflow systems that capture 
provenance.

To cope with this distinction, the consensus was that the model needs to 
introduce the minimum set of concepts necessary for "scruffy 
provenance". The upgrade path to proper provenance will, in a second 
phase, specify the necessary more advanced concepts, possibly backed by 
the provenance semantics, where appropriate.

This was the guidance given to the PROV-DM editors, and they are actioned 
to come up with a revised, simplified introduction to PROV-DM by Feb 16 
for review by the group (ACTION-62).

*Synchronization*
There have been some issues with trying to synchronize between PROV-DM 
and PROV-O. The group agreed to adopt a new process:

First, PROV-O will be synchronized with the current public working draft 
(PROV-DM WD3) except for accounts. To make the work faster, the PROV-O 
team should focus only on the OWL ontology for now. The team is actioned 
to produce an ontology for review by Feb 16 (ACTION-55). The team was 
also encouraged not to get bogged down in constructs that are in flux, 
that is, constructs that have issues raised on them.

After PROV-O and PROV-DM are at the same level, any subsequent changes 
to PROV-DM or PROV-O will be synchronized through a mappings 
page (http://www.w3.org/2011/prov/wiki/ProvRDF). When someone wants to 
change a construct in either PROV-DM or PROV-O, they will propose a new 
mapping and show the current mapping. The group can then approve this 
change or not. This will ease the updating process.


Both the mappings and the semantics will also be synchronized to the WD3 
version of PROV-DM. Additionally, PROV-PRIMER will be brought into sync 
once PROV-O is released.

*PROV-O Design*
A couple of points were made about the design of PROV-O. Currently, 
PROV-O looks like it is an OWL-RL ontology. Ivan encouraged us to keep 
it that way and advertise that fact. This is important for uptake.

We shouldn't be too worried about having extra constructs in PROV-O; 
having simple constructs that can be upgraded to more complex versions 
seems to be a useful design principle.

Aspects of the PROV-O document related to its formal semantics did not 
seem to be appropriate for the target audience.

*Dublin Core best practice*
Kai has agreed to lead an effort to create a best practice showing how 
PROV and Dublin Core can work together.

*Implementation Task Force and Interoperability*
The group agreed to the following in terms of interoperability:
- For interoperability, we will catalogue existing implementations and 
which constructs of PROV they use. We are looking for at least two 
implementations of each construct. Furthermore, we will record which 
pairs of implementations can exchange PROV (different pairs may exchange 
different constructs).
- In addition, we are looking to build a set of simple test cases that 
a system/tool can use to show that it correctly understands 
provenance. This should be based on the examples produced by Tim.
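
As a rough illustration of the cataloguing idea above, here is a minimal 
Python sketch. Everything in it is an assumption made for illustration: 
the implementation names (tool_a, tool_b, tool_c), the construct names, 
and the helper functions are hypothetical and do not come from any actual 
survey of implementations. It shows how one might check the "at least two 
implementations per construct" criterion and list, per pair of 
implementations, the constructs both of them use.

```python
from itertools import combinations

# Hypothetical catalogue: implementation name -> set of constructs it uses.
# Names and construct lists are illustrative placeholders, not survey results.
CATALOGUE = {
    "tool_a": {"entity", "activity", "used"},
    "tool_b": {"entity", "activity", "wasGeneratedBy"},
    "tool_c": {"entity", "used"},
}


def coverage(catalogue):
    """Map each construct to the sorted list of implementations using it."""
    result = {}
    for impl, constructs in catalogue.items():
        for construct in constructs:
            result.setdefault(construct, []).append(impl)
    return {c: sorted(impls) for c, impls in result.items()}


def under_covered(catalogue, minimum=2):
    """Return constructs used by fewer than `minimum` implementations."""
    return sorted(
        c for c, impls in coverage(catalogue).items() if len(impls) < minimum
    )


def shared_constructs(catalogue):
    """For each pair of implementations, return the constructs both use.

    Sharing a construct is only a precondition for exchange; whether a
    pair can really exchange it still has to be demonstrated.
    """
    return {
        (a, b): catalogue[a] & catalogue[b]
        for a, b in combinations(sorted(catalogue), 2)
    }


if __name__ == "__main__":
    print("Constructs needing more implementations:", under_covered(CATALOGUE))
    for pair, constructs in shared_constructs(CATALOGUE).items():
        print(pair, sorted(constructs))
```

Note that different pairs end up with different shared sets, which matches 
the point that different pairs may exchange different constructs.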

*PAQ*
The PAQ will be updated to distinguish more clearly between 
locating provenance and retrieving it. The notions of what is best 
practice and what is a specification will be made clearer. The editors 
need to make sure that provenance-service definitions can easily allow 
provenance services to be created that are compatible but that can allow 
for requester scaling.

Overall, the meeting was productive and had a great working atmosphere. 
Let's keep up this great team effort.

Regards,
Paul





-- 
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth/
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam
Received on Sunday, 5 February 2012 13:50:29 UTC