Scope, consensus and muddy waters

TL;DR: absent timely consensus, do not include collections and/or 
contextualization of provenance in the current round of specifications.

...

First, some observations:

o1. I count that there have been well over 200 messages to the mailing list 
since last weekend, dominated by two topics:  contextualization of provenance 
statements in bundles (hasProvenanceIn, etc.) and collections.

o2. I cannot see any indication of a clear emerging consensus on either of these 
topics.

o3. We are running short of time to complete our work.

o4. As a group, we are overloaded.  I have heard several people comment that 
they don't have time to read all the emails.  This suggests to me that we are 
trying to achieve too much.

o5. The features that are causing all the discussion (contextualization and 
collections) are of much wider applicability than just provenance.

o6. For progress on the W3C REC track, each feature should have at least two 
independent interoperable implementations (I'm not sure what the exact W3C 
requirements are these days, but that's the basic idea inherited from IETF process).


Based on these observations, and my understanding of standards development, I 
claim that:

c1. If the specs we create are used at all, they WILL NOT be the last word. 
Releasing the specs should be the start of a wider engagement, not the end of a 
process.

c2. The existence of valid and reasonable requirements doesn't mean we have to 
solve them in this round.  It doesn't matter if there are requirements not 
satisfied by the first round of released specifications.  It is far, far better 
to get clear documentation of an existing consensus than to rush out possibly 
contentious solutions for every perceived requirement.

c3. It's easier to add new stuff later than to fix stuff we get wrong.

c4. Under-specification is OK, and sometimes a positively good thing - it allows 
more flexibility for future development as usage patterns emerge.

c5. It is far better to have specifications that are useful but incomplete than 
specifications that are not widely used.

c6. The issues subject to continuing discussion are wider than just provenance. 
Making provenance-specific solutions may actually harm interoperability by 
blessing an approach incompatible with other proposals from other groups.  (cf. 
"Do your bit, others will do theirs" - Tim Berners-Lee, 
http://www.w3.org/2006/Talks/0314-ox-tbl/#(22).)  We should be seeking to adopt 
common solutions, working with other groups as appropriate, not inventing 
mechanisms that are specific to provenance.

...

I am deeply concerned with proposals to bolt contextualization onto bundles.  I 
am finding the proposals hard to understand, and I fear that they could end up 
violating RDF formal semantics if data model URIs are used in RDF data without 
translation.  I stand by my position stated previously 
(http://lists.w3.org/Archives/Public/public-prov-wg/2012Jun/0136.html).  I 
suggest that the proposal for contextualization be separated from the main 
concern of representing provenance information, and written up as a working 
group NOTE, where it can be evaluated separately.  I also note that this issue 
overlaps with the RDF core working group work on "named graphs", and we should 
be looking to satisfy the underlying requirements using the RDF mechanisms. 
This may require some subsequent extension of the data model to match the RDF 
capabilities, but I believe that to define a mechanism now that cannot be 
handled using available RDF mechanisms would be damaging for use of provenance 
on the semantic web.

I have stayed out of the discussion of collections, but I note this is an area 
that has received a lot of attention from other designers.  And RDF does have a 
mechanism (or several) for representing collections.  The rdf:List feature was 
introduced specifically to meet the OWL requirement for a way to represent 
closed collections.  It may seem a bit messy to use, but I think it provides a 
solution we can use for now, preferable to inventing yet another collection 
structure.
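
To make the rdf:List suggestion concrete, here is a minimal sketch of how a 
closed collection maps onto rdf:first/rdf:rest triples.  It is plain Python 
with no RDF library; the function names, the "_:n" blank node labels and the 
"ex:" member names are my own illustrative assumptions, not anything taken 
from the PROV drafts.

```python
# Illustrative only: triples are (subject, predicate, object) string tuples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"


def rdf_list_triples(members, prefix="_:n"):
    """Return (head, triples) encoding `members` as a closed rdf:List.

    `prefix` is a hypothetical blank node label prefix chosen for this sketch.
    """
    if not members:
        # The empty list is represented by rdf:nil itself, with no triples.
        return RDF_NIL, []
    nodes = [f"{prefix}{i}" for i in range(len(members))]
    triples = []
    for i, member in enumerate(members):
        # The last cell points at rdf:nil, which is what closes the list.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((nodes[i], RDF_FIRST, member))
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples


def read_rdf_list(head, triples):
    """Walk rdf:rest links from `head` and return the members in order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    members = []
    node = head
    while node != RDF_NIL:
        members.append(first[node])
        node = rest[node]
    return members


head, triples = rdf_list_triples(["ex:e1", "ex:e2", "ex:e3"])
members = read_rdf_list(head, triples)
```

Each member costs two triples, which is the "bit messy" part, but the 
rdf:nil terminator is what lets a consumer know it has seen the whole 
collection.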

I've seen a similar situation in past standards work, where a group was trying 
to shoehorn too much into the first round of a specification, and failing to 
achieve consensus.  In the end, the WG chair took a hard line on the principle 
we agreed at the second face-to-face (if there isn't consensus for a feature, it 
is removed from the specification).  The result was rapid production of a 
specification that everyone agreed with, even though it didn't contain a lot of 
things that some people wanted.  Following publication of that initial 
specification, subsequent work of the group accelerated dramatically, and 
consensus was relatively quickly found for many of the features that previously 
had been obstructions to progress.

...

So, in summary, I propose that:

Absent rapidly achieved clear consensus on details of the features under 
discussion (contextualization and collections), they be dropped from the first 
round of provenance specifications.

Collections can be handled pro-tem using RDF lists.  Then we can see what actual 
requirements emerge in practice.

Contextualization should be brought up with the RDF working group in the context 
of RDF graphs and datasets (aka "named graphs", etc.) - it seems we now have a 
specific requirement on the table that can inform those discussions.

The features that are currently under discussion can also be written up as WG 
NOTEs.  If they serve a useful purpose and are well-conceived, they will be used 
despite not being in the formal specs, and can be included in future rounds. 
(e.g. Tim and I intend to continue talking about provenance pingback, which 
could result in a NOTE.  If the resulting ideas are sound and useful, I'd expect 
that they could be incorporated into a future revision of PROV-AQ.)

#g

Received on Saturday, 9 June 2012 12:27:52 UTC