- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Sat, 09 Jun 2012 13:27:07 +0100
- To: W3C provenance WG <public-prov-wg@w3.org>
TL;DR: absent timely consensus, do not include collections and/or contextualization of provenance in the current round of specifications.

...

First, some observations:

o1. I count that there have been well over 200 messages to the mailing list since last weekend, dominated by two topics: contextualization of provenance statements in bundles (hasProvenanceIn, etc.) and collections.

o2. I cannot see any indication of a clear emerging consensus on either of these topics.

o3. We are running short of time to complete our work.

o4. As a group, we are overloaded. I have heard several people comment that they don't have time to read all the emails. This suggests to me that we are trying to achieve too much.

o5. The features that are causing all the discussion (contextualization and collections) are of much wider applicability than just provenance.

o6. For progress on the W3C REC track, each feature should have at least two independent interoperable implementations (I'm not sure what the exact W3C requirements are these days, but that's the basic idea inherited from IETF process).

Based on these observations, and my understanding of standards development, I claim that:

c1. If the specs we create are used at all, they WILL NOT be the last word. Releasing the specs should be the start of a wider engagement, not the end of a process.

c2. The existence of valid and reasonable requirements doesn't mean we have to solve them in this round. It doesn't matter if there are requirements not satisfied by the first round of released specifications. It is far, far better to get clear documentation of an existing consensus than to rush out possibly contentious solutions for every perceived requirement.

c3. It's easier to add new stuff later than to fix stuff we get wrong.

c4. Under-specification is OK, and sometimes a positively good thing: it allows more flexibility for future development as usage patterns emerge.

c5. It is far better to have specifications that are useful but incomplete than specifications that are not widely used.

c6. The issues subject to continuing discussion are wider than just provenance. Making provenance-specific solutions may actually harm interoperability by blessing an approach incompatible with other proposals from other groups. (cf. "Do your bit, others will do theirs" - Tim Berners-Lee, http://www.w3.org/2006/Talks/0314-ox-tbl/#(22)). We should be seeking to adopt common solutions, working with other groups as appropriate, not inventing mechanisms that are specific to provenance.

...

I am deeply concerned with proposals to bolt contextualization onto bundles. I am finding the proposals hard to understand, and I fear that they could end up violating RDF formal semantics if data model URIs are used in RDF data without translation. I stand by my position stated previously (http://lists.w3.org/Archives/Public/public-prov-wg/2012Jun/0136.html).

I suggest that the proposal for contextualization be separated from the main concern of representing provenance information, and written up as a working group NOTE, where it can be evaluated separately. I also note that this issue overlaps with the RDF core working group's work on "named graphs", and we should be looking to satisfy the underlying requirements using the RDF mechanisms. This may require some subsequent extension of the data model to match the RDF capabilities, but I believe that defining a mechanism now that cannot be handled using available RDF mechanisms would be damaging for the use of provenance on the semantic web.

I have stayed out of the discussion of collections, but I note this is an area that has received a lot of attention from other designers. And RDF does have a mechanism (or several) for representing collections. The rdf:List feature was introduced specifically to meet the OWL requirement for a way to represent closed collections.
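As a rough illustration of what the rdf:List encoding involves: a closed collection becomes a chain of nodes linked by rdf:first and rdf:rest and terminated by rdf:nil. The sketch below is plain Python with no RDF library; the function name rdf_list_triples, the blank-node labels and the http://example.org/ member URIs are all invented for the example and come from no specification.

```python
# Namespace of the RDF vocabulary (rdf:first, rdf:rest, rdf:nil).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list_triples(members, prefix="_:b"):
    """Encode `members` as an rdf:List.

    Returns (head, triples), where head is the first list node
    (or rdf:nil for an empty list) and triples is a list of
    (subject, predicate, object) string tuples.
    The blank-node labels built from `prefix` are hypothetical.
    """
    if not members:
        return RDF + "nil", []

    nodes = [f"{prefix}{i}" for i in range(len(members))]
    triples = []
    for i, member in enumerate(members):
        # The last node points at rdf:nil, which closes the list.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", member))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


if __name__ == "__main__":
    # Hypothetical collection members.
    head, triples = rdf_list_triples(
        ["http://example.org/a", "http://example.org/b"]
    )
    print("list head:", head)
    for s, p, o in triples:
        print(s, p, o)
```

Two triples per member plus the rdf:nil terminator is what makes the list closed: a consumer can tell that nothing has been left out.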
It may seem a bit messy to use, but I think it provides a solution we can use for now, preferable to inventing yet another collection structure.

I've seen a similar situation in past standards work, where a group was trying to shoehorn too much into the first round of a specification, and failing to achieve consensus. In the end, the WG chair took a hard line on the principle we agreed in the second face-to-face (if there isn't consensus for a feature, it is removed from the specification). The result was rapid production of a specification that everyone agreed with, even though it didn't contain a lot of things that some people wanted. Following publication of that initial specification, subsequent work of the group accelerated dramatically, and consensus was relatively quickly found for many of the features that previously had been obstructions to progress.

...

So, in summary, I propose that:

Absent rapidly achieved clear consensus on details of the features under discussion (contextualization and collections), they be dropped from the first round of provenance specifications.

Collections can be handled pro tem using RDF lists. Then we can see what actual requirements emerge in practice.

Contextualization should be brought up with the RDF working group in the context of RDF graphs and datasets (aka "named graphs", etc.) - it seems we now have a specific requirement on the table that can inform those discussions.

The features that are currently under discussion can also be written up as WG NOTEs. If they serve a useful purpose and are well-conceived, they will be used despite not being in the formal specs, and can be included in future rounds. (e.g. Tim and I intend to continue talking about provenance pingback, which could result in a NOTE. If the resulting ideas are sound and useful, I'd expect that they could be incorporated into a future revision of PROV-AQ.)

#g
Received on Saturday, 9 June 2012 12:27:52 UTC