- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Thu, 26 Apr 2012 18:20:52 +0100
- To: Paolo Missier <Paolo.Missier@ncl.ac.uk>
- CC: Satya Sahoo <satya.sahoo@case.edu>, Luc Moreau <L.Moreau@ecs.soton.ac.uk>, Provenance Working Group WG <public-prov-wg@w3.org>
On 26/04/2012 13:39, Paolo Missier wrote:
> Graham
>
> you have made your point on this over and over again.

Yes, I've said it before, but I think not (in this context) so much as to count as "over and over again". (Previously, I've objected to using collections to model provenance accounts, which was a different matter.)

> ... I think we get it, but I
> still don't see a strong argument. That is because the criteria used to define
> the scope here have been blurry and that has not improved with time.
> The comments that followed my own personal opinion on this (attached) seem to
> indicate that capturing the evolution of sets may be a good idea, given their
> pervasiveness. If this belongs to a specific domain, which domain is it?

Fair enough. I'll see if I can substantiate my position...

First, to be clear, I'm not saying that "capturing the evolution of sets" is not a good idea. What I question is the extent to which it *should* be *entirely* down to the PROV spec to achieve this.

We're defining a standard, and I think it's in the nature of standards for use on the global Internet/Web that the criteria for defining scope are blurry, because we can't expect to anticipate all of the ways in which they will be used. For me, the acid test will be the extent of adoption.

In my experience, it is the *simple* standards (of all kinds) that get more widely adopted. TCP/IP vs OSI. SMTP vs X.400. HTTP vs any number of content management systems. I see the same for ontologies/vocabularies. The widely used success stories are ones like DC, FOAF, SIOC, SKOS, etc., which all have the characteristic of focusing on a small set of core concepts. Of course there are more specialized large ontologies/vocabularies that have a strong following (e.g. a number of bioinformatics standards), but within much more confined communities. (TimBL has a slide about the cost of an ontology vs the size of its community - http://www.w3.org/2006/Talks/0314-ox-tbl/#(22) - it emphasizes the benefits of widespread adoption, but doesn't address the costs associated with the *size* of the ontology.)

In my view, provenance is something that /should/ be there with the likes of DC and FOAF in terms of adoption. Which for me prioritizes keeping it as small as possible to maximize adoption.

To repeat: I'm not saying that provenance of collections is not useful. I'm sure it is very useful in many situations. For me the test is not so much what is useful as what *needs* to be in the base provenance spec because it cannot reasonably be retro-fitted via available extension points. What I have not seen is an explanation of why the provenance of collections cannot be handled through specialization of the core provenance concepts we already have. This might even be a separate *standard*. For me, all this is an application of the principles of minimum power, independent invention and modularity (http://www.w3.org/DesignIssues/Principles.html).

In many ways (and, to be clear, this is not a proposal, just an illustration) I'd rather like to see something like OPMV go forward as a base spec for provenance, because it's really clear from that what the key ideas are and how they tie together. Many of the things the group spends time discussing (including, but not limited to, collections) can be layered on this basic model.
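To make that layering concrete, here's a minimal sketch of the kind of specialization I mean (the app: namespace and its term names are invented purely for illustration; the prov: terms are those in the PROV-O drafts; this is not a proposal for actual terms):

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix app:  <http://example.org/app-collections#> .

    # The application's own extension terms, layered on the core vocabulary:
    app:Collection             rdfs:subClassOf    prov:Entity .
    app:derivedByInsertionFrom rdfs:subPropertyOf prov:wasDerivedFrom .
    app:derivedByRemovalFrom   rdfs:subPropertyOf prov:wasDerivedFrom .

    # Using them: version 2 of a cohort derived from version 1 by insertion.
    <http://example.org/cohort/v2> a app:Collection ;
        app:derivedByInsertionFrom <http://example.org/cohort/v1> .

A consumer that knows nothing of app: but understands RDFS still sees a plain prov:wasDerivedFrom between the two entities, which is rather the point: the base model stays small, and the richer collection semantics live in the extension.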
The tension here is that by specifying more in the base model, one achieves a greater level of interoperability between systems *that fully implement the defined model*, and at the same time decreases the number of systems that attempt to implement the model. This raises the question: is it more beneficial to have relatively few systems implement a very rich model of provenance, or to have very many systems implement a relatively weak model? And of course, it's not black-or-white... there are reasonable points in between. I think my view is clearly to "turn the dial" to the simpler end of the spectrum but, of course, YMMV.

> But I am sorry that you are having to hold your nose. Believe me, the provenance
> of a set doesn't smell that bad.

That was a figure of speech, and was probably an overly strong statement. As I say above, I'm sure provenance of collections of various kinds is useful and important - what I'm really trying to push on is how much needs to be in the base provenance specs that developers will have to master.

I think later in the discussion I saw a mention of abstract collections that could be specialized in different ways. That, for me, could represent a reasonable compromise, though my preference would be to deal with collections separately. Maybe what I'm doing here is making a case for modularization of the provenance spec (a la PML?), rather than lumping it all into one, er, collection.

...

Returning to your comment about blurry criteria, here are some that are not blurry (though they are also unsubstantiated, but there are some clues at http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/):

* I think that if we can produce a base provenance ontology of <=8 classes and <=12 properties, we stand a chance of deployment at the scale of FOAF (the numbers are approximately the size of the FOAF core - http://xmlns.com/foaf/spec/).

* I think a base ontology with twice the number of classes could achieve less than 10% of the adoption of FOAF (e.g. compare interest in vCard vs FOAF or DC at http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/).

* I think a base ontology with substantially more terms will receive substantially less adoption.

The numbers here are, to be sure, very unscientific. But it's interesting that, not counting the "infrastructure" ontologies (rdf, rdfs, owl, ex), all the "high interest" ontologies that I probed were also relatively small (up to 40 terms overall, at a rough guess).

On this basis, my criterion becomes very un-blurry: fewer terms is better by far. Of course, there's a balance to be struck, but it brings home to me that each term added to the overall provenance ontology has to bring substantial benefit if the adoption (impact) of our work is not to be reduced.

...

Finally, the reason I think that PROV *could* be as popular as FOAF is that it is positioned to underpin a key missing feature of the web - providing a machine-actionable basis for dealing with conflicting information (trust, information quality assessment). It could be, in a real sense, the FOAF of data ("who are you?", "who do you know?", "where do you come from?", etc.). As yet, we don't *know* what aspects of provenance will be important in this respect, though there is some research (including your own, Paolo) that suggests some directions. So, in pursuit of this goal, the thing about PROV that matters almost more than anything else is scale of adoption.
So, on this view, *anything* that stands in the way of adoption without providing needed functionality that cannot be achieved in any other way is arguably an impediment to the eventual success of PROV.

#g
--

> On 4/26/12 12:04 PM, Graham Klyne wrote:
>> I find myself somewhat concerned by what appears to be scope creep associated
>> with collections. It seems to me that in this area, the provenance model is
>> straying into the domain of application design. If collections were just
>> sets, I could probably hold my nose and say nothing, but this talk of having
>> provenance define various forms of collection indexing seems to me to be out of
>> scope.
>>
>> So I think this is somewhat in agreement with what Satya says here, though I
>> remain unconvinced that the notions of collections and derivation-by-insertion,
>> etc., actually *need* to be in the main provenance ontology - why not let
>> individual applications define their own provenance extension terms?
>>
>> #g
>> --
>>
>> On 18/04/2012 17:35, Satya Sahoo wrote:
>>> Hi all,
>>> The issue I raised last week is that collection is an important
>>> provenance construct, but the assumption of only key-value pair based
>>> collections is too narrow, and the relations derivedByInsertionFrom and
>>> derivedByRemovalFrom are over-specifications that are not required.
>>>
>>> I have collected the following examples of collections, which only require
>>> the definition of the collection in DM5 (collection of entities); they
>>> (a) don't have a key-value structure, and (b) don't need the
>>> derivedByInsertionFrom and derivedByRemovalFrom relations:
>>>
>>> 1. A cell line is a collection of cells used in many biomedical experiments.
>>> The provenance of the cell line (as a collection) includes: who submitted
>>> the cell line, what method was used to authenticate the cell line, and when
>>> was the given cell line contaminated? The provenance of the cells in a cell
>>> line includes: what is the source of the cells (e.g. organism)?
>>>
>>> 2. A patient cohort is a collection of patients satisfying some constraints
>>> for a research study. The provenance of the cohort includes: what
>>> eligibility criteria were used to identify the cohort, and when was the
>>> cohort identified? The provenance of the patients in a cohort may include
>>> their health provider, etc.
>>>
>>> Hope this helps our discussion.
>>>
>>> Thanks.
>>>
>>> Best,
>>> Satya
>>>
>>> On Thu, Apr 12, 2012 at 5:06 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>
>>>> Hi Jun and Satya,
>>>>
>>>> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised
>>>> against you, as we agreed.
>>>>
>>>> Cheers,
>>>> Luc
>>>>
>>>> [1] https://www.w3.org/2011/prov/track/actions/76
>>>> [2] https://www.w3.org/2011/prov/track/actions/77
Received on Thursday, 26 April 2012 17:23:06 UTC