- From: Paul Groth <p.t.groth@vu.nl>
- Date: Tue, 1 May 2012 09:55:48 +0200
- To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Cc: W3C provenance WG <public-prov-wg@w3.org>
Hi Graham,

I guess my response would be that the model has simple starting points and that, with the proper organization, we will be fine. There is consensus on the constructs we currently have, and there has been movement towards consolidation.

I think the key is explainability of the model. Where should a developer start? I would argue that we are close to that goal. I would disagree, for example, that the qualified pattern should really be counted as all new concepts: once one understands the simple pattern, it is applied consistently. But I guess we are hand-waving a bit, so one would need to be more specific. :-)

I think, for example, the proposal on quoter is one way to reduce the size of the model. The separation of collections into a distinct document would also be good. Obviously, other suggestions are appreciated.

cheers
Paul
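[For concreteness, a minimal Turtle sketch of the qualified pattern Paul refers to, with names as in later PROV-O drafts (the draft under discussion may differ); ex:writing, ex:alice and ex:editorRole are invented identifiers, not WG terms.]

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix ex:   <http://example.org/> .

    # Simple binary form: one triple, no room for further detail.
    ex:writing prov:wasAssociatedWith ex:alice .

    # Qualified form: the same statement, expanded into an Association
    # node so that a role (or a plan, or a time) can be attached.
    ex:writing prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent   ex:alice ;
        prov:hadRole ex:editorRole
    ] .

[Paul's point, on this reading: the second form adds one class and a few properties, but the same expand-the-relation move is reused for every relation, so it is one pattern rather than many new concepts.]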
On Tue, May 1, 2012 at 5:30 AM, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
> On 30/04/2012 11:28, Paul Groth wrote:
>> Hi All,
>>
>> In Graham's comments, he put a rough number on the number of concepts, namely 40. We are under that, at 32 concepts, and this includes collections.
>
> I was going to stand back a while from this discussion, but I think I must correct this (which may have been my own error) - the figure of 40 is the total number of terms, including properties, not just concepts.
>
> Actually, I think the ideal, for *really* large scale adoption, is under 20 terms - concepts *and* properties.
>
> (But of course the numbers are a bit arbitrary - my main point is that I think we are quite a way over the level of complexity that is likely to achieve really large-scale deployment.)
>
>> Now in the ontology we have a bit more, but this is because of the involvement pattern, which I think actually doesn't increase complexity, as the pattern is systematic.
>
> But I fear that's what developers will see.
>
> #g
> --
>
>> As it stands, I think the model that the group has put together strikes the right balance. We have a very small, clear set of starting points and then some additional things that are pretty core to provenance.
>>
>> I think there's an argument to be made to put collections/dictionary in a separate document for readability purposes, but I think they should be part of the recommendation. There's been a lot of hard work there, and the agreement seems to be that they are useful.
>>
>> cheers
>> Paul
>>
>> On Mon, Apr 30, 2012 at 10:32 AM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>> Hi Stian,
>>>
>>> Answer interleaved.
>>>
>>> From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
>>> Date: 29 April 2012 20:44:16 GMT+01:00
>>> To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
>>> Cc: Satya Sahoo <satya.sahoo@case.edu>, <public-prov-wg@w3.org>, Luc Moreau <L.Moreau@ecs.soton.ac.uk>, Paolo Missier <Paolo.Missier@ncl.ac.uk>
>>> Subject: Going for simplicity (was: actions related to collections)
>>>
>>> 5, Insightful.
>>>
>>> I agree on the general principle of simplicity. I had similar feelings when wasQuoteOf and friends moved in, but have now grown to like the few essential "real world" relations rather than having only an (easily verbose and not very rich) entity-activity-agent model.
>>>
>>> As you point out, a richer standard will also enable richer integration for fewer clients.
>>>
>>> One way towards having many adopters, some rich, is a simple core model plus additional buy-in modules. The core gets everyone hooked; the modules give richness by providing a standard extension: "hey, you are thinking about collections in your prov, how about checking out this bit over here".
>>>
>>> But we need to make the essential modules. OPM suggested that adopters make profiles and extensions, but I don't know of many such extensions in real life. For instance, DataONE is still working on agreeing how to do workflow provenance using OPM.
>>>
>>> Modules would also work as a kind of damage control. Let's say our view of attribution turned out to be very wrong for digital publishing, but our view of derivation was a perfect fit. Adopters could choose to use PROV derivations and make their own, richer attribution model. With one massive model, we might easily put people off if one of our aspects is wrong/naive/difficult compared to a domain's view.
>>>
>>> I believe our current components in DM can form such a modularization. However, I have not read any recommendation about how these can be used in such a pick-and-choose adaptation; I thought they were merely rhetorical groupings to ease understanding. Luc?
>>>
>>> Yes, I saw components as a conceptual structuring of the data model, and not as a way of optionally selecting which bits of the model we want to use.
>>>
>>> There has been (so far!) no indication from the WG that we wanted to make some part of the model optional. This can be considered, of course.
>>>
>>> But to be effective, components need to be complementary. At the moment derivations and responsibility are still entangled. I don't think that's desirable.
>>>
>>> Is your suggestion that we, for instance, have /ns/prov# (core), /ns/prov-attribution#, etc., or simply drop everything that is not "opmv like"? (My question: why not then use opmv?)
>>>
>>> I don't think we are keen to introduce multiple namespaces.
>>>
>>> Luc
>>>
>>> --
>>> Stian Soiland-Reyes, myGrid team
>>> School of Computer Science
>>> The University of Manchester
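[To make Stian's question concrete, a sketch of the modular layout he describes. The prov-collections namespace and the provc: terms are hypothetical, and Luc's reply above indicates the WG is not keen on this multiple-namespace design.]

    @prefix prov:  <http://www.w3.org/ns/prov#> .              # core: Entity, Activity, Agent, ...
    @prefix provc: <http://www.w3.org/ns/prov-collections#> .  # hypothetical buy-in module
    @prefix ex:    <http://example.org/> .

    # Core statements use only the core namespace...
    ex:dataset a prov:Entity ;
        prov:wasGeneratedBy ex:survey .

    # ...and a module adds collection-specific terms for those who opt in.
    ex:dataset a provc:Collection ;
        provc:hadMember ex:record1 .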
>>> On Apr 26, 2012 6:24 PM, "Graham Klyne" <graham.klyne@zoo.ox.ac.uk> wrote:
>>>>
>>>> On 26/04/2012 13:39, Paolo Missier wrote:
>>>>>
>>>>> Graham
>>>>>
>>>>> you have made your point on this over and over again.
>>>>
>>>> Yes, I've said it before, but I think not (in this context) so much as to count as "over and over again". (Previously, I've objected to using collections to model provenance accounts, which was a different matter.)
>>>>
>>>>> ... I think we get it, but I still don't see a strong argument. That is because the criteria used to define the scope here have been blurry, and that has not improved with time. The comments that followed my own personal opinion on this (attached) seem to indicate that capturing the evolution of sets may be a good idea, given their pervasiveness. If this belongs to a specific domain, which domain is it?
>>>>
>>>> Fair enough. I'll see if I can substantiate my position...
>>>>
>>>> First, to be clear, I'm not saying that "capturing the evolution of sets" is not a good idea. What I question is the extent to which it *should* be *entirely* down to the PROV spec to achieve this.
>>>>
>>>> We're defining a standard, and I think it's in the nature of standards for use on the global Internet/Web that the criteria for defining scope are blurry, because we can't expect to anticipate all of the ways in which they will be used.
>>>>
>>>> For me, the acid test will be the extent of adoption. In my experience, it is the *simple* standards (of all kinds) that get more widely adopted. TCP/IP vs OSI. SMTP vs X.400. HTTP vs any number of content management systems.
>>>>
>>>> I see the same for ontologies/vocabularies. The widely used success stories are ones like DC, FOAF, SIOC, SKOS, etc., which all have the characteristic of focusing on a small set of core concepts. Of course there are more specialized large ontologies/vocabularies that have a strong following (e.g. a number of bioinformatics standards), but within much more confined communities. (TimBL has a slide about the cost of an ontology vs the size of its community, http://www.w3.org/2006/Talks/0314-ox-tbl/#(22) - it emphasizes the benefits of widespread adoption, but doesn't address costs associated with the *size* of the ontology.)
>>>>
>>>> In my view, provenance is something that /should/ be there with the likes of DC and FOAF in terms of adoption. Which, for me, prioritizes keeping it as small as possible to maximize adoption.
>>>>
>>>> To repeat: I'm not saying that provenance of collections is not useful. I'm sure it is very useful in many situations. For me the test is not so much what is useful as what *needs* to be in the base provenance spec because it cannot reasonably be retro-fitted via available extension points. What I have not seen is an explanation of why the provenance of collections cannot be handled through specialization of the core provenance concepts we already have. This might even be a separate *standard*.
>>>>
>>>> For me, all this is an application of the principles of minimum power, independent invention and modularity (http://www.w3.org/DesignIssues/Principles.html).
>>>>
>>>> In many ways (and, to be clear, this is not a proposal, just an illustration) I'd rather like to see something like OPMV go forward as a base spec for provenance, because it's really clear from that what the key ideas are and how they tie together.
>>>>
>>>> Many of the things the group spends time discussing (including, but not limited to, collections) can be layered on this basic model. The tension here is that by specifying more in the base model, one achieves a greater level of interoperability between systems *that fully implement the defined model*, and at the same time decreases the number of systems that attempt to implement the model. This raises the question: is it more beneficial to have relatively few systems implement a very rich model of provenance interoperability, or to have very many systems implement a relatively weak model? And of course, it's not black-or-white ... there are reasonable points in between. I think my view is clearly to "turn the dial" to the simpler end of the spectrum but, of course, YMMV.
>>>>
>>>>> But I am sorry that you are having to hold your nose. Believe me, the provenance of a set doesn't smell that bad.
>>>>
>>>> That was a figure of speech, and was probably an overly strong statement.
>>>>
>>>> As I say above, I'm sure provenance of collections of various kinds is useful and important - what I'm really trying to push on is how much needs to be in the base provenance specs that developers will have to master.
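[A sketch of the retro-fitting Graham appeals to: a separate vocabulary can specialize core PROV terms via subclass/subproperty axioms instead of enlarging the base ontology. The coll: namespace and its terms are invented for illustration.]

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix coll: <http://example.org/collections#> .  # hypothetical separate spec

    # A domain vocabulary refines core PROV without adding to it:
    coll:Collection rdfs:subClassOf prov:Entity .

    coll:derivedByInsertionFrom a owl:ObjectProperty ;
        rdfs:subPropertyOf prov:wasDerivedFrom .

    # A core-only consumer that ignores coll: still sees a plain derivation:
    # "x coll:derivedByInsertionFrom y" entails "x prov:wasDerivedFrom y".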
>>>> I think later in the discussion I saw a mention of abstract collections that could be specialized in different ways. That, for me, could represent a reasonable compromise, though my preference would be to deal with collections separately.
>>>>
>>>> Maybe what I'm doing here is making a case for modularization of the provenance spec (a la PML?), rather than lumping it all into one, er, collection.
>>>>
>>>> ...
>>>>
>>>> Returning to your comment about blurry criteria, here are some that are not blurry (though they are also unsubstantiated, but there are some clues at http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/):
>>>>
>>>> * I think that if we can produce a base provenance ontology of <= 8 classes and <= 12 properties, we stand a chance of deployment at the scale of FOAF (the numbers are approximately the size of FOAF core - http://xmlns.com/foaf/spec/).
>>>>
>>>> * I think a base ontology with twice the number of classes could achieve less than 10% of the adoption of FOAF (e.g. compare interest in vCard vs FOAF or DC at http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/).
>>>>
>>>> * I think a base ontology with substantially more terms will receive substantially less adoption.
>>>>
>>>> The numbers here are, to be sure, very unscientific. But it's interesting that, not counting the "infrastructure" ontologies (rdf, rdfs, owl, ex), all the "high interest" ontologies that I probed were also relatively small (up to 40 terms overall, at a rough guess).
>>>>
>>>> On this basis, my criterion becomes very un-blurry: fewer terms is better by far.
>>>>
>>>> Of course, there's a balance to be struck, but it brings home to me that each term added to the overall provenance ontology has to bring substantial benefit if the adoption (impact) of our work is not to be reduced.
>>>>
>>>> ...
>>>>
>>>> Finally, the reason I think that PROV *could* be as popular as FOAF is because it is positioned to underpin a key missing feature of the web - providing a machine-actionable basis for dealing with conflicting information (trust, information quality assessment). It could be, in a real sense, the FOAF of data ("who are you?", "who do you know?", "where do you come from?", etc.).
>>>>
>>>> As yet, we don't *know* what aspects of provenance will be important in this respect, though there is some research (including your own, Paolo) that suggests some directions. So, in pursuit of this goal, the thing about PROV that matters almost more than anything else is scale of adoption. So, on this view, *anything* that stands in the way of adoption without providing needed functionality that cannot be achieved in any other way is arguably an impediment to the eventual success of PROV.
>>>>
>>>> #g
>>>> --
>>>>
>>>>> On 4/26/12 12:04 PM, Graham Klyne wrote:
>>>>>>
>>>>>> I find myself somewhat concerned by what appears to be scope creep associated with collections. It seems to me that in this area, the provenance model is straying into the domain of application design. If collections were just sets, I could probably hold my nose and say nothing, but this talk of having provenance define various forms of collection indexing seems to me to be out of scope.
>>>>>> So I think this is somewhat in agreement with what Satya says here, though I remain unconvinced that the notions of collections and derivation-by-insertion, etc., actually *need* to be in the main provenance ontology - why not let individual applications define their own provenance extension terms?
>>>>>>
>>>>>> #g
>>>>>> --
>>>>>>
>>>>>> On 18/04/2012 17:35, Satya Sahoo wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> The issue I had raised last week is that collection is an important provenance construct, but the assumption of only key-value pair based collections is too narrow, and the relations derivedByInsertionFrom and Derivation-by-Removal are over-specifications that are not required.
>>>>>>>
>>>>>>> I have collected the following examples of collections, which require only the definition of collection in DM5 (a collection of entities): they (a) don't have a key-value structure, and (b) don't need the derivedByInsertionFrom and derivedByRemovalFrom relations:
>>>>>>>
>>>>>>> 1. A cell line is a collection of cells used in many biomedical experiments. The provenance of the cell line (as a collection) includes: who submitted the cell line, what method was used to authenticate the cell line, and when the given cell line was contaminated. The provenance of the cells in a cell line includes the source of the cells (e.g. the organism).
>>>>>>>
>>>>>>> 2. A patient cohort is a collection of patients satisfying some constraints for a research study. The provenance of the cohort includes: what eligibility criteria were used to identify the cohort, and when the cohort was identified. The provenance of the patients in a cohort may include their health provider, etc.
>>>>>>>
>>>>>>> Hope this helps our discussion.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Satya
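[Satya's first example seems to need nothing beyond a plain collection of entities; a sketch, again with invented ex: identifiers, using prov:Collection and prov:hadMember as they appear in later PROV-O drafts.]

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix ex:   <http://example.org/> .

    # A cell line as a collection: no keys, no insertion/removal relations.
    ex:cellLine a prov:Collection ;
        prov:hadMember ex:cell1, ex:cell2 ;
        prov:wasAttributedTo ex:submittingLab .   # who submitted the line

    # Members carry their own provenance, e.g. the source organism.
    ex:cell1 prov:wasDerivedFrom ex:sourceOrganism .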
>>>>>>> On Thu, Apr 12, 2012 at 5:06 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>>>>>
>>>>>>>> Hi Jun and Satya,
>>>>>>>>
>>>>>>>> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised against you, as we agreed.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Luc
>>>>>>>>
>>>>>>>> [1] https://www.w3.org/2011/prov/track/actions/76
>>>>>>>> [2] https://www.w3.org/2011/prov/track/actions/77
>>>
>>> --
>>> Professor Luc Moreau
>>> Electronics and Computer Science    tel:   +44 23 8059 4487
>>> University of Southampton           fax:   +44 23 8059 2865
>>> Southampton SO17 1BJ                email: l.moreau@ecs.soton.ac.uk
>>> United Kingdom                      http://www.ecs.soton.ac.uk/~lavm

--
Dr. Paul Groth (p.t.groth@vu.nl)
http://www.few.vu.nl/~pgroth/
Assistant Professor
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam

Received on Tuesday, 1 May 2012 07:56:18 UTC