Going for simplicity (was: actions related to collections) from Stian Soiland-Reyes on 2012-04-29 (public-prov-wg@w3.org from April 2012)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Sun, 29 Apr 2012 20:44:16 +0100
To: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Cc: Satya Sahoo <satya.sahoo@case.edu>, public-prov-wg@w3.org, Luc Moreau <L.Moreau@ecs.soton.ac.uk>, Paolo Missier <Paolo.Missier@ncl.ac.uk>
Message-ID: <CAPRnXtmquj9mBZ3cME41uRQD749OKNuGuXmSJniM1md+UQhtHQ@mail.gmail.com>
5, Insightful.

I agree on the general principle of simplicity. I had similar feelings when
wasQuoteOf and friends moved in, but have now grown to like the few
essential "real world" relations rather than having only an (easily verbose
and not very rich) entity-activity-agent model.

As you point out, a richer standard will also enable richer integration for
fewer clients.

One way towards having many adopters, some rich, is a simple core model,
and additional buy-in modules. The core gets everyone hooked, the modules
give richness by giving a standard extension: "hey, you are thinking about
collections in your prov, how about checking out this bit over here".

But we need to make the essential modules. OPM suggested that adopters make
profiles and extensions, but I don't know of many such extensions in real
life. For instance DataOne is still working on agreeing how to do workflow
provenance using OPM.

Modules would also work as a kind of damage control. Let's say our view of
attribution turned out to be very wrong for digital publishing, while
our view of derivation was a perfect fit. Adopters could choose to use PROV
derivations and make their own, richer attribution model. With one massive
model, we might easily put people off if one of our aspects is
wrong/naive/difficult compared to a domain's view.

I believe our current components in DM can form such a modularization.
However, I have not read any recommendation about how these can be used in
such a pick-and-choose adoption; I thought they were merely rhetorical
groupings to ease understanding. Luc?

Is your suggestion that we for instance have /ns/prov# (core),
/ns/prov-attribution# etc, or simply drop everything that is not "opmv
like"? (My question: why not then use opmv?)
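
To make the first option concrete, here is a rough sketch (Python, purely
illustrative) of a core namespace plus one opt-in module. Only the paths
/ns/prov# and /ns/prov-attribution# come from the question above; the base
URI, the helper names and the assignment of terms to modules are my own
assumptions and are not taken from any PROV draft.

```python
# Hypothetical layout: one core namespace plus opt-in module namespaces.
# BASE and the term-to-module mapping below are assumptions for illustration.
BASE = "http://www.w3.org"

MODULES = {
    "core": BASE + "/ns/prov#",
    "attribution": BASE + "/ns/prov-attribution#",
}

# Illustrative placement of a few terms; not an agreed split.
TERMS = {
    "Entity": "core",
    "Activity": "core",
    "Agent": "core",
    "wasDerivedFrom": "core",
    "wasAttributedTo": "attribution",
}


def term_uri(name):
    """Return the full URI of a term, using the namespace of its module."""
    return MODULES[TERMS[name]] + name


def terms_for(selected_modules):
    """List the terms an adopter has to learn for the chosen modules."""
    return sorted(n for n, m in TERMS.items() if m in selected_modules)


print(terms_for({"core"}))
print(term_uri("wasAttributedTo"))
```

An adopter who only wants derivations would then pick {"core"} and could
define their own attribution terms elsewhere.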

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
On Apr 26, 2012 6:24 PM, "Graham Klyne" <graham.klyne@zoo.ox.ac.uk> wrote:

> On 26/04/2012 13:39, Paolo Missier wrote:
>
>> Graham
>>
>> you have made your point on this over and over again.
>>
>
> Yes, I've said it before, but I think not (in this context) so much to
> count as "over and over again".  (Previously, I've objected to using
> collections to model provenance accounts, which was a different matter.)
>
>  ... I think we get it, but I
>> still don't see a strong argument. That is because the criteria used to
>> define
>> the scope here have been blurry and that has not improved with time.
>> The comments that followed my own personal opinion on this (attached)
>> seem to
>> indicate that capturing the evolution of sets may be a good idea, given
>> their
>> pervasiveness. If this belongs to a specific domain, which domain is it?
>>
>
> Fair enough.  I'll see if I can substantiate my position...
>
> First, to be clear, I'm not saying that "capturing the evolution of sets"
> is not a good idea.  What I question is the extent to which it *should* be
> *entirely* down to the PROV spec to achieve this.
>
> We're defining a standard, and I think it's in the nature of standards for
> use on the global Internet/Web that the criteria for defining scope are
> blurry, because we can't expect to anticipate all of the ways in which they
> will be used.
>
> For me, the acid test will be the extent of adoption.  In my experience,
> it is the *simple* standards (of all kinds) that get more widely adopted.
>  TCP/IP vs OSI.  SMTP vs X.400.  HTTP vs any number of content management
> systems.
>
> I see the same for ontologies/vocabularies.  The widely used success
> stories are ones like DC, FOAF, SIOC, SKOS, etc., which all have the
> characteristic of focusing on a small set of core concepts.  Of course
> there are more specialized large ontologies/vocabularies that have strong
> following (e.g. a number of bioinformatics standards), but within much more
> confined communities.  (TimBL has a slide about costs of ontology vs size
> of community, http://www.w3.org/2006/Talks/0314-ox-tbl/#(22) - it
> emphasizes the benefits of widespread adoption, but doesn't address
> costs associated with the *size* of the ontology.)
>
> In my view, provenance is something that /should/ be there with the likes
> of DC and FOAF in terms of adoption.  Which for me prioritizes keeping it
> as small as possible to maximize adoption.
>
> To repeat: I'm not saying that provenance of collections is not useful.
>  I'm sure it is very useful in many situations.  For me the test is not so
> much what is useful as what *needs* to be in the base provenance spec
> because it cannot reasonably be retro-fitted via available extension
> points.  What I have not seen is an explanation that the provenance of
> collections cannot be handled through specialization of the core provenance
> concepts we already have.  This might even be a separate *standard*.
>
> For me, all this is an application of the principles of minimum power,
> independent invention and modularity
> (http://www.w3.org/DesignIssues/Principles.html).
>
> In many ways (and, to be clear, this is not a proposal, just an
> illustration) I'd rather like to see something like OPMV go forward as a
> base spec for provenance, because it's really clear from that what are the
> key ideas, and how they tie together.
>
> Many of the things the group spends time discussing (including, but not
> limited to, collections) can be layered on this basic model.  The tension
> here is that by specifying more in the base model, one achieves a greater
> level of interoperability between systems *that fully implement the defined
> model*, and at the same time decreases the number of systems that attempt to
> implement the model.  This raises the question: is it more beneficial to
> have relatively few systems implement a very rich model of provenance
> interoperability, or to have very many systems implement a relatively weak
> model?  And of course, it's not black-or-white ... there are reasonable
> points between.   I think my view is clearly to "turn the dial" to the
> simpler end of the spectrum but, of course, YMMV.
>
>  But I am sorry that you are having to hold your nose. Believe me, the
>> provenance
>> of a set doesn't smell that bad.
>>
>
> That was a figure of speech, and was probably an overly strong statement.
>
> As I say above, I'm sure provenance of collections of various kinds is
> useful and important - what I'm really trying to push on is how much needs
> to be in the base provenance specs that developers will have to master.
>
> I think later in the discussion I saw a mention of abstract collections
> that could be specialized in different ways.  That, for me, could represent
> a reasonable compromise, though my preference would be to deal with
> collections separately.
>
> Maybe what I'm doing here is making a case for modularization of the
> provenance spec (ala PML?), rather than lumping it all into one, er,
> collection.
>
> ...
>
> Returning to your comment about blurry criteria, here are some that are
> not blurry (though they are also unsubstantiated, but there are some clues
> at
> http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/
> ):
>
> * I think that if we can produce a base provenance ontology of <=8
> classes and <=12 properties, we stand a chance of deployment at the scale of
> FOAF (the numbers are approximately the size of FOAF core -
> http://xmlns.com/foaf/spec/)
>
> * I think a base ontology with twice the number of classes could achieve
> less than 10% of the adoption of FOAF (e.g. compare interest in vCard vs
> FOAF or DC at
> http://richard.cyganiak.de/blog/2011/02/top-100-most-popular-rdf-namespace-prefixes/)
>
> * I think a base ontology with substantially more terms will receive
> substantially less adoption.
>
> The numbers here are, to be sure, very unscientific.  But it's interesting
> that, not counting the "infrastructure" ontologies (rdf, rdfs, owl, ex),
> all the "high interest" ontologies that I probed were also relatively small
> (up to 40 terms overall at a rough guess).
>
> On this basis, my criterion becomes very un-blurry: fewer terms is better
> by far.
>
> Of course, there's a balance to be struck, but it brings home to me that
> each term that is added to the overall provenance ontology has to bring
> substantial benefit if the adoption (impact) of our work is not to be
> reduced.
>
> ...
>
> Finally, the reason I think that PROV *could* be as popular as FOAF is
> because it is positioned to underpin a key missing feature of the web -
> providing a machine actionable basis for dealing with conflicting
> information (trust, information quality assessment).  It could be, in a
> real sense, the FOAF of data ("who are you?", "who do you know?", "where do
> you come from?", etc.).
>
> As yet, we don't *know* what aspects of provenance will be important in
> this respect, though there is some research (including your own, Paolo)
> that suggests some directions.  So, in pursuit of this goal, the thing
> about PROV that matters almost more than anything else is scale of
> adoption.  So, on this view, *anything* that stands in the way of adoption
> without providing needed functionality that cannot be achieved in any other
> way is arguably an impediment to the eventual success of PROV.
>
> #g
> --
>
>  On 4/26/12 12:04 PM, Graham Klyne wrote:
>>
>>> I find myself somewhat concerned by what appears to be scope creep
>>> associated
>>> with collections. It seems to me that in this area, the provenance model
>>> is
>>> straying into the domain of application design. If collections were
>>> just
>>> sets, I could probably hold my nose and say nothing, but this talk of
>>> having
>>> provenance define various forms of collection indexing seems to me to be
>>> out of
>>> scope.
>>>
>>> So I think this is somewhat in agreement with what Satya says here,
>>> though I
>>> remain unconvinced that the notions of collections and
>>> derivation-by-insertion,
>>> etc., actually *need* to be in the main provenance ontology - why not let
>>> individual applications define their own provenance extension terms?
>>>
>>> #g
>>> --
>>>
>>> On 18/04/2012 17:35, Satya Sahoo wrote:
>>>
>>>> Hi all,
>>>> The issue I had raised last week is that collection is an important
>>>> provenance construct, but the assumption of only key-value pair based
>>>> collection is too narrow and the relations derivedByInsertionFrom,
>>>> Derivation-by-Removal are over specifications that are not required.
>>>>
>>>> I have collected the following examples for collection, which only
>>>> require
>>>> the definition of the collection in DM5 (collection of entities) and
>>>> they
>>>> don't have (a) a key-value structure, and (b) derivedByInsertionFrom,
>>>> derivedByRemovalFrom relations are not needed:
>>>> 1. Cell line is a collection of cells used in many biomedical
>>>> experiments.
>>>> The provenance of the cell line (as a collection) includes: who submitted
>>>> the cell line, what method was used to authenticate the cell line, when
>>>> was
>>>> the given cell line contaminated? The provenance of the cells in a cell
>>>> line includes: what is the source of the cells (e.g. organism)?
>>>>
>>>> 2. A patient cohort is a collection of patients satisfying some
>>>> constraints
>>>> for a research study. The provenance of the cohort includes: what
>>>> eligibility criteria were used to identify the cohort, when was the
>>>> cohort
>>>> identified? The provenance of the patients in a cohort may include their
>>>> health provider etc.
>>>>
>>>> Hope this helps our discussion.
>>>>
>>>> Thanks.
>>>>
>>>> Best,
>>>> Satya
>>>>
>>>>
>>>> On Thu, Apr 12, 2012 at 5:06 PM, Luc Moreau <L.Moreau@ecs.soton.ac.uk>
>>>> wrote:
>>>>
>>>>  Hi Jun and Satya,
>>>>>
>>>>> Following today's call, ACTION-76 [1] and ACTION-77 [2] were raised
>>>>> against you, as we agreed.
>>>>>
>>>>> Cheers,
>>>>> Luc
>>>>>
>>>>> [1]
>>>>> https://www.w3.org/2011/prov/track/actions/76
>>>>>
>>>>> [2]
>>>>> https://www.w3.org/2011/prov/track/actions/77
>>>>>
>>>>>
>>>>>
>>>>>
>>
>>
>
Received on Sunday, 29 April 2012 19:44:47 UTC