Re: Provenance specs: have we lost sight of the goal?

Everything should be made as simple as possible, but no simpler.  As
we in the community know, provenance can be quite complex, and I think
it is important to be able to represent that complexity when needed.
It is also important that the simple representation be reconcilable
and consistent with the complex one.  Given that challenge, I
think the group has produced a good spec.

Simple things should be simple, and complicated things should be
possible.  I think that is true of this spec.  (I think the rework --
organizing DM somewhat differently, pulling out PROV-N and
constraints, etc. -- has contributed to making it simpler and more
easily accessible.)

For the National Climate Assessment, we are trying to capture a quite
complex provenance graph and represent it in a very simple manner.
We're a hub of sorts coordinating between different data centers, each
owning portions of our information base.  PROV gives us some very
simple tools to represent and convey the basic ("scruffy") provenance
that most people are looking for (attribution, derivation) in the
short term.  We can later extend that to more complete, formal
provenance, and in the even longer term PROV will support
domain-specific extensions to get even more specific for our needs.

For users who think adding a "By John Smith" byline to an article is
all the provenance they need, PROV may seem like overkill, but when a
"provenance harvester" or other aggregator or browser tool or whatever
ignores them, they'll catch on to the need for a standard.  The key to
PROV is the same as the driver for Linked Data (and the WWW as a
whole, really) -- the interoperability which makes it possible to hook
my stuff up with yours.  That driver is going to eventually sell PROV.

I think that as real-world examples of very simple PROV emerge, people
will see how easy it can be to make some very simple assertions.  Users
with those simple needs won't read PROV-DM or even the Primer --
they'll just copy a few examples from other sites.  We could try to
make our descriptions simpler, but frankly that's the way it has
always been and will probably always be.  The vast majority of users
don't actually read the spec.  They'll read blog posts, articles,
tutorials, etc. and start with the basics.  Even more users won't be
aware of PROV at all -- the tools they use will seamlessly inject it
into their documents as they are produced and other tools will have
buttons like "Oh yeah?"  to display provenance on demand.
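
To give a rough idea of how small such a "simple assertion" can be,
here is a minimal sketch that emits a tiny PROV-N document carrying
one attribution and one derivation.  It is an illustration only, not
a reference implementation: the ex: prefix, the example.org namespace,
the identifiers (ex:article, ex:johnSmith, ex:dataset) and both helper
functions are hypothetical, and the text is assembled with plain
string formatting rather than any PROV library.

```python
# Minimal sketch: build a tiny PROV-N document as plain text.
# The ex: namespace and every identifier below are hypothetical.

def byline(article, author, source=None):
    """Return PROV-N statements for a byline, plus an optional derivation."""
    statements = [
        f"entity({article})",
        f"agent({author})",
        f"wasAttributedTo({article}, {author})",
    ]
    if source is not None:
        statements += [
            f"entity({source})",
            f"wasDerivedFrom({article}, {source})",
        ]
    return statements


def prov_n_document(prefix, namespace, statements):
    """Wrap statements in a PROV-N document with one prefix declaration."""
    lines = ["document", f"  prefix {prefix} <{namespace}>"]
    lines += [f"  {s}" for s in statements]
    lines.append("endDocument")
    return "\n".join(lines)


doc = prov_n_document(
    "ex",
    "http://example.org/",
    byline("ex:article", "ex:johnSmith", source="ex:dataset"),
)
print(doc)
```

The "By John Smith" case is just the wasAttributedTo line; the
derivation is optional and only appears when a source is given.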

Curt

On 10/09/2012 06:04 PM, Graham Klyne wrote:
> Over the past few weeks, I have had informal discussions with a
> small number of people about the provenance specifications.  A
> common theme that has emerged is that the provenance specs are
> over-complicated, and that as a result many people (being
> non-provenance specialists) just will not use it.  I've suggested to
> these people that they submit last-call comments to the working
> group, but the general response has been along the lines of "Why
> should I bother?  It doesn't matter to me, I won't use it".
>
> This raises for me the possibility that we are working in an "echo
> chamber", hearing only the views of people who have a particular and
> deep interest in provenance, but not hearing the views of a wider
> audience who we hope will include and consume limited amounts of
> provenance information in their applications.
>
> Maybe it's only me, and the rest of you aren't hearing this kind of
> comment.  But if you are I think that, as we go through the last
> call process, it is appropriate to reflect and consider if what we
> are producing is really relevant to the wider community we aim to
> serve.  Have we become too bound up with fine distinctions that
> don't matter, or don't apply in the same way, to the majority of
> potential provenance-generating and provenance-using applications?
> Have we sacrificed approachability and simplicity that encourages
> widespread take-up on the altar of premature optimization to support
> particular usage scenarios?
>
> While I think these are relevant questions, I'm not sure if and what
> we might do about them.  But I also fear that what we produce may
> turn out to be irrelevant in the long run.

-- 
Curt Tilmes, Ph.D.
U.S. Global Change Research Program
1717 Pennsylvania Avenue NW, Suite 250
Washington, D.C. 20006, USA

+1 202-419-3479 (office)
+1 443-987-6228 (cell)
globalchange.gov

Received on Wednesday, 10 October 2012 17:24:30 UTC