Re: Provenance specs: have we lost sight of the goal?

I know I'm replying to this a bit late - sorry about that.

I have had similar comments come back through third parties, and I've
been encouraging them to actually send a comment to the WG, but the
response has been a "meh" - the attitude is "I don't understand this,
so I am not going to use it, and so I am not going to tell you that I
don't understand it". These are academics!


I think what brings on the fear of complexity is that once you have
clicked your way into the PROV-O or PROV-DM specification, their
sheer size can be daunting. If you read the primer, everything looks
great and is dead easy to understand! A colleague said he spent 15
minutes reading just the primer, got it right away, and wanted to use
PROV in his product.


What I think we need is some kind of stepping stone from the primer
before you delve into the specs - a kind of "Primer next steps" - to
show that all you really need to understand are Entity/Activity/Agent
and a few simple relationships between them, and that if you need to
detail any of those, that is possible, but certainly not required.
It could then point onwards: how the OWL/RDF serialization is
described in PROV-O, how (and when) you could use the simple or the
qualified version, how the exact semantic meaning of those statements
is defined in PROV-DM, and what constraints and inferences follow
from PROV-Constraints. This could be done as a last chapter in the
primer, showing an example of, say, wasGeneratedBy - something like
the sketches below.
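
To illustrate the "all you really need" level, here's a minimal
Turtle sketch (the identifiers :report1, :writingReport and :alice
are of course just made up for the example):

  @prefix prov: <http://www.w3.org/ns/prov#> .
  @prefix :     <http://example.org/> .

  # An entity, the activity that generated it, and the agent
  # responsible for both
  :report1        a prov:Entity ;
                  prov:wasGeneratedBy   :writingReport ;
                  prov:wasAttributedTo  :alice .

  :writingReport  a prov:Activity ;
                  prov:wasAssociatedWith :alice .

  :alice          a prov:Agent .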



But some people jump straight to the PROV-O document - they know that
their realization would be in RDF, so they go there (just like I
would try to read the OWL2 spec by going straight to the RDF bit -
which is very confusing, because it only talks about the RDF
*mapping*) - but then they get scared by qualified generations and a
long list of terms.
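
For comparison, the qualified form of the same generation - which is
roughly what greets them there - would look something like this
(again the identifiers are made up, and the timestamp is only there
to show why you might bother qualifying at all):

  @prefix prov: <http://www.w3.org/ns/prov#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix :     <http://example.org/> .

  # Same statement as  :report1 prov:wasGeneratedBy :writingReport ,
  # but qualified so that the generation time can be attached
  :report1 prov:qualifiedGeneration [
      a prov:Generation ;
      prov:activity :writingReport ;
      prov:atTime   "2012-10-09T11:04:00Z"^^xsd:dateTime
  ] .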

I think we've already done a lot along the way. For instance, PROV-O
starts with
http://www.w3.org/TR/prov-o/#description-starting-point-terms - a
simple and easy explanation of Entity/Activity/Agent. But then,
instead of detailing those
(http://www.w3.org/TR/prov-o/#cross-reference-starting-point-terms),
we move on to Expanded Terms, and then finally Qualified Terms.
Perhaps it would make more sense if we reversed that grouping?
Something like this TOC:

1. Introduction
2. PROV-O at a glance
3. Starting Point Terms
--3.1 Ontology Description
--3.2 Cross reference
4. Expanded Terms
--4.1 Ontology Description
--4.2 Cross reference
5. Qualified Terms
--5.1 Ontology Description
--5.2 Cross reference
A. PROV-O OWL Profile
B. Names of inverse properties
C. Acknowledgements
D. References
--D.1 Normative references
--D.2 Informative references


What do you think of this idea? (Did we try something like that
earlier, Khalid/Tim/Jun?)






On Tue, Oct 9, 2012 at 11:04 PM, Graham Klyne <GK@ninebynine.org> wrote:
> (Now that I'm on holiday, away from the day-to-day pressures of getting
> stuff done, I find a little time to put down some nagging doubts I've been
> having about how our work is going...)
>
> Over the past few weeks, I have had informal discussions with a small number
> of people about the provenance specifications.  A common theme that has
> emerged is that the provenance specs are over-complicated, and that as a
> result many people (being non-provenance specialists) just will not use it.
> I've suggested to these people that they submit last-call comments to the
> working group, but the general response has been along the lines of "Why
> should I bother?  It doesn't matter to me, I won't use it".
>
> This raises for me the possibility that we are working in an "echo chamber",
> hearing only the views of people who have a particular and deep interest in
> provenance, but not hearing the views of a wider audience who we hope will
> include and consume limited amounts of provenance information in their
> applications.
>
> Maybe it's only me, and the rest of you aren't hearing this kind of comment.
> But if you are, I think that, as we go through the last call process, it is
> appropriate to reflect and consider if what we are producing is really
> relevant to the wider community we aim to serve.  Have we become too bound
> up with fine distinctions that don't matter, or don't apply in the same way,
> to the majority of potential provenance-generating and provenance-using
> applications?   Have we sacrificed approachability and simplicity that
> encourages widespread take-up on the altar of premature optimization to
> support particular usage scenarios?
>
> While I think these are relevant questions, I'm not sure if and what we
> might do about them.  But I also fear that what we produce may turn out to
> be irrelevant in the long run.
>
> #g
> --
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Thursday, 25 October 2012 12:01:49 UTC