- From: Graham Klyne <GK@ninebynine.org>
- Date: Thu, 25 Aug 2011 18:03:44 +0100
- To: W3C provenance WG <public-prov-wg@w3.org>
With reference to:
http://dvcs.w3.org/hg/prov/raw-file/71fa5079e6b3/model/ProvenanceModel.html
(Note this is the Mercurial version at 25-Aug-2011, about 16:30UTC)
...
This is my second attempt to read this document, and while I have slightly
better comprehension than first time round, I'm still finding it very hard
going. Rather than go through many small points, I'm going to try to focus on
high-level issues.
I am really concerned that the direction of development of this specification
will make it unusable for (what I take to be) its intended purpose, which is to
provide a reference for developers who are generating and consuming provenance
information. For this, what is needed first and foremost is a simple data
model with a straightforward mapping to one or more common web representations.
From the WG charter, I expect RDF to be one of those. If this is not provided,
then I fear the specification will render itself irrelevant.
I find that the example in section 3, and the examples associated with the
individual definitions, are really unhelpful. In particular, the "Abstract Syntax
Notation" is used without any explanation of what it means. (There is an
appendix with a BNF syntax for ASN, but no explanation of how to read or
interpret it.)
What I think would be *far* more useful than section 3 in its current form is a
"50,000-foot" view of the model, outlining the main classes and their possible
relationships. A very nice example of such a view for a simpler provenance
ontology (OPMV) can be seen in
http://open-biomed.sourceforge.net/opmv/ns.html#sec-desc. Coupled with a 1-2
line summary of each class and relation, this would go a long way to making the
whole framework more approachable.
In section 3.3, the diagram key shows what the different line styles represent,
but not the different shapes.
...
5.1 Entity
In sections 4 and 5.1, I think the definition of Entity, and its distinction as
a "characterized thing" is unhelpful and confusing. It is not clear to me what
the distinction of being "characterized" means in any formal sense. There has
been a lot of discussion on the mailing list about this, but I haven't seen a
single argument that shows why it is *needed* to distinguish an "Entity" in any
way from anything that can be identified; i.e. any web resource.
It seems to me that the primitive provenance assertions one might make about an
"Entity" (e.g. dc:creator, doap:release, to use examples from OPMV) can be made
regardless of any claim that the Entity is a "characterized thing". Similarly,
I see no breakage arising from appropriate application of provenance relations
(such as derivedFrom, generatedBy, etc.) to arbitrary entities. Using RDF, the
normal approach would be to infer from the existence of such relations some
information about the type of the things they relate. If they are used
inappropriately, the resulting expression may disagree with reality, or be
unsatisfiable - we can't stop people from making nonsense statements on the web.
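To illustrate the kind of inference described above, here is a minimal sketch in
Python. It assumes a hypothetical table of domain/range declarations for the
provenance relations (the type names and the "ex:" identifiers are illustrative
assumptions, not taken from the draft model), and shows that types can be
inferred from how relations are used, without first declaring anything to be a
"characterized thing".

```python
# Hypothetical domain/range declarations for two provenance relations.
# These pairings are assumptions for illustration, not the draft's definitions.
DOMAIN_RANGE = {
    "derivedFrom": ("Entity", "Entity"),
    "generatedBy": ("Entity", "ProcessExecution"),
}


def infer_types(triples):
    """Return {resource: set of inferred type names} from (s, p, o) triples."""
    inferred = {}
    for subject, predicate, obj in triples:
        if predicate not in DOMAIN_RANGE:
            continue  # this relation carries no type information in the sketch
        domain, range_ = DOMAIN_RANGE[predicate]
        inferred.setdefault(subject, set()).add(domain)
        inferred.setdefault(obj, set()).add(range_)
    return inferred


# Plain statements about arbitrary identified resources (made-up identifiers).
triples = [
    ("ex:report-v2", "derivedFrom", "ex:report-v1"),
    ("ex:report-v2", "generatedBy", "ex:editing-session"),
    ("ex:report-v2", "dc:creator", "ex:alice"),
]

types = infer_types(triples)
for resource in sorted(types):
    print(resource, sorted(types[resource]))
```

The dc:creator statement is accepted as-is and contributes no type, which is the
point: the assertion can be made about any identified resource.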
....
5.8 isComplementOf
I struggle to understand from the text what practical application there is for
the isComplementOf relation. From mailing list discussion, I come to an
understanding that it might be useful for talking about different provenance
accounts, to understand when they are referring to the same underlying thing
("Royal Society", etc.)
But I find the description given is somewhat complicated and confusing, in part
because of the enforced distinction between things and "Entities". I find it
much more natural to think of views (i.e. where I think we started out with
"IVPof") where different accounts may use different views (roughly corresponding
to different observational constraints) of some thing, which can themselves be
seen as things (e.g. in the examples given M1, M2, etc might be things
corresponding to "Royal Society" in the period(s) when its membership was as
indicated in each case, or L1, L2, etc, corresponding to it being located at
specific places). The "established on" view could be considered as the
underlying "Royal Society" itself, as that never changes.
From this notion of "view", the "complementOf" relation is easily derived: if
two things are both views of some underlying thing, then they are complementOf
each other.
This all seems much simpler, more intuitive and less complicated to me than the
contortions used to explain complementOf in terms of "characterized entities"
and attributes.
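The derivation of complementOf from views can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: each view is
taken to have exactly one underlying thing, and the names "RoyalSociety", "X1"
and "OtherThing" are made up for the example (M1, M2 and L1 follow the examples
mentioned above).

```python
from itertools import combinations


def complement_pairs(view_of):
    """Derive complementOf pairs from a {view: underlying thing} mapping.

    Two views are complements when they share the same underlying thing.
    Each pair is a frozenset because the derived relation is symmetric.
    """
    views_by_thing = {}
    for view, thing in view_of.items():
        views_by_thing.setdefault(thing, []).append(view)

    pairs = set()
    for views in views_by_thing.values():
        for a, b in combinations(views, 2):
            pairs.add(frozenset((a, b)))
    return pairs


# Assumed example data: three views of one thing, one view of another.
view_of = {
    "M1": "RoyalSociety",
    "M2": "RoyalSociety",
    "L1": "RoyalSociety",
    "X1": "OtherThing",
}

pairs = complement_pairs(view_of)
for pair in sorted(sorted(p) for p in pairs):
    print(pair[0], "complementOf", pair[1])
```

Nothing here depends on attributes or on a separate class of "characterized"
entities; the only primitive is the view relation itself.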
....
5.16 Provenance container
Leaving aside, for now, the notion of accounts, I find this concept completely
unnecessary. Further, it is defined as having a set of "provenance constructs",
but I see no defined concept for these, so the definition given is incomplete.
I think it would be easier to have a "Provenance expression" (roughly
corresponding to an RDF expression) that is a provenance assertion about some
thing or things, which can be evaluated to be true or false. At this level, I
see no need for having any kind of visibility of what the inside of such an
expression looks like. The main requirement to be a valid provenance expression
is that it is not dependent on any ways in which the referenced thing or things
may vary (i.e. talks only about invariant aspects of the thing(s)).
I accept some form of containment is needed for accounts, as it is important for
some purposes to be able to consider them separately - but I can't see why that
containment isn't just part of what it is to be an account.
My message to Daniel
(http://lists.w3.org/Archives/Public/public-prov-wg/2011Aug/0242.html) expands a
little on how I see this.
....
In summary: I've tried to focus on high-level issues that I think make this
provenance model document difficult to understand, hence difficult to use in
discussion of other areas of the provenance WG discussion, and which I also fear
may have the effect of it being ignored by developers in favour of simpler
specifications like OPMV:
- I think a short, high-level overview is needed to illustrate how the various
ideas work together
- I think that too much formality is invoked too early on in the document, and
that the formalism used is inadequately described. Also, I note that the
formalism is applied to the examples, not to the definitions themselves, which
seems a little odd to me.
- I think the definition of "Entity" is unnecessarily complex, and has been
giving rise to much confusion in working group discussions.
- I think the definition of "Entity" gives rise to an over-complicated
definition of "complementOf". I think the original notion of "IVPof" was more
useful, and got lost along the way.
- I think the notion of provenance containers as a separate concept is not
needed, and that a simpler concept of provenance expression (or assertion),
which is just another "thing", could be all that is needed. Accounts could then
be implicit containers for provenance expressions.
#g
--
Received on Thursday, 25 August 2011 17:05:25 UTC