Re: prov-xml review for release as a FPWD from Paul Groth on 2012-11-06 (public-prov-wg@w3.org from November 2012)

From: Paul Groth <pgroth@gmail.com>
Date: Mon, 5 Nov 2012 21:52:14 -0500
To: Luc Moreau <l.moreau@ecs.soton.ac.uk>
Cc: "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <CAJCyKRoybQU2U8uQ6S3ND1FCmYwV7V34z4szOjQoEbXC0M0JVg@mail.gmail.com>

Hello,

I have reviewed prov-xml (
http://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html).

The document can be released as a first public working draft. My detailed
comments are below; many of them, I now realize, echo some of James'
comments. The key issues are leveraging XML Schema (XSD) features and
explaining design decisions.

Regards
Paul

---Detailed Comments--

==Abstract==

The abstract could be shorter. Suggested revision:

Provenance is information about entities, activities, and people involved
in producing a piece of data or thing, which can be used to form
assessments about its quality, reliability or trustworthiness. PROV-DM is
the conceptual data model that forms a basis for the W3C provenance (PROV)
family of specifications. It defines concepts for expressing provenance
information enabling interchange. This document introduces an XML schema
for the PROV data model (PROV-DM), allowing instances of the PROV data
model to be serialized in XML.

==Introduction==
I like the focused nature of the document: it does not carry lots of
justification around design choices, etc. However, this should be clearly
stated in the introduction. I would add a sentence something like: "The goal
of this specification is to provide a succinct definition of the XML form of
PROV-DM; thus, we refer the reader to PROV-DM for overall justification and
context for the definitions presented here."

Also, I would link out to each of the concepts in the DM when they are
presented within the document.

==2.1.1 Entity==
In the example you have ex:version, which I think may be confusing because
we have revision in PROV.

==Use of prov:type for terms within the dataset==

For all subtypes defined in PROV, the spec says that one should use the
prov:type construct, e.g.
<prov:wasDerivedFrom><prov:type>prov:Revision</prov:type></prov:wasDerivedFrom>.
I was wondering what the rationale for that choice is. Why doesn't one see
<prov:wasRevisionOf>?

Clearly this is a pattern used throughout the document. I think this
pattern deserves a small paragraph explaining why the approach was taken.
This is especially true as XML Schema supports the definition of subtypes
through xsd:extension.
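
To make the question concrete, here is a minimal sketch (in Python, standard
library only) of what a consumer has to do under each style. Note the
assumptions: <prov:wasRevisionOf> is the hypothetical element I am asking
about, not something the draft defines; the namespace URI is the PROV one;
and comparing the text of prov:type against the literal string
"prov:Revision" is a simplification that ignores prefix rebinding.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"

# Style used in the draft: generic element plus a prov:type child.
TYPED = (
    f'<prov:wasDerivedFrom xmlns:prov="{PROV}">'
    "<prov:type>prov:Revision</prov:type>"
    "</prov:wasDerivedFrom>"
)

# Hypothetical alternative: a dedicated element name for the subtype.
NAMED = f'<prov:wasRevisionOf xmlns:prov="{PROV}"/>'


def is_revision(elem):
    """Return True if elem expresses a revision in either style."""
    if elem.tag == f"{{{PROV}}}wasRevisionOf":
        # Dedicated element: the element name alone carries the subtype.
        return True
    if elem.tag == f"{{{PROV}}}wasDerivedFrom":
        # prov:type style: the consumer must inspect child element content.
        # Simplification: compares the literal string, not the resolved QName.
        return any(
            (t.text or "").strip() == "prov:Revision"
            for t in elem.findall(f"{{{PROV}}}type")
        )
    return False
```

With the dedicated element, a schema validator could check the subtype by
element name; with prov:type, the check moves into application code like the
loop above.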

==Other Patterns==
I think there are a couple of other patterns used within the schema design.
Maybe adding a section on those patterns would help the reader more easily
understand the approach. The patterns I see are:
1) use xml ids and refs to express the provenance graph
2) in type definitions, required provenance elements are presented first,
then optional provenance elements, then application specific elements
3) prov:attributes are interpreted as extra non-provenance elements within
complex types (e.g.  <xs:any namespace="##other"/>). I assume this is why
specialization and alternate do not have extensibility points.
4) can you define the "salami slice XSD design pattern" in the text?
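
As an illustration of pattern 1, here is a small Python sketch of how a
reader might resolve references against identifiers in one document. The
element and attribute names (prov:document, prov:used, prov:id, prov:ref)
and the ex: namespace are my assumptions for the example and may not match
the draft exactly; identifiers are compared as plain strings.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"

# Assumed document shape; names are illustrative, not taken from the schema.
DOC = f"""
<prov:document xmlns:prov="{PROV}" xmlns:ex="http://example.com/ns#">
  <prov:entity prov:id="ex:report"/>
  <prov:activity prov:id="ex:editing"/>
  <prov:used>
    <prov:activity prov:ref="ex:editing"/>
    <prov:entity prov:ref="ex:report"/>
  </prov:used>
</prov:document>
"""


def dangling_refs(root):
    """List prov:ref values that match no prov:id in the same document."""
    id_attr = f"{{{PROV}}}id"
    ref_attr = f"{{{PROV}}}ref"
    # Every prov:id is a node of the graph; every prov:ref is an edge endpoint.
    ids = {e.get(id_attr) for e in root.iter()} - {None}
    refs = [e.get(ref_attr) for e in root.iter() if e.get(ref_attr)]
    return [r for r in refs if r not in ids]
```

A non-empty result would mean the graph points at something the document
does not describe, which is the cross-document case raised under prov:id.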

==prov:id==
I was a bit confused by prov:id. Can you give some examples of what can go
in prov:id? It's defined as a QName, so I can't put a full URL in? Your
example of prov:id (prov:id="tr:WD-prov-dm-20111215") uses the tr: prefix,
for which no namespace is declared. Is this just a mistake? It would be good
to see an example linking out beyond the scope of one document.
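
To show the two problems I mean, here is a rough Python sketch. It uses a
simplified, ASCII-only approximation of the QName syntax (the real NCName
production allows many more characters), and the document, the ex: namespace
and the example.com URL are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Simplified ASCII-only approximation of NCName; the real rule is broader.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
QNAME = re.compile(rf"^(?:({NCNAME}):)?({NCNAME})$")

# Hypothetical document: declares prov: and ex:, but not tr:.
DOC = (
    '<prov:document xmlns:prov="http://www.w3.org/ns/prov#" '
    'xmlns:ex="http://example.com/ns#">'
    '<prov:entity prov:id="ex:e1"/>'
    "</prov:document>"
)


def declared_prefixes(xml_text):
    """Collect prefix -> namespace URI for every xmlns declaration."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    prefixes = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def check_id(value, prefixes):
    """Classify a candidate prov:id value."""
    match = QNAME.match(value)
    if match is None:
        return "not a QName"
    prefix = match.group(1)
    if prefix is not None and prefix not in prefixes:
        return "undeclared prefix"
    return "ok"


PREFIXES = declared_prefixes(DOC)
RESULTS = {
    value: check_id(value, PREFIXES)
    for value in ("ex:e1", "tr:WD-prov-dm-20111215", "http://example.com/doc")
}
```

Under these assumptions, the tr: identifier fails only because its prefix is
not declared, while the full URL is rejected outright because "/" cannot
appear in a QName.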

Received on Tuesday, 6 November 2012 02:52:42 UTC