Re: prov-xml review for release as a FPWD

On Nov 5, 2012, at 7:52 PM, Paul Groth <pgroth@gmail.com> wrote:

> Hello,
> 
> I have reviewed prov-xml ( http://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html )
> 
> The document can be released as a first public working draft. My detailed comments are below; I now realize many of them echo some of James' comments. The key issues are leveraging XSD schema features and explaining design decisions.
> 
> Regards
> Paul
> 
> ---Detailed Comments--
> 
> ==Abstract==
> 
> The abstract could be shorter. Suggested revision:
> 
> Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. It defines concepts for expressing provenance information, enabling interchange. This document introduces an XML schema for the PROV data model (PROV-DM), allowing instances of the PROV data model to be serialized in XML.

I updated the abstract based on your suggestion.

> 
> ==Introduction==
> I like the focused nature of the document, without lots of justification around design choices, etc. However, this should be clearly stated in the introduction. I would add a sentence something like: "The goal of this specification is to provide a succinct definition of the XML form of PROV-DM; we therefore refer the reader to PROV-DM for the overall justification and context of the definitions presented here."

added.

> 
> Also, I would link out to each of the concepts in the DM when they are presented within the document.

I'll go through and do this shortly.

> 
> ==2.1.1 Entity==
> In the example you have ex:version which I think may be confusing because we have revision in PROV. 

Understandable. The PROV-XML group developed its examples based on the examples in the PROV-DM document. This example was based on example 17 (http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-dm.html#term-entity).

> 
> ==Use of prov:type for terms within the dataset==
> 
> For all subtypes defined in PROV, the spec says that one should use the prov:type construct, e.g. <prov:wasDerivedFrom> <prov:type>prov:Revision</prov:type></prov:wasDerivedFrom>. I was wondering what the rationale for that choice is. Why doesn't one see <prov:wasRevisionOf>?

There is no wasRevisionOf because PROV-DM does not define a wasRevisionOf relation.

http://dvcs.w3.org/hg/prov/raw-file/default/model/prov-dm.html#term-revision

Same pattern with Quotation and PrimarySource.
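
To make the pattern concrete, here is a minimal sketch of a revision expressed as a typed derivation, plus how a consumer might read the type back. Only prov:wasDerivedFrom and prov:type come from the discussion above; the prov:document wrapper, the prov:generatedEntity and prov:usedEntity children, the prov:ref attribute, and the ex: identifiers are assumptions for illustration, not quoted from the draft schema.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"

# Illustrative instance only: the wrapper element, the ex: namespace, the
# identifiers, and the generatedEntity/usedEntity children are assumptions.
DOC = """
<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
               xmlns:ex="http://example.org/">
  <prov:wasDerivedFrom>
    <prov:generatedEntity prov:ref="ex:draft2"/>
    <prov:usedEntity prov:ref="ex:draft1"/>
    <prov:type>prov:Revision</prov:type>
  </prov:wasDerivedFrom>
</prov:document>
"""


def derivation_types(xml_text):
    """Return the list of prov:type values for each prov:wasDerivedFrom."""
    root = ET.fromstring(xml_text)
    result = []
    for derivation in root.iter(f"{{{PROV_NS}}}wasDerivedFrom"):
        types = [t.text.strip() for t in derivation.findall(f"{{{PROV_NS}}}type")]
        result.append(types)
    return result


print(derivation_types(DOC))
```

The point of the sketch is that the relation stays a plain derivation and the subtype travels as data, so Quotation and PrimarySource need no new element names.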

> 
> Clearly this is a pattern used throughout the document. I think this pattern deserves a small paragraph explaining why the approach was taken. This is especially true as XML Schema supports the definition of subtypes through xsd:extension.

I like the idea, but it would appear to introduce relations that are not defined in PROV-DM.

We could test it out, see what the group thinks.
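
As a starting point for that test, here is a rough sketch of what the xsd:extension approach might look like. The type names Derivation and Revision and the element wasRevisionOf are hypothetical placeholders, not taken from the draft schema, and the Python wrapper only checks that the fragment is well-formed and reports which types extend which; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: Derivation, Revision, and wasRevisionOf are
# placeholder names for discussion, not definitions from the draft schema.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://www.w3.org/ns/prov#"
           elementFormDefault="qualified">
  <xs:complexType name="Derivation">
    <xs:sequence>
      <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Revision">
    <xs:complexContent>
      <xs:extension base="prov:Derivation"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="wasRevisionOf" type="prov:Revision"/>
</xs:schema>
"""


def extension_bases(schema_text):
    """Map each top-level complex type name to the base type it extends."""
    root = ET.fromstring(schema_text)
    bases = {}
    for ctype in root.findall(f"{{{XS_NS}}}complexType"):
        ext = ctype.find(f"{{{XS_NS}}}complexContent/{{{XS_NS}}}extension")
        if ext is not None:
            bases[ctype.get("name")] = ext.get("base")
    return bases


print(extension_bases(SCHEMA))
```

Note that the last declaration in the fragment is exactly the concern: exposing the subtype as its own element introduces a name, wasRevisionOf, that PROV-DM does not define.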

> 
> ==Other Patterns==
> I think there are a couple of other patterns used within the schema design. Maybe adding a section on those patterns would help the reader more easily understand the approach. The patterns I see are:
> 1) use XML ids and refs to express the provenance graph
> 2) in type definitions, required provenance elements are presented first, then optional provenance elements, then application-specific elements
> 3) prov:attributes are interpreted as extra non-provenance elements within complex types (e.g.  <xs:any namespace="##other"/>). I assume this is why specialization and alternate do not have extensibility points.
> 4) can you define the "salami slice XSD design pattern" in the text?
> 
> ==prov:id==
> I was a bit confused by prov:id. Can you give some examples of what can go in prov:id? It's defined as a QName, so I can't put a full URL in? Your example of prov:id (prov:id="tr:WD-prov-dm-20111215") uses the tr: prefix, which is not defined as a namespace. Is this just a mistake? It would be good to see an example linking out beyond the scope of one document.

I believe QName was chosen because it is generally used to reference particular elements within an XML document. Given the requirement to link beyond the scope of one document, perhaps we should consider XLink for references and URIs for prov:id instead?

The lack of a tr namespace declaration is a mistake, which I will correct.

As for linking out beyond the scope of one document, I will put that question forward to the PROV-XML group.
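
In the meantime, here is a small sketch of why the missing declaration matters and how a QName-valued prov:id can still point at a full URI. It assumes the PROV-DM convention of concatenating the namespace URI and the local name; the helper expand_qname is hypothetical, and the binding of tr: to http://www.w3.org/TR/2011/ is my assumption based on the prefix used in the PROV-DM examples.

```python
def expand_qname(qname, namespaces):
    """Expand prefix:local into a full URI using the declared namespaces."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {qname!r}")
    if prefix not in namespaces:
        # This is the situation in the current example: tr: is never declared.
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return namespaces[prefix] + local


# Assumed binding for tr:, following the prefix used in the PROV-DM examples.
NAMESPACES = {
    "prov": "http://www.w3.org/ns/prov#",
    "tr": "http://www.w3.org/TR/2011/",
}

print(expand_qname("tr:WD-prov-dm-20111215", NAMESPACES))
```

With the prefix declared, the identifier resolves to a URI outside the document; without it, the QName cannot be expanded at all.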

--Stephan


Received on Thursday, 8 November 2012 23:39:06 UTC