Re: prov-xml review for release as a FPWD

On Nov 2, 2012, at 9:22 AM, James Cheney <jcheney@inf.ed.ac.uk> wrote:

> Hi,
> 
> I've had a quick look over the draft.  I haven't had time to check all of the details against prov-dm, so apologies if some comments are off-target; I thought it might be helpful to give a rapid response since I'm traveling from Tuesday next week.
> 
>> - can the document be released as a fpwd
> 
> I think so
> 
>> - if not, what are the blocking issues?
> 
> N/A
> 
>> - are there other issues to address?
> 
> 
> Here are some comments / questions that I think are worth discussing but probably not serious enough to block release.
> 
> - "PROV-DM-CONSTRAINTS" is now called "PROV-CONSTRAINTS".  Some of the boilerplate under "Status of this document" and in sec. 1 seems out of date.

Noted.  I will make this change and update section 1.

> 
> - I see that ids are attributes, but most other parameters (and all attributes) to the various prov-dm relations are represented as elements.  I would find it more natural to use attributes for the ids and all of the positional parameters, since they are always flat data (to my knowledge).  For example:
> 
> <prov:wasGeneratedBy prov:entity="e1" prov:activity="a1" prov:time="...">
>  <ex:port>p1</ex:port>
> </prov:wasGeneratedBy>
> 
> I realize this is a somewhat arbitrary decision, but is there a reason for using elements for these I'm not seeing?

This is a good question. I will try to address it, though I do not think the PROV-XML group has defined criteria for attribute vs. element that we can refer to.

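The PROV-DM attributes prov:label, prov:type, prov:role, and prov:location may occur more than once, so they must be XML elements: a given XML attribute name can appear at most once on an element, whereas a child element can be repeated.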

The PROV-DM attributes prov:value, prov:location, prov:role, and prov:type have as their defined range a PROV-DM Value ("A value is a constant such as a string, number, time, qualified name, IRI, and encoded binary data").  Because the value may be encoded binary data, we decided that any attribute with a range of PROV-DM Value must be an XML element rather than an XML attribute.

That explains the attributes but not the other relation components, such as prov:entity, prov:activity, prov:time, etc.

For these the desired design is less obvious, and we made our decision for two reasons:

1) The OPM XML schema we based our initial PROV-XML schema on followed a similar pattern.
2) If these components are considered essential information of the relation, we did not want to put that information into attributes.

I am not an experienced XML Schema designer, but many of the attribute-vs-element conventions I have read suggest not putting core information about a record into attributes.

Below are two relevant recommendations from http://www.ibm.com/developerworks/xml/library/x-eleatt/index.html

"If you consider the information in question to be part of the essential material that is being expressed or communicated in the XML, put it in an element."

"If you consider the information to be peripheral or incidental to the main communication, or purely intended to help applications process the main communication, use attributes."

I think prov:id and prov:ref meet this test for attributes, but prov:entity, prov:activity, prov:time, etc. are essential information communicated in the relation and therefore fit better as elements if we follow this suggestion.
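For comparison with James's attribute-based example, here is a rough sketch of how the same generation might look in the element-based style described above. The use of prov:ref to point at the entity and activity follows the convention mentioned above; the ex: namespace URI, the ex:gen1 id, the timestamp, and the two prov:type values are made up for illustration, and the element order should be checked against the actual schema rather than taken from here.

```xml
<prov:wasGeneratedBy prov:id="ex:gen1"
    xmlns:prov="http://www.w3.org/ns/prov#"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/ns/ex#">
  <!-- core components of the relation, carried as elements with prov:ref -->
  <prov:entity prov:ref="ex:e1"/>
  <prov:activity prov:ref="ex:a1"/>
  <!-- illustrative timestamp; PROV-DM time instants are xsd:dateTime -->
  <prov:time>2012-11-02T09:22:00Z</prov:time>
  <!-- prov:type may occur more than once, so it cannot be an XML attribute -->
  <prov:type xsi:type="xsd:QName">ex:Copy</prov:type>
  <prov:type xsi:type="xsd:QName">ex:Backup</prov:type>
  <!-- application-specific key-value pair (hypothetical) -->
  <ex:port>p1</ex:port>
</prov:wasGeneratedBy>
```

Writing the two prov:type values as attributes instead (prov:type="ex:Copy" prov:type="ex:Backup") would not be well-formed XML, which is the constraint behind the multi-valued attribute rule above.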

> 
> - I think it might be helpful to give a paragraph or two sketching the design, and perhaps justifying certain design choices (e.g. use of elements vs. attributes, and the need for PROV-DM "attributes"/key-value pairs to be represented as elements here, since there can be multiple copies of the same key with different values).

Would the above explanation, with some clean-up and elaboration, satisfy this need?

> 
> - It isn't clear to me whether there are uniqueness or existence restrictions on the ids.  There don't seem to be any such restrictions, which seems fine.  But in the example of "mentionOf", it is strange that ex:run2 is declared as a bundle, but mention requires the bundle parameter to be an entity reference.

We currently do not have a complexType for a bundle; a bundle is an entity with the prov:type "prov:Bundle".  If this seems odd, we could add a prov:Bundle complexType that extends prov:Entity.
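If the group does add a dedicated type, a minimal XML Schema sketch of the proposal might look like the following. It assumes the schema already defines a prov:Entity complexType in the PROV namespace; the element name prov:bundle is hypothetical and not part of the current draft.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prov="http://www.w3.org/ns/prov#"
           targetNamespace="http://www.w3.org/ns/prov#"
           elementFormDefault="qualified">
  <!-- assumes prov:Entity is defined elsewhere in this schema -->
  <xs:complexType name="Bundle">
    <xs:complexContent>
      <!-- adds nothing yet; a bundle is simply an entity by derivation -->
      <xs:extension base="prov:Entity"/>
    </xs:complexContent>
  </xs:complexType>
  <!-- hypothetical element name for declaring bundles -->
  <xs:element name="bundle" type="prov:Bundle"/>
</xs:schema>
```

Under this sketch ex:run2 could be declared as <prov:bundle prov:id="ex:run2"/> instead of an entity carrying prov:type "prov:Bundle", while its type would still derive from prov:Entity.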

>  If there are no further constraints (i.e. the different types of references are structurally the same but have different names for documentation purposes) this is probably fine.  
> 
> - I haven't checked each relation to ensure it's aligned with PROV-DM, but expect this has been done already.  Are the element/attribute parameter names aligned with those used in prov-o?  They mostly seem to be, but this is also worth checking.

It is intended to be aligned with PROV-DM.  There is more flexibility in PROV-XML in hadMember than there is in PROV-DM, since multiple entities may be referenced in a single hadMember relation.

We should definitely check the alignment to make sure they are still in sync.  I am not sure about the alignment with PROV-O at the moment, but if both are mapped to the PROV-DM then an alignment between PROV-XML and PROV-O seems very feasible.

> 
> - In the "usage" example, a ">" is garbled maybe as &ltgt;

fixed.

> 
> - In 2.1.7, missing comma after "ender"

That appears to be a typo in the PROV-DM Term glossary and as such is also present in the PROV-DM HTML.  The PROV-XML Note pulls the glossary definitions from a shared source glossary so if we update that it will fix the typo in both.

> 
> - In 2.3, "second component" should be "third component"

fixed.

> 
> - In 2.3.4,  "</prov:wasAssociatedWithv" - v should be ">"

fixed.

> 
> - While this schema provides a natural "flat" way to represent PROV-N data, I am curious if there are natural ways to leverage XML's nested structure (much as XML schema allows writing types in a "nested" way which can also be flattened out using type names).  Other than possibly allowing entity statements to be embedded inside collections, it's not clear to me how this would work, so no action seems necessary, it's just an idle question.
> 
> - In most/all types carrying an id, the id attribute is declared last.  Is there a reason it is not first (since it's an attribute, its position in the type doesn't matter, but it would be nice for all of the attributes to be in the same order as in PROV-DM).

No reason that I am aware of.

> 
> - The various time elements are required to be xsd:dateTime type, which I'm not sure is required in PROV-DM (but haven't checked).

PROV-DM Section 5.7.3. Value states:

"We note that PROV-DM time instants are defined according to xsd:dateTime [XMLSCHEMA11-2]."

> 
> - many of the types have the same allowed attributes; these could probably be factored out as named complex types, to avoid duplication.

Could you provide an example?

> 
> - The group "documentElements" might be named "instance", which is the term used for a set of statements in prov-constraints (but maybe documentElements is clearer).

I will ask the PROV-XML group for feedback/thoughts on this suggestion.

Thanks,
--Stephan

> 
> 
> --James
> 
> 
> On Nov 1, 2012, at 4:39 PM, Luc Moreau wrote:
> 
>> 
>> 
>> Dear all,
>> 
>> We need some volunteers to review the prov-xml document [1].
>> 
>> James and I have volunteered to review the document. It would be nice
>> to have other reviewers.  Please respond to this email.
>> 
>> The questions to reviewers are:
>> - can the document be released as a fpwd
>> - if not, what are the blocking issues?
>> - are there other issues to address?
>> 
>> The deadline is Thursday Nov 8th, ahead of the face to face meeting,
>> so that the group can decide whether the document can be released or not.
>> 
>> Regards,
>> Luc
>> 
>> 
>> [1] http://dvcs.w3.org/hg/prov/raw-file/default/xml/prov-xml.html
>> 
>> -- 
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>> 
>> 
>> 
> 
> 

Received on Thursday, 8 November 2012 23:14:02 UTC