PROV-ISSUE-663 (Antoine-FB-DC): Antoine Isaac's Feedback on PROV-DC [Mapping PROV-O to Dublin Core]

http://www.w3.org/2011/prov/track/issues/663

Raised by: Daniel Garijo
On product: Mapping PROV-O to Dublin Core

In fact I did not look very thoroughly into the mappings of section 3.1. Time allowing, I may send another email later. The mappings look appropriate at first sight, though. Most of my comments (listed below) are editorial, though some may touch on conceptual issues in the arguments presented in the text.

The only real problems now could be with roles, e.g., prov:Creator, prov:Publisher. In section 3.2 they are introduced as classes but in section 3.3 they are used as instances of classes. And in section 3.4 it is mixed: the Turtle example has an instance but Fig. 3 has prov:Creator as a class with an instance which is not mentioned in the Turtle example. What is your choice? How does PROV handle roles?
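
To make the two readings of roles concrete, here is a minimal sketch in Python, with triples written as plain tuples. The blank-node names and the shape of the association are my own assumptions for illustration, not copied from the document; only prov:hadRole, rdf:type and the prov:Creator term discussed above come from PROV-O or from the text.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
# All _: names are hypothetical and only serve the illustration.

# Reading A: prov:Creator is a class, and the role is an instance of it.
role_as_class = [
    ("_:assoc", "prov:hadRole", "_:role"),
    ("_:role", "rdf:type", "prov:Creator"),
]

# Reading B: prov:Creator is itself the role value, i.e. it is used as an instance.
role_as_instance = [
    ("_:assoc", "prov:hadRole", "prov:Creator"),
]


def role_usage(triples, term="prov:Creator"):
    """Report how `term` is used in `triples`: as a class, as an instance, or both."""
    usages = set()
    for s, p, o in triples:
        if p == "rdf:type" and o == term:
            usages.add("class")
        if p == "prov:hadRole" and o == term:
            usages.add("instance")
    return usages
```

A graph that mixes both readings makes role_usage return both "class" and "instance", which is the kind of inconsistency I am asking about.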


A last general note/disclaimer: I have to say that I will not apply the mappings soon myself, especially not the complex ones (with a 1:15 multiplier ratio between the input triples and the output triples in section 3.3, the clean-up in section 3.4 is a much welcome suggestion!). To some extent I am reading the document now not because I plan to push implementation of all its recipes in Europeana or elsewhere, but because it is a good introduction to PROV for a more traditional metadata community. And this is no small achievement. Well done!

Best

Antoine

=====


- Abstract: please spell out the URI of the "here" hyperlinks. Or create a specific paragraph in the intro that does it, and point to this paragraph from the abstract.

- Status of document: remove "(to be published as X)" or "(Proposed recommendation)" from the listing of PROV documents. In fact I'd suggest just to make a reference to PROV-OVERVIEW and do a much welcome shortening of the section.

- ToC - Structure of the document: the document misses an "Appendices" section to wrap A and B together apart from the other (numbered) sections.

- ToC - Structure of the document: I see no reason why there is a B1. The notion of "informative references" is maybe not very useful in a note such as yours!

- Use of "Dublin Core", "DC", "dc": a couple of occurrences of "Dublin Core" appear after you've started using the abbreviation in a systematic way. Homogenization would be good! Also, there's (at least) one "dc" in Courier font in 2.2.


========= Section 2.1

- The word "affected" in the first paragraph (e.g., it hints that a resource can be "affected in the past") does not mean much to me, as a non-native speaker.

- the paragraph on "Descriptive terms" mentions 30 terms for that category. Table 2 has 29.

- Perhaps a similar issue as above, for "derivation". The elements for rights, which are often related to access and consumption, seem to have a broader scope than what I understand to be "derivation". It's as if you are trying to shoehorn rights into this category. I'm not convinced, and I can't see much benefit in trying this anyway. There could just be an extra category. As a matter of fact I would find this in line with the fact that all rights-related properties have naturally found their place in Table 6 of the rejected properties. For some it is so obvious that you have (rightly) not written a reason for rejecting them!

- Table 2. My printout did not print the expected "What" in the first line (it could be a bug on my side).


========= Section 2.2

- example 1: I'd recommend using more meaningful URIs for the document versions, e.g. ex:prov-dc-20130312 and ex:prov-dc-20121211.

- "relates to the different states that the document had". My gut reading of this sentence was that it was about versions only, which is too restrictive (there's more at stake than logical versions of a doc) and unconvincing (if the aim was to capture versions only, dct:replaces would be quite enough). Perhaps replace by "relates to the different stages the document underwent" or something more grammatically correct than this.

- "involves two different states of the document: the document before it was issued and the issued document". To many readers in the DC community, there will be just one document before and after issuing; it does not really change. Perhaps removing "document" from the second part ("states of the document: /before/ and /after/ publication") will help avoid discouraging them. This can also make the sentence more coherent (the object is "states" in the first part and "the document" in the second).
It looks like nitpicking, but I fear there's a real risk of losing a part of your core audience here.

- Figure 1: if the graph convention used is the one used throughout all PROV documents, it may be useful to mention it. It looks very ad hoc otherwise.

- Approach 2: I don't buy the argument that the pattern "implies that ex:doc1 was generated by _:activity and then used by _:activity afterwards". Is there some specific semantics to activities' properties, which I'm missing?
As I understand it, Approach 1 does not imply that "_:resulting_entity was generated by _:activity and then _:used_entity was used by _:activity afterwards", which is the exact transposition of your interpretation in Approach 2.

- Fig.2 whether I'm right or wrong on the above issue, you can remove "(as it implies[...]activity)" from the caption. It doesn't really belong there.

- I thought (from the previous version of the PROV-DC document) that the most important argument against Approach 2 was that PROV discouraged the same resource from being used as the input and the output of an activity at the same time. Has it changed? Personally I didn't like that PROV rule, but in the context of a DC-PROV mapping this was a very powerful argument...
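
The structural difference between the two approaches, as I read them, can be sketched as follows. This is only my reading of Fig. 1 and Fig. 2, with triples as plain Python tuples; the node names are the ones used in the comments above, and the helper function is hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
# Approach 1: the activity uses one entity and generates a different one.
approach_1 = [
    ("_:activity", "prov:used", "_:used_entity"),
    ("_:resulting_entity", "prov:wasGeneratedBy", "_:activity"),
]

# Approach 2: the same resource is both input and output of the activity.
approach_2 = [
    ("_:activity", "prov:used", "ex:doc1"),
    ("ex:doc1", "prov:wasGeneratedBy", "_:activity"),
]


def used_and_generated_by_same_activity(triples):
    """Return (entity, activity) pairs where the activity both used and generated the entity."""
    used = {(o, s) for s, p, o in triples if p == "prov:used"}
    generated = {(s, o) for s, p, o in triples if p == "prov:wasGeneratedBy"}
    return sorted(used & generated)
```

Only Approach 2 yields a pair here. Note that neither list of triples says anything about the order of usage and generation, which is why I doubt the "and then used afterwards" reading.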


========== Section 3.1

- first paragraph: move "(i.e. they will be able to understand DC statements)" just after "to interoperate with these DC statements". The bracketed sentence doesn't really explain "reasoning" per se; it rather tries to explain interoperability.
And is "by applying means of OWL 2" really grammatical a construction?

- Table 3: finding dct:Agent here comes a bit as a surprise, as the class has not been introduced before (e.g. in Table 2). Perhaps it could be presented aside.

- Table 3 is really big and has a lot of white space. Maybe removing the namespace prefixes (which do not bear much info anyway, given what the columns include) would allow trimming the first three columns.

- Please keep in Table 3 the order defined in Table 3! The current mismatch makes comparison difficult, and for no real reason it seems.

- "This is valid since from the PROV point of view" and the rest of the paragraph should be tightened. In the RDF graph that results from example 1, there is a prov:Entity with two prov:generatedAtTime statements. Is it valid or not? The paragraph currently hints at both (it is valid, but does not comply with PROV constraints), which is confusing.

- Table 5 has a confusing introduction: what is its rationale as a separate table? The fact that it's mapping to inverse relationships, or the fact that it's mapping to outside the core of PROV?
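
Regarding the comment above on the two prov:generatedAtTime statements, the situation can be spotted mechanically. The sketch below is only an illustration with triples as plain Python tuples; the ex:doc1 graph and its date values are made up, and the helper function is hypothetical.

```python
from collections import defaultdict

# Hypothetical result of mapping two DC dates of one document to prov:generatedAtTime.
# The date values are invented for the illustration.
graph = [
    ("ex:doc1", "rdf:type", "prov:Entity"),
    ("ex:doc1", "prov:generatedAtTime", "2012-12-11"),
    ("ex:doc1", "prov:generatedAtTime", "2013-03-12"),
]


def entities_with_several_generation_times(triples):
    """Map each entity to its generation times, keeping only entities with more than one."""
    times = defaultdict(set)
    for s, p, o in triples:
        if p == "prov:generatedAtTime":
            times[s].add(o)
    return {entity: sorted(values) for entity, values in times.items() if len(values) > 1}
```

Whatever the document decides about validity, a reader applying the mapping will end up with entities flagged by such a check, so the text should say clearly what to make of them.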


========== Section 3.2

My personal taste would be to remove the somewhat redundant prov:Activity and prov:Role from the rdfs:subClassOf statements of prov:Create and prov:Creator.

You could replace "refinements of the properties have been omitted" by "refinements of the properties are not needed". The latter is stronger, and still true!


========== Section 3.3

I don't understand why replacement is presented as the result of a "search and replace". There's no "search" implied by default in a dct:replaces link, is there?


========== Section 3.4

The notion of "complement" is unclear. Rather than "certain properties complement each other" couldn't we have "certain properties indicate a same activity"?
I am also not sure that dct:modified and dct:contributor are so connected. A contributor can be involved in the creation of the document, I believe.


========== Section 3.5

Table 6: it is confusing to find here both the elements that Table 1 lists as relevant for provenance (who, when, how) and the descriptive metadata elements. The table would benefit from having the descriptive ones removed, especially the ones for which it is absolutely no surprise that they shouldn't be mapped. Or at least the categories should be separated into different tables...
Splitting the tables (in all 4 categories, in fact) would also allow getting rid of the second column, which consumes a lot of space for pretty much nothing.
It would also help comparisons. Table 2 has 29 "descriptive metadata" elements, Table 6 has 28. With the order being different, and the size so big, I won't make the effort to find out which element has been left out.

The dct:isRequiredBy line has a typo ("reosource").

########################################################################

Received on Tuesday, 16 April 2013 20:39:41 UTC