PROV-ISSUE-403 (Feedback_TL): Feedback on the mapping from Tim Lebo [Mapping PROV-O to Dublin Core]

From: Provenance Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Sat, 09 Jun 2012 19:10:21 +0000
Message-Id: <E1SdR3F-0003v4-0N@tibor.w3.org>
To: public-prov-wg@w3.org

http://www.w3.org/2011/prov/track/issues/403

Raised by: Daniel Garijo
On product: Mapping PROV-O to Dublin Core

Regarding https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer#wiki-References

1)
"To be more precise, we define provenance metadata as metadata providing provenance information according to the definition of the W3C Provenance Incubator Group"

Why are you still using the XG's definition? Does PROV-WG still not provide one that you like? Should PROV-WG be explicit about their definition of provenance (since its materials will become Recommendation and XG's will not)?


2)

"For the complex mappings, we take the following approach: "

is confusing. Is one of the "three parts" enumerated above "complex"? Ah, yes: the third.

I suggest drawing that connection more clearly.

3)

The points in the second half of the paragraph:

". A rationale for these two steps is that the mappings in stage 1 are context free and do not depend on the existence of any other statements. On the other hand, by employing the patterns developed for stage 2, any kind of generated PROV data could be cleaned up at a later point, for instance after the integration with provenance information from a different source, which could be advantageous. "

really should be promoted to the first half of the paragraph. It takes too long to determine what the distinction is between the two phases.

4)

The use of blank nodes is disturbing (http://linkeddatabook.com/editions/1.0/#htoc16). Please make it clear that the bnodes only exist during the processing that you suggest, and that bnodes are not produced in resulting PROV or DC records.

5)

Direct mappings:

 -1 dct:references rdfs:subPropertyOf prov:wasDerivedFrom .
 +1 dct:creator rdfs:subPropertyOf prov:wasAttributedTo .
 +1 dct:rightsHolder rdfs:subPropertyOf prov:wasAttributedTo .
 -1 (casting a broad to a specific) dct:date rdfs:subPropertyOf prov:generatedAtTime .
 +1 dct:Agent owl:equivalentClass prov:Agent .
 -1 (reverse these) prov:hadOriginalSource rdfs:subPropertyOf dct:source .
 +1 prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf .

Voting for all of them (in https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings):

 +1 dct:Agent           owl:equivalentClass   prov:Agent.
 -1 dct:references      rdfs:subPropertyOf    prov:wasDerivedFrom .

 +1 dct:rightsHolder    rdfs:subPropertyOf    prov:wasAttributedTo .
 +1 dct:creator         rdfs:subPropertyOf    prov:wasAttributedTo .
 +1 dct:publisher       rdfs:subPropertyOf    prov:wasAttributedTo .
 +1 dct:contributor     rdfs:subPropertyOf    prov:wasAttributedTo .

 +1 dct:isVersionOf     rdfs:subPropertyOf    prov:wasDerivedFrom .
 +1 dct:isFormatOf      rdfs:subPropertyOf    prov:alternateOf .
 +1 dct:replaces        rdfs:subPropertyOf    prov:tracedTo .
 +1 dct:source          rdfs:subPropertyOf    prov:wasDerivedFrom .

 -1 dct:date            rdfs:subPropertyOf    prov:generatedAtTime .

I would support reversing the above. As it is, you are casting a general "any date you wish" into a very specific meaning.

At first glance, the following are concerning. If the same instance has all of these properties, then it was generated at many distinct times. Perhaps your complex mappings tease this out.

 -1 dct:issued          rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:dateAccepted    rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:dateCopyrighted rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:dateSubmitted   rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:modified        rdfs:subPropertyOf    prov:generatedAtTime .
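
To illustrate the concern, here is a minimal Python sketch, assuming plain RDFS subPropertyOf entailment (rule rdfs7) over a set of triples. The resource `ex:doc` and all of its dates are made-up sample data, and the helper `entail_subproperties` is hypothetical; it is not part of the mapping.

```python
# Proposed direct mappings under review (a subset), as a property -> super-property map.
SUB_PROPERTY_OF = {
    "dct:issued": "prov:generatedAtTime",
    "dct:dateAccepted": "prov:generatedAtTime",
    "dct:dateSubmitted": "prov:generatedAtTime",
    "dct:modified": "prov:generatedAtTime",
}


def entail_subproperties(triples, sub_property_of):
    """Apply RDFS rule rdfs7 once: (s p o) and (p subPropertyOf q) entail (s q o).

    One pass is enough here because the map above has no chains.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p in sub_property_of:
            inferred.add((s, sub_property_of[p], o))
    return inferred


# Hypothetical record: one document with several distinct Dublin Core dates.
triples = {
    ("ex:doc", "dct:dateSubmitted", "2012-01-10"),
    ("ex:doc", "dct:dateAccepted", "2012-03-02"),
    ("ex:doc", "dct:issued", "2012-04-15"),
    ("ex:doc", "dct:modified", "2012-06-01"),
}

inferred = entail_subproperties(triples, SUB_PROPERTY_OF)
generation_times = sorted(
    o for s, p, o in inferred if s == "ex:doc" and p == "prov:generatedAtTime"
)
print(generation_times)
# ['2012-01-10', '2012-03-02', '2012-04-15', '2012-06-01']
```

Under these assumptions the single entity ends up with four different prov:generatedAtTime values, which is the situation described above.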

The following casts a range into an instant of time.

 -1 dct:valid           rdfs:subPropertyOf    prov:generatedAtTime .

 -1 prov:hadOriginalSource rdfs:subPropertyOf dct:source .

I would support reversing the above. PROV is pointing to a subset of the sources that dct:source intends to cite. dct:source is the union of hadOriginalSource and any of its derivations (and more, perhaps).

 +1 prov:wasRevisionOf     rdfs:subPropertyOf dct:isVersionOf .


6)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

For readability, I'd reverse the order of these:

 dcprov:CreationActivity rdfs:subClassOf
   prov:Activity, dcprov:ContributionActivity .
 dcprov:ContributionActivity rdfs:subClassOf
   prov:Activity .

7)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

For readability, I'd reverse the order of these:

 dcprov:CreatorRole rdfs:subClassOf
   prov:Role, dcprov:ContributorRole .
 dcprov:ContributorRole rdfs:subClassOf
   prov:Role .

8)

If we apply the SPARQL queries from the complex mappings twice, do we get two unidentified blank nodes that should be identified with each other?
If so, this leads to a proliferation of bnodes that should be avoided. If the queries are only meant to be informative, and those bnodes are meant to be appropriately named to avoid duplication, then I suggest this be clearly stated.
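
The proliferation can be sketched in a few lines of Python. This is a simplified simulation, not the actual complex mapping: the two `construct_*` functions are hypothetical stand-ins for a CONSTRUCT query over dct:publisher, the sample IRIs are made up, and the `#publication` suffix is only one possible naming scheme.

```python
import itertools

_bnode_ids = itertools.count()


def construct_with_bnodes(graph):
    """Mimic a CONSTRUCT template that uses a blank node (_:act) for the activity."""
    out = set()
    for s, p, o in sorted(graph):
        if p == "dct:publisher":
            act = f"_:b{next(_bnode_ids)}"  # a fresh blank node for every solution
            out.add((act, "rdf:type", "dcprov:PublicationActivity"))
            out.add((act, "prov:wasAssociatedWith", o))
    return out


def construct_with_minted_iris(graph):
    """Same template, but the activity gets a name derived from the document."""
    out = set()
    for s, p, o in sorted(graph):
        if p == "dct:publisher":
            act = f"{s}#publication"  # hypothetical deterministic naming scheme
            out.add((act, "rdf:type", "dcprov:PublicationActivity"))
            out.add((act, "prov:wasAssociatedWith", o))
    return out


def count_activities(graph):
    """Count distinct nodes typed as dcprov:PublicationActivity."""
    return len(
        {s for s, p, o in graph
         if p == "rdf:type" and o == "dcprov:PublicationActivity"}
    )


# Hypothetical source record with a single publisher statement.
source = {("http://example.org/doc", "dct:publisher", "http://example.org/agent")}

# Apply each variant twice and merge the results with the source graph.
bnode_result = source | construct_with_bnodes(source) | construct_with_bnodes(source)
iri_result = source | construct_with_minted_iris(source) | construct_with_minted_iris(source)

print(count_activities(bnode_result))  # 2
print(count_activities(iri_result))    # 1
```

In SPARQL 1.1 a comparable effect could presumably be obtained by binding the activity to a minted IRI, for example with `BIND(IRI(CONCAT(STR(?doc), "#publication")) AS ?act)`, instead of using `_:act` in the template.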

9)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1 section "List of dc terms excluded from the mapping",
I suggest organizing the list by descriptive vs. provenance metadata. That way I can review your categorization more easily, AND focus on only the provenance metadata (which is the point of the mapping).

10)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

There is no bibliography entry for (DCMI Usage Board, 2010b) or (DCMI Usage Board, 2010a).

You don't reference the URL http://dublincore.org/documents/dcmi-terms/ ?

11)

It seems like you could include the content of https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings and https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations directly in the "primer" - the redundancy is dissonant.

Why three complex mappings in the primer? Why not fewer?

The organization across 4 pages makes it difficult to determine "what is where". I think the content as it is could stand on its own as one document.

12)

Where is stage 2 of the complex mappings?


13) Are there implementations of your complex mapping?



14)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations

The following order makes more sense to me

 dcprov:PublicationActivity      rdfs:subClassOf     prov:Activity .
 dcprov:ContributionActivity     rdfs:subClassOf     prov:Activity .
 dcprov:CreationActivity         rdfs:subClassOf     prov:Activity, dcprov:ContributionActivity .
 dcprov:ContributorRole          rdfs:subClassOf     prov:Role .
 dcprov:PublisherRole            rdfs:subClassOf     prov:Role .
 dcprov:CreatorRole              rdfs:subClassOf     prov:Role, dcprov:ContributorRole .



15)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations

Are the following used in the complex rules? It would be very nice to show which rules each specialization is used in. Similarly, it would be nice to group rules by their use of PROV terms, and by "in the where" versus "in the construct". A navigation like this would really bring the material together nicely.

 dcprov:PublicationActivity      rdfs:subClassOf     prov:Activity .
 dcprov:ContributionActivity     rdfs:subClassOf     prov:Activity .
 dcprov:CreationActivity         rdfs:subClassOf     prov:Activity, dcprov:ContributionActivity .
 dcprov:ContributorRole          rdfs:subClassOf     prov:Role .
 dcprov:PublisherRole            rdfs:subClassOf     prov:Role .
 dcprov:CreatorRole              rdfs:subClassOf     prov:Role, dcprov:ContributorRole .


16)

Is the following a copy-paste error (dct:publisher is never mentioned; the WHERE clause matches dct:creator)?

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Section: dct:publisher

 CONSTRUCT {
   ?doc a prov:Entity .
      prov:wasAttributedTo ?ag .
   _:out a prov:Entity .
      prov:specializationOf ?doc .
   ?ag a prov:Agent .
   _:act a prov:Activity, dcprov:PublicationActivity ;
      prov:wasAssociatedWith ?ag ;
      prov:qualifiedAssociation _:assoc .
   _:assoc a prov:Association ;
      prov:agent ?ag ;
      prov:hadRole dcprov:PublisherRole .
   _:out prov:wasGeneratedBy _:act ;
      prov:wasAttributedTo ?ag .
 } WHERE {
   ?doc dct:creator ?ag .
 }



17)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

spacing is off in:


 dct:rightsHolder

 The rightsHolder is different, here we propose to omit the activity and just add the rights holder to the entity by means of
 prov:wasAttributedTo. This mapping could actually be omitted as the statements can be inferred from the direct mapping.

 CONSTRUCT {
 ?doc     a                         prov:Entity .
 ?ag       a                         prov:Agent .
 ?doc     prov:wasAttributedTo      ?ag .
 } WHERE {
 ?doc dct:rightsHolder?ag .
 }


18)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Recommend expanding variable names to be more readable (e.g., ?ag to ?agent)

19)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Is there a reason why you use "_:iss_entity" instead of just the "[]" syntax? Smearing a node across the CONSTRUCT makes it more difficult to read. You used the "[]" in:


dct:modified

 [ a prov:Generation ;
   prov:atTime ?date ;
   prov:activity _:act . ] .

Received on Saturday, 9 June 2012 19:10:29 UTC
