Re: Dublin Core - PROV Mapping, Call for Feedback (until June 5th) from Timothy Lebo on 2012-06-03 (public-prov-wg@w3.org from June 2012)

From: Timothy Lebo <lebot@rpi.edu>
Date: Sun, 3 Jun 2012 12:46:02 -0400
To: Kai Eckert <kai@informatik.uni-mannheim.de>
Cc: public-prov-wg@w3.org, DC-PROVENANCE@JISCMAIL.AC.UK, DCMI Architecture <dc-architecture@JISCMAIL.AC.UK>
Message-Id: <BB456A62-D517-4C50-9074-29A63FED9D33@rpi.edu>
Kai,

Thanks for coordinating this mapping effort. 

It is clear that some thought has gone into both _how_ to map and _what_ the mappings should be. 
I'm sure others will follow similar steps as PROV matures. We'll all certainly benefit from this work.

I'm including a smattering of comments that I jotted down as I worked through your materials.
They're "from the hip", so take or leave any that do not concern you.
Some are editorial, others are about the content.
Also, feel free to follow up on any where you would like more clarity or discussion.

Regards,
Tim Lebo





Regarding https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer#wiki-References

1)
"To be more precise, we define provenance metadata as metadata providing provenance information according to the definition of the W3C Provenance Incubator Group"

Why are you still using the XG's definition? Does PROV-WG still not provide one that you like? Should PROV-WG be explicit about its definition of provenance (since its materials will become Recommendations and the XG's will not)?


2)

"For the complex mappings, we take the following approach: "

is confusing. Is one of the "three parts" enumerated above "complex"? Ah, yes: the third.

I suggest drawing that connection more clearly.

3)

The points in the second half of the paragraph:

". A rationale for these two steps is that the mappings in stage 1 are context free and do not depend on the existence of any other statements. On the other hand, by employing the patterns developed for stage 2, any kind of generated PROV data could be cleaned up at a later point, for instance after the integration with provenance information from a different source, which could be advantageous. " 

really should be promoted to the first half of the paragraph. It takes too long to determine what the distinction is between the two phases.

4)

The use of blank nodes is disturbing (http://linkeddatabook.com/editions/1.0/#htoc16). Please make it clear that the bnodes only exist during the processing that you suggest, and that bnodes are not produced in resulting PROV or DC records.

5)

Direct mappings:

 -1 dct:references rdfs:subPropertyOf prov:wasDerivedFrom .
 +1 dct:creator rdfs:subPropertyOf prov:wasAttributedTo .
 +1 dct:rightsHolder rdfs:subPropertyOf prov:wasAttributedTo .
 -1 (casting a broad to a specific) dct:date rdfs:subPropertyOf prov:generatedAtTime .
 +1 dct:Agent owl:equivalentClass prov:Agent .
 -1 (reverse these) prov:hadOriginalSource rdfs:subPropertyOf dct:source .
 +1 prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf .

Voting for all of them (in https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings):

 +1 dct:Agent           owl:equivalentClass   prov:Agent.
 -1 dct:references      rdfs:subPropertyOf    prov:wasDerivedFrom .

 +1 dct:rightsHolder    rdfs:subPropertyOf    prov:wasAttributedTo .
 +1 dct:creator         rdfs:subPropertyOf    prov:wasAttributedTo .
 +1 dct:publisher       rdfs:subPropertyOf    prov:wasAttributedTo .
 +1 dct:contributor     rdfs:subPropertyOf    prov:wasAttributedTo .

 +1 dct:isVersionOf     rdfs:subPropertyOf    prov:wasDerivedFrom .
 +1 dct:isFormatOf      rdfs:subPropertyOf    prov:alternateOf .
 +1 dct:replaces        rdfs:subPropertyOf    prov:tracedTo .
 +1 dct:source          rdfs:subPropertyOf    prov:wasDerivedFrom .

 -1 dct:date            rdfs:subPropertyOf    prov:generatedAtTime .

I would support reversing the above. As it is, you are casting a general "any date you wish" into a very specific meaning.

At first glance, the following are concerning. If the same instance has all of these properties, then it was generated at many distinct times. Perhaps your complex mappings tease this out.

 -1 dct:issued          rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:dateAccepted    rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:dateCopyrighted rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:dateSubmitted   rdfs:subPropertyOf    prov:generatedAtTime .
 -1 dct:modified        rdfs:subPropertyOf    prov:generatedAtTime .
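
To make the concern concrete, here is a rough Python sketch (not part of the mapping documents) that applies the rdfs:subPropertyOf entailment rule to a single record. The record ex:doc and its dates are made up for illustration, and triples are modelled as plain tuples rather than with an RDF library.

```python
# Sketch: one-step rdfs:subPropertyOf entailment over triples as tuples.
# The record (ex:doc) and its dates are hypothetical.

GENERATED = "prov:generatedAtTime"

# Some of the proposed direct mappings: dct date property -> prov:generatedAtTime.
SUB_PROPERTY_OF = {
    "dct:issued": GENERATED,
    "dct:dateAccepted": GENERATED,
    "dct:dateSubmitted": GENERATED,
    "dct:modified": GENERATED,
}

record = {
    ("ex:doc", "dct:dateSubmitted", "2012-01-10"),
    ("ex:doc", "dct:dateAccepted", "2012-03-02"),
    ("ex:doc", "dct:issued", "2012-04-15"),
    ("ex:doc", "dct:modified", "2012-05-20"),
}


def entail(triples, sub_property_of):
    """Add (s q o) for every (s p o) where p rdfs:subPropertyOf q."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in sub_property_of:
            inferred.add((s, sub_property_of[p], o))
    return inferred


def generation_times(triples, subject):
    """Collect every prov:generatedAtTime value stated or inferred for subject."""
    return sorted(o for s, p, o in triples if s == subject and p == GENERATED)


times = generation_times(entail(record, SUB_PROPERTY_OF), "ex:doc")
print(times)  # four distinct "generation" times for the same entity
```

With these axioms in place, a reasoner would conclude that ex:doc was generated at four different instants, which is the situation I'd like the complex mappings to avoid.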

The following casts a range into an instant of time.

 -1 dct:valid           rdfs:subPropertyOf    prov:generatedAtTime .

 -1 prov:hadOriginalSource rdfs:subPropertyOf dct:source .

I would support reversing the above. PROV is pointing to a subset of the sources that dct:source intends to cite. dct:source is the union of hadOriginalSource and any of its derivations (and more, perhaps).

 +1 prov:wasRevisionOf     rdfs:subPropertyOf dct:isVersionOf . 


6)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

For readability, I'd reverse the order of these:

 dcprov:CreationActivity rdfs:subClassOf
    prov:Activity, dcprov:ContributionActivity .
 dcprov:ContributionActivity rdfs:subClassOf
    prov:Activity .

7)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

For readability, I'd reverse the order of these:

 dcprov:CreatorRole rdfs:subClassOf
    prov:Role, dcprov:ContributorRole .
 dcprov:ContributorRole rdfs:subClassOf
    prov:Role .

8)

If we apply the SPARQL queries from the complex mappings twice, do we get two distinct blank nodes that should really be one and the same?
If so, this leads to a proliferation of bnodes that should be avoided. If the queries are only meant to be informative, and those bnodes are to be named appropriately to avoid duplication, then I suggest stating this clearly.
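
A small Python sketch of what I mean, under loud assumptions: `map_creator` is a heavily simplified stand-in for the dct:creator rule (it is not the rule from the wiki), `fresh_bnode` mimics how a CONSTRUCT template mints a new blank node per solution, and `stable_name` is a hypothetical naming scheme that derives an identifier from the rule's inputs.

```python
# Sketch: re-running a rule that mints blank nodes duplicates data;
# deterministic names do not. All names and URIs here are hypothetical.
import hashlib
import itertools

_counter = itertools.count()


def fresh_bnode():
    """Mimic a CONSTRUCT template: each solution gets a brand-new blank node."""
    return f"_:b{next(_counter)}"


def stable_name(doc, agent, kind):
    """Derive a repeatable identifier from the inputs of the rule."""
    digest = hashlib.sha1(f"{doc}|{agent}|{kind}".encode("utf-8")).hexdigest()[:12]
    return f"ex:{kind}-{digest}"


def map_creator(doc, agent, mint):
    """Simplified stand-in for the dct:creator rule: one activity per solution."""
    activity = mint(doc, agent)
    return {
        (activity, "rdf:type", "prov:Activity"),
        (activity, "prov:wasAssociatedWith", agent),
        (doc, "prov:wasGeneratedBy", activity),
    }


def activities(graph):
    """All subjects typed as prov:Activity."""
    return {s for s, p, o in graph if p == "rdf:type" and o == "prov:Activity"}


graph_bnodes, graph_named = set(), set()
for _ in range(2):  # apply the same rule twice to the same input
    graph_bnodes |= map_creator("ex:doc", "ex:alice", lambda d, a: fresh_bnode())
    graph_named |= map_creator(
        "ex:doc", "ex:alice", lambda d, a: stable_name(d, a, "creation")
    )

print(len(activities(graph_bnodes)), len(activities(graph_named)))  # prints: 2 1
```

The blank-node variant ends up with two activities for one dct:creator statement, while the named variant stays at one no matter how often the rule is applied.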

9)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1 section "List of dc terms excluded from the mapping",
I suggest organizing the list by descriptive vs. provenance metadata. That way I can review your categorization more easily, AND focus on only the provenance metadata (which is the point of the mapping).

10)

In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

There is no bibliography entry for (DCMI Usage Board, 2010b) or (DCMI Usage Board, 2010a).

You don't reference the URL http://dublincore.org/documents/dcmi-terms/ ?

11) 

It seems like you could include the content of https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings and https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations directly in the "primer" - the redundancy is dissonant.

Why three complex mappings in the primer? Why now fewer?

The organization across 4 pages makes it difficult to determine "what is where". I think the content as it is could stand on its own as one document.

12)

Where is stage 2 of the complex mappings?


13) Are there implementations of your complex mapping?



14)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations

The following order makes more sense to me:

 dcprov:PublicationActivity      rdfs:subClassOf     prov:Activity .
 dcprov:ContributionActivity     rdfs:subClassOf     prov:Activity .
 dcprov:CreationActivity         rdfs:subClassOf     prov:Activity, dcprov:ContributionActivity .
 dcprov:ContributorRole          rdfs:subClassOf     prov:Role .
 dcprov:PublisherRole            rdfs:subClassOf     prov:Role .
 dcprov:CreatorRole              rdfs:subClassOf     prov:Role, dcprov:ContributorRole .
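
One way to read this ordering is "introduce a dcprov class before anything that subclasses it". Here is a rough Python sketch of that check; the function name `superclasses_first` and the tuple representation are my own, purely for illustration.

```python
# Sketch: check that each class is declared after its dcprov superclasses.
# Declarations are (class, [superclasses]) pairs in document order.


def superclasses_first(declarations, prefix="dcprov:"):
    """True if every dcprov superclass was declared earlier in the list."""
    seen = set()
    for cls, supers in declarations:
        if any(s.startswith(prefix) and s not in seen for s in supers):
            return False
        seen.add(cls)
    return True


# Order currently used in the primer (see comment 6).
primer_order = [
    ("dcprov:CreationActivity", ["prov:Activity", "dcprov:ContributionActivity"]),
    ("dcprov:ContributionActivity", ["prov:Activity"]),
]

# Order suggested in this comment.
suggested_order = [
    ("dcprov:PublicationActivity", ["prov:Activity"]),
    ("dcprov:ContributionActivity", ["prov:Activity"]),
    ("dcprov:CreationActivity", ["prov:Activity", "dcprov:ContributionActivity"]),
    ("dcprov:ContributorRole", ["prov:Role"]),
    ("dcprov:PublisherRole", ["prov:Role"]),
    ("dcprov:CreatorRole", ["prov:Role", "dcprov:ContributorRole"]),
]

print(superclasses_first(primer_order), superclasses_first(suggested_order))
# prints: False True
```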



15)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations

Are the following used in the complex rules? It would be very nice to show which rules each specialization is used in. Similarly, it would be nice to group rules by their use of PROV terms, and by "in the where" versus "in the construct". A navigation like this would really bring the material together nicely.

 dcprov:PublicationActivity      rdfs:subClassOf     prov:Activity .
 dcprov:ContributionActivity     rdfs:subClassOf     prov:Activity .
 dcprov:CreationActivity         rdfs:subClassOf     prov:Activity, dcprov:ContributionActivity .
 dcprov:ContributorRole          rdfs:subClassOf     prov:Role .
 dcprov:PublisherRole            rdfs:subClassOf     prov:Role .
 dcprov:CreatorRole              rdfs:subClassOf     prov:Role, dcprov:ContributorRole .
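
As a rough idea of how such a navigation could be generated, here is a Python sketch that indexes prefixed terms by rule and by clause. It is only a sketch: the rule text below is an abridged, hypothetical version of the dct:publisher rule (I assume its WHERE clause is meant to match dct:publisher), and the split on an uppercase WHERE keyword only works for simple queries like these.

```python
# Sketch: index which rule uses which term, split by CONSTRUCT vs WHERE.
# Assumes simple queries with a single, uppercase WHERE keyword.
import re
from collections import defaultdict

TERM = re.compile(r"\b(?:dcprov|prov|dct):[A-Za-z]\w*")


def index_terms(rules):
    """Map each prefixed term to the rules using it, per clause."""
    index = defaultdict(lambda: {"construct": set(), "where": set()})
    for name, query in rules.items():
        template, _, pattern = query.partition("WHERE")
        for term in TERM.findall(template):
            index[term]["construct"].add(name)
        for term in TERM.findall(pattern):
            index[term]["where"].add(name)
    return index


# Abridged and hypothetical; not a verbatim copy of the wiki rule.
rules = {
    "dct:publisher": """
        CONSTRUCT {
            _:act a prov:Activity, dcprov:PublicationActivity ;
                  prov:qualifiedAssociation _:assoc .
            _:assoc a prov:Association ;
                  prov:hadRole dcprov:PublisherRole .
        } WHERE {
            ?doc dct:publisher ?ag .
        }
    """,
}

index = index_terms(rules)
for term in sorted(index):
    print(term, sorted(index[term]["construct"]), sorted(index[term]["where"]))
```

Each output line lists a term, the rules that construct it, and the rules that match on it, which is the grouping I have in mind.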


16)

Is the following a copy-paste error (dct:publisher is never mentioned in the query)?

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Section: dct:publisher

 CONSTRUCT {
    ?doc a prov:Entity .
       prov:wasAttributedTo ?ag .
    _:out a prov:Entity .
       prov:specializationOf ?doc .
    ?ag a prov:Agent .
    _:act a prov:Activity, dcprov:PublicationActivity ;
       prov:wasAssociatedWith ?ag ;
       prov:qualifiedAssociation _:assoc .
    _:assoc a prov:Association ;
       prov:agent ?ag ;
       prov:hadRole dcprov:PublisherRole .
    _:out prov:wasGeneratedBy _:act ;
       prov:wasAttributedTo ?ag .
 } WHERE {
    ?doc dct:creator ?ag .
 }
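
A check like this could catch such slips automatically. It is a minimal Python sketch, assuming each wiki section is titled with the DC term its rule is supposed to match and that the query has a single uppercase WHERE keyword; the function name is hypothetical.

```python
# Sketch: flag rules whose WHERE clause never mentions the section's DC term.
import re


def where_mentions(section_term, query):
    """True if the WHERE clause of the query contains the section's term."""
    _, _, pattern = query.partition("WHERE")
    return re.search(re.escape(section_term) + r"\b", pattern) is not None


# WHERE clause as quoted above, under the section titled dct:publisher.
quoted_rule = """
CONSTRUCT {
   _:act a prov:Activity, dcprov:PublicationActivity .
} WHERE {
   ?doc dct:creator ?ag .
}
"""

print(where_mentions("dct:publisher", quoted_rule))  # prints: False
print(where_mentions("dct:creator", quoted_rule))    # prints: True
```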



17)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

spacing is off in:


 dct:rightsHolder
 
 The rightsHolder is different, here we propose to omit the activity and just add the rights holder to the entity by means of 
  prov:wasAttributedTo. This mapping could actually be omitted as the statements can be inferred from the direct mapping.

 CONSTRUCT {
  ?doc     a                         prov:Entity .
  ?ag       a                         prov:Agent .
  ?doc     prov:wasAttributedTo      ?ag .
 } WHERE { 
  ?doc dct:rightsHolder?ag .
 }


18)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Recommend expanding variable names to be more readable (e.g., ?ag to ?agent)

19)

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Is there a reason why you use "_:iss_entity" instead of just the "[]" syntax? Smearing a node across the CONSTRUCT makes it more difficult to read. You used "[]" in:


dct:modified

 [ a prov:Generation ;
                                                 prov:atTime ?date  ;
                                                 prov:activity _:act . ] .






On May 30, 2012, at 1:57 PM, Kai Eckert wrote:

> Hello everyone,
> 
> in the Dublin Core Metadata Provenance Task Group (with the help of Simon Miles), we have released an initial DC to PROV mapping draft.
> 
> The work has been divided in several documents to improve readability:
> 
> - The mapping primer [1] explains the process followed to do the mapping, the main rationale of our decisions and our next steps.
> 
> - The Direct Mappings document [2] shows the direct mappings found between DC and PROV (e.g., subPropertyOf relations).
> 
> - The PROV Specializations document [3] extends PROV-O with some basic roles and properties to be able to perform the complex mappings.
> 
> - Finally, the Complex-Mappings document [4] infers PROV statements from DC statements that are not covered by the direct mappings.
> 
> Please give us your feedback on our approach and the documents within one week (until Tuesday, June 5th).
> 
> We sent this mail both to the relevant DCMI mailinglists and the PROV mailinglist in order to reach consensus.
> 
> We are on a quite strict timetable now and aim at finishing the mapping (Stage 2, and the mapping back from PROV to DC) until end of June to reach the state of a public draft.
> 
> Daniel will briefly present the current state in the PROV call tomorrow. If you have any questions or comments, please don't hesitate to contact us.
> 
> Thanks,
> Kai, Daniel, Michael and Simon.
> 
> [1] https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-primer
> [2] https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings
> [3] https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations
> [4] https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1
> 
> -- 
> Kai Eckert
> Universitätsbibliothek Mannheim
> Stellv. Leiter Abteilung Digitale Bibliotheksdienste
> Schloss Schneckenhof West / 68131 Mannheim
> Tel. 0621/181-2946 Fax 0621/181-2918
> 
>
Received on Sunday, 3 June 2012 16:46:43 UTC