- From: Timothy Lebo <lebot@rpi.edu>
- Date: Sun, 3 Jun 2012 12:46:02 -0400
- To: Kai Eckert <kai@informatik.uni-mannheim.de>
- Cc: public-prov-wg@w3.org, DC-PROVENANCE@JISCMAIL.AC.UK, DCMI Architecture <dc-architecture@JISCMAIL.AC.UK>
Kai,

Thanks for coordinating this mapping effort. It is clear that some thought has gone into both _how_ to map and _what_ the mappings should be. I'm sure others will follow similar steps as PROV matures. We'll all certainly benefit from this work.

I'm including a smattering of comments that I jotted down as I worked through your materials. It's "from the hip", so take or leave any that do not concern you. Some are editorial, others are about the content. Also, feel free to follow up on any where you would like more clarity or discussion.

Regards,
Tim Lebo

Regarding https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer#wiki-References

1) "To be more precise, we define provenance metadata as metadata providing provenance information according to the definition of the W3C Provenance Incubator Group"

Why are you still using the XG's definition? Does PROV-WG still not provide one that you like? Should PROV-WG be explicit about their definition of provenance (since its materials will become Recommendation and XG's will not)?

2) "For the complex mappings, we take the following approach: "

is confusing. Is one of the "three parts" enumerated above "complex"? Ah, yes. The third. I suggest drawing that connection more clearly.

3) The points in the second half of the paragraph:

"A rationale for these two steps is that the mappings in stage 1 are context free and do not depend on the existence of any other statements. On the other hand, by employing the patterns developed for stage 2, any kind of generated PROV data could be cleaned up at a later point, for instance after the integration with provenance information from a different source, which could be advantageous."

really should be promoted to the first half of the paragraph. It takes too long to determine what the distinction is between the two phases.

4) The use of blank nodes is disturbing (http://linkeddatabook.com/editions/1.0/#htoc16).
Please make it clear that the bnodes only exist during the processing that you suggest, and that bnodes are not produced in resulting PROV or DC records.

5) Direct mappings:

-1 dct:references rdfs:subPropertyOf prov:wasDerivedFrom .
+1 dct:creator rdfs:subPropertyOf prov:wasAttributedTo .
+1 dct:rightsHolder rdfs:subPropertyOf prov:wasAttributedTo .
-1 (casting a broad to a specific) dct:date rdfs:subPropertyOf prov:generatedAtTime .
+1 dct:Agent owl:equivalentClass prov:Agent .
-1 (reverse these) prov:hadOriginalSource rdfs:subPropertyOf dct:source .
+1 prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf .

Voting for all of them (in https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings):

+1 dct:Agent owl:equivalentClass prov:Agent .
-1 dct:references rdfs:subPropertyOf prov:wasDerivedFrom .
+1 dct:rightsHolder rdfs:subPropertyOf prov:wasAttributedTo .
+1 dct:creator rdfs:subPropertyOf prov:wasAttributedTo .
+1 dct:publisher rdfs:subPropertyOf prov:wasAttributedTo .
+1 dct:contributor rdfs:subPropertyOf prov:wasAttributedTo .
+1 dct:isVersionOf rdfs:subPropertyOf prov:wasDerivedFrom .
+1 dct:isFormatOf rdfs:subPropertyOf prov:alternateOf .
+1 dct:replaces rdfs:subPropertyOf prov:tracedTo .
+1 dct:source rdfs:subPropertyOf prov:wasDerivedFrom .
-1 dct:date rdfs:subPropertyOf prov:generatedAtTime .

I would support reversing the above. As it is, you are casting a general "any date you wish" into a very specific meaning.

At first glance, the following are concerning. If the same instance has all of these properties, then it was generated at many distinct times. Perhaps your complex mappings tease this out.

-1 dct:issued rdfs:subPropertyOf prov:generatedAtTime .
-1 dct:dateAccepted rdfs:subPropertyOf prov:generatedAtTime .
-1 dct:dateCopyRighted rdfs:subPropertyOf prov:generatedAtTime .
-1 dct:dateSubmitted rdfs:subPropertyOf prov:generatedAtTime .
-1 dct:modified rdfs:subPropertyOf prov:generatedAtTime .

The following casts a range into an instant of time.
-1 dct:valid rdfs:subPropertyOf prov:generatedAtTime .

-1 prov:hadOriginalSource rdfs:subPropertyOf dct:source .

I would support reversing the above. PROV is pointing to a subset of the sources that dct:source intends to cite. dct:source is the union of hadOriginalSource and any of its derivations (and more, perhaps).

+1 prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf .

6) In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

For readability, I'd reverse the order of these:

dcprov:CreationActivity rdfs:subClassOf prov:Activity, dcprov:ContributionActivity .
dcprov:ContributionActivity rdfs:subClassOf prov:Activity .

7) In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

For readability, I'd reverse the order of these:

dcprov:CreatorRole rdfs:subClassOf prov:Role, dcprov:ContributorRole .
dcprov:ContributorRole rdfs:subClassOf prov:Role .

8) If we apply the SPARQL queries from the complex mappings twice, do we get two un-identified blank nodes that should be identified? If so, this leads to a proliferation of bnodes that should be avoided. If the queries are only meant to be informative, and those bnodes are to be appropriately named to avoid duplication, then I suggest this be clearly stated.

9) In https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1 section "List of dc terms excluded from the mapping", I suggest organizing by descriptive vs. provenance metadata. That way I can review your categorization more easily, AND focus on only the provenance metadata (which is the point of the mapping).

10) In https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-Primer

There are no bibliography entries for (DCMI Usage Board, 2010b) or (DCMI Usage Board, 2010a).

You don't reference the URL http://dublincore.org/documents/dcmi-terms/ ?
11) It seems like you could include the content of https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings and https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations directly in the "primer" - the redundancy is dissonant. Why three complex mappings in the primer? Why not fewer? The organization across 4 pages makes it difficult to determine "what is where". I think the content as it is could stand on its own as one document.

12) Where is stage 2 of the complex mappings?

13) Are there implementations of your complex mapping?

14) https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations

The following order makes more sense to me:

dcprov:PublicationActivity rdfs:subClassOf prov:Activity .
dcprov:ContributionActivity rdfs:subClassOf prov:Activity .
dcprov:CreationActivity rdfs:subClassOf prov:Activity, dcprov:ContributionActivity .

dcprov:ContributorRole rdfs:subClassOf prov:Role .
dcprov:PublisherRole rdfs:subClassOf prov:Role .
dcprov:CreatorRole rdfs:subClassOf prov:Role, dcprov:ContributorRole .

15) https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations

Are the following used in the complex rules? It would be very nice to show which rules each specialization is used in. Similarly, it would be nice to group rules by their use of PROV terms, and by "in the where" versus "in the construct". A navigation like this would really bring the material together nicely.

dcprov:PublicationActivity rdfs:subClassOf prov:Activity .
dcprov:ContributionActivity rdfs:subClassOf prov:Activity .
dcprov:CreationActivity rdfs:subClassOf prov:Activity, dcprov:ContributionActivity .

dcprov:ContributorRole rdfs:subClassOf prov:Role .
dcprov:PublisherRole rdfs:subClassOf prov:Role .
dcprov:CreatorRole rdfs:subClassOf prov:Role, dcprov:ContributorRole .

16) Is the following a copy-paste error (publisher is never mentioned)?

https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1
Section: dct:publisher

CONSTRUCT {
    ?doc a prov:Entity .
        prov:wasAttributedTo ?ag .
    _:out a prov:Entity .
        prov:specializationOf ?doc .
    ?ag a prov:Agent .
    _:act a prov:Activity, dcprov:PublicationActivity ;
        prov:wasAssociatedWith ?ag ;
        prov:qualifiedAssociation _:assoc .
    _:assoc a prov:Association ;
        prov:agent ?ag ;
        prov:hadRole dcprov:PublisherRole .
    _:out prov:wasGeneratedBy _:act ;
        prov:wasAttributedTo ?ag .
} WHERE {
    ?doc dct:creator ?ag .
}

17) https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

The spacing is off in:

    dct:rightsHolder

    The rightsHolder is different, here we propose to omit the activity and just add the rights holder to the entity by means of prov:wasAttributedTo. This mapping could actually be omitted as the statements can be inferred from the direct mapping.

    CONSTRUCT {
        ?doc a prov:Entity .
        ?ag a prov:Agent .
        ?doc prov:wasAttributedTo ?ag .
    } WHERE {
        ?doc dct:rightsHolder?ag .
    }

18) https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

I recommend expanding variable names to be more readable (e.g., ?ag to ?agent).

19) https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1

Is there a reason why you use "_:iss_entity" instead of just the "[]" syntax? Smearing a node across the CONSTRUCT makes it more difficult to read. You used the "[]" in:

    dct:modified [ a prov:Generation ; prov:atTime ?date ; prov:activity _:act . ] .

On May 30, 2012, at 1:57 PM, Kai Eckert wrote:

> Hello everyone,
>
> in the Dublin Core Metadata Provenance Task Group (with the help of Simon Miles), we have released an initial DC to PROV mapping draft.
>
> The work has been divided into several documents to improve readability:
>
> - The mapping primer [1] explains the process followed to do the mapping, the main rationale of our decisions and our next steps.
>
> - The Direct Mappings document [2] shows the direct mappings found between DC and PROV (e.g., subPropertyOf relations).
>
> - The PROV Specializations document [3] extends PROV-O with some basic roles and properties to be able to perform the complex mappings.
>
> - Finally, the Complex-Mappings document [4] infers PROV statements from DC statements that are not covered by the direct mappings.
>
> Please give us your feedback on our approach and the documents within one week (until Tuesday, June 5th).
>
> We sent this mail both to the relevant DCMI mailing lists and the PROV mailing list in order to reach consensus.
>
> We are on a quite strict timetable now and aim at finishing the mapping (Stage 2, and the mapping back from PROV to DC) until end of June to reach the state of a public draft.
>
> Daniel will briefly present the current state in the PROV call tomorrow. If you have any questions or comments, please don't hesitate to contact us.
>
> Thanks,
> Kai, Daniel, Michael and Simon.
>
> [1] https://github.com/dcmi/DC-PROV-Mapping/wiki/Mapping-primer
> [2] https://github.com/dcmi/DC-PROV-Mapping/wiki/Direct-Mappings
> [3] https://github.com/dcmi/DC-PROV-Mapping/wiki/Prov-Specializations
> [4] https://github.com/dcmi/DC-PROV-Mapping/wiki/Complex-Mappings-S1
>
> --
> Kai Eckert
> Universitätsbibliothek Mannheim
> Stellv. Leiter Abteilung Digitale Bibliotheksdienste
> Schloss Schneckenhof West / 68131 Mannheim
> Tel. 0621/181-2946 Fax 0621/181-2918
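
A rough illustration of the concern in comment 5 about the date properties: under RDFS, every "p rdfs:subPropertyOf q" axiom lets a reasoner copy each "s p o" statement to "s q o". The Python sketch below applies that single rule to a made-up record. It uses plain tuples rather than an RDF library, and the ex:doc resource and its three dates are assumptions for illustration only; the three axioms are taken from the -1 list in comment 5.

```python
# Three of the proposed direct mappings from comment 5 (sub-property -> super-property).
SUB_PROPERTY_OF = {
    "dct:dateSubmitted": "prov:generatedAtTime",
    "dct:issued": "prov:generatedAtTime",
    "dct:modified": "prov:generatedAtTime",
}


def entail_subproperty(triples, sub_property_of):
    """Apply the RDFS sub-property rule once: (s p o) plus p < q yields (s q o)."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in sub_property_of:
            inferred.add((subject, sub_property_of[predicate], obj))
    return inferred


# Hypothetical Dublin Core description of a single document.
doc_triples = {
    ("ex:doc", "dct:dateSubmitted", "2012-01-10"),
    ("ex:doc", "dct:issued", "2012-03-01"),
    ("ex:doc", "dct:modified", "2012-05-20"),
}

closure = entail_subproperty(doc_triples, SUB_PROPERTY_OF)

# Collect every generation time now asserted for the same entity.
generation_times = sorted(
    obj
    for subject, predicate, obj in closure
    if subject == "ex:doc" and predicate == "prov:generatedAtTime"
)

print(generation_times)  # ['2012-01-10', '2012-03-01', '2012-05-20']
```

With all three axioms in place, the one entity ends up with three distinct prov:generatedAtTime values, which is the "generated at many distinct times" situation described in comment 5.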
Received on Sunday, 3 June 2012 16:46:43 UTC