- From: Provenance Working Group Issue Tracker <sysbot+tracker@w3.org>
- Date: Tue, 16 Apr 2013 20:38:04 +0000
- To: public-prov-wg@w3.org
PROV-ISSUE-662 (stian-FB-DC): Stian's Feedback on PROV-DC [Mapping PROV-O to Dublin Core]

http://www.w3.org/2011/prov/track/issues/662

Raised by: Daniel Garijo
On product: Mapping PROV-O to Dublin Core

Below is my review of https://dvcs.w3.org/hg/prov/raw-file/c6a741a9cdd8/dc-note/dc-note.html (last edited 2013-04-09). However, I have not properly checked whether your latest changes have fixed some of these issues, as I started the review around 2013-04-01.

Apologies for the delay in returning this review. This was due to other, previously unknown, deadlines knocking on the door. :) I hope it is not too late to include some of the revisions here before we vote on the document next week according to the plan.

My comments are mainly editorial.

Blocking issues:

21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.
23) dct:references should be subproperty of prov:wasInfluencedBy

1) Outdated citations:

> [DCTERMS] Dublin Core Terms Vocabulary. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/

Should be:

> Dublin Core Terms Vocabulary. 14 June 2012. URL: http://dublincore.org/documents/2012/06/14/dcmi-terms/

> [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 27 October 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/

should be:

> [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 11 December 2012. W3C Recommendation. URL: http://www.w3.org/TR/2012/REC-owl2-overview-20121211/

2) Links to mappings

> The mapping is expressed partly by direct RDFS/OWL mappings between properties and classes, which can be found _here_.

> Therefore, refinements of classes defined in PROV are needed to represent specific Dublin Core activities and roles. This set of PROV refinements can be accessed _here_.

The use of "here" hyperlinks is not good practice because the word does not mean anything, especially when scanning the page for links.

Try:

> The mapping is expressed partly by _direct RDFS/OWL mappings (Turtle format)_ between properties and classes.

> Therefore, _refinements of classes defined in PROV (Turtle format)_ are needed to represent specific Dublin Core activities and roles.

3)
> The use of DC terms is preferred and the DC elements have been depecreated.

--> deprecated

4) Table 1 is meant to categorize into What/Who/When/How, but for "Descriptive metadata" the sub-category is "-" instead of "What".

5)
> but as ownership is considered the important provenance information for many resources

"the" -> "to be"

6)
> This leaves one very special term: provenance.(..) This term can be considered a link between the resource and any provenance statement about the resource, so it cannot be included in any of the aforementioned categories.

Why is "provenance" not a "what"? How is it any different from, say, "abstract" or "tableOfContents"?

I suggest just changing "cannot be" to "is not" - and we can get away with it.

7)
> Example 1: a simple metadata record:

Add "in Turtle format [Turtle]".

8)
> ex:doc1 dct:title "A mapping from Dublin Core..." ;
> dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
> dct:created "2012-02-28" ;
> (..)

Could some indentation be used in the example for the continuation lines? i.e.:

> ex:doc1 dct:title "A mapping from Dublin Core..." ;
>     dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
>     dct:created "2012-02-28" ;
>     (..)

(check your tabs -> spaces)

9)
> are descriptions of the resource ex:doc1

italics on "descriptions"

10)
> As a <code>dc</code> metadata

dc -> "DC" and no <code>

11)
> a different prov:specialization of the document

--> prov:specializationOf

12)
> which is a prov:sprecializationOf the resource

--> prov:specializationOf

13)
> Since we cannot ensure that the published resource has not suffered any further modifications, :_resultingEntity is also a prov:specializationOf the resource ex:doc1.

I don't get this reasoning.
I agree it is a specialization, as it is the ex:doc1, but only in the published state. However, I don't understand the "cannot ensure" bit: it would be a specialization whether there were modifications or not. Perhaps the idea is that there could be two publications that both led to ex:doc1 at different points in time?

Change to:

" :_resultingEntity is also a prov:specializationOf the resource ex:doc1, as it describes the document after a particular publication"

14) (not important) Figure 1 and following are blurry when zooming in or printing out. Is it possible to include the image in a higher resolution or as SVG (but scale it down with CSS)? For example, see Figure 1 in http://www.w3.org/TR/prov-o/#starting-points-figure

15) Figure 1 and following use a notation like:

prov:Entity ex:doc1

It is not clear - beyond the capital letter - what is the identifier and what is the class. Could styling be used, such as italics on the classname? (UML uses «guillemets» - but perhaps italics would work better)

16) The figures use the style _:user_entity but the text uses _:usedEntity. My suggestion is to unify them as _:usedEntity to match the camelCase of prov-o terms.

17) prov:Entities must exist before being used

<code> style here is misleading -> "PROV entities" without <code>

18)
> The mapping is divided in several subsections:
> (..)
> Section 3.4 : Strategies for cleaning up some of the blank nodes produced by the approach presented in Section 3.3.

" :" ->":"

19) Table 3 includes dct:Agent and dct:ProvenanceStatement - but none of the DCT classes were introduced in Table 1.

Many of the other DCT classes (BibliographicResource, LicenseDocument, PhysicalResource, etc) are generally mappable as subclasses of prov:Entity. We should either provide those or say why we have not provided them (for instance, a particular license document also becomes a prov:Entity as soon as you talk about its provenance with, say, prov:wasAttributedTo).

dct:Location should be equivalentClass to prov:Location

prov:Collection subclassOf dcmitype:Collection

(note: dcmitype:Software is NOT a subclassOf prov:SoftwareAgent - as a script file, C source code etc. are (generally) different from the active agent of their execution)

20) I kind of doubt that dct:rightsHolder is about provenance (although rights could have interesting provenance!), as you could easily be a rights holder without having any part in creating the resource. For instance, Michael Jackson at some point bought the rights of Beatles songs, but he later sold those to Sony in 1995 [1]. So does that mean that a Beatles song from 1967 is attributed to Sony in 1995, because they are the rights holder? Which activity did Sony participate in? (Buying the rights?) This is difficult with DCTerms because the entities are fully mutable.

If this was expanded in section 3.3.1 (prov:RightsAssignment ?) it could be OK.

[1] http://www.snopes.com/music/artists/jackson.asp

21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.

BLOCKING. dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.

In DC Terms, isVersionOf is a hierarchical attribute, more along the lines of prov:specializationOf, and does not mandate any time directionality (thus is not a subproperty of prov:wasDerivedFrom).

Example of hierarchical use:

https://metacpan.org/source/ASCOPE/Net-Flickr-API-1.7/Changes

<http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.7.tar.gz>
    dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
    dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.69.tar.gz>;

<http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.69.tar.gz>
    dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
    dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.68.tar.gz>;

An example of its "inverse" dct:hasVersion in use can be found in DCT itself:

From http://dublincore.org/2012/06/14/dcterms.ttl

dcterms:hasPart
    dcterms:hasVersion <http://dublincore.org/usage/terms/history/#hasPart-003> ;
    dcterms:issued "2000-07-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
    dcterms:modified "2008-01-14"^^<http://www.w3.org/2001/XMLSchema#date> ;
    a rdf:Property ;

And in http://dublincore.org/usage/terms/history/#hasPart-003 it says (in HTML) that

<http://dublincore.org/usage/terms/history/#hasPart-003> dcterms:replaces <http://dublincore.org/usage/terms/history/#hasPart-002> .

So here dcterms:hasPart hasVersion both #hasPart-003 and #hasPart-002 - but #hasPart-003 replaces #hasPart-002.

This is the same as our example of specializationOf in the primer - http://www.w3.org/TR/prov-primer/#alternate-entities-and-specialization.

It would be strange to enforce prov:wasDerivedFrom for such hierarchical relationships; the BBC frontpage is not (necessarily) derived from the BBC frontpage today.

On http://dublincore.org/documents/usageguide/qualifiers.shtml we find:

> isVersionOf
>
> Label: Is Version Of
>
> Term description: The described resource is a version, edition, or adaptation of the referenced resource. Changes in version imply substantive changes in content rather than differences in format.
>
> Guidelines for creation of content:
>
> Use only in cases where the relationship expressed is at the content level.
> Relationships need not be close for the relationship to be relevant. "West Side Story" is a version of "Romeo and Juliet" and that may be important enough in the context of the resource description to be expressed using isVersionOf. The Broadway Show and the movie of "West Side Story" also relate at a similar level, but the video and DVD of the movie are more usefully expressed at the level of format, the content being essentially the same.
>
> See also isFormatOf.

However not all dcterms:hasVersion / dcterms:isVersionOf relationships express hierarchical specialization, and so I don't recommend using prov:specializationOf as superproperty of prov:isFormatOf.

More current usage and guideline for isVersionOf is provenance-related:

http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsVersionOf

> This property describes the relationship between the described resource and another resource, that is a former version, edition or adaptation of the described resource (e.g. the described resource is the revision of a book, or another recording of a song, etc.). Another version implies changes in the content of a resource. For resources with different formats use isFormatOf. For the reciprocal statement use hasVersion.

As a compromise I therefore suggest instead to say that:

prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf

And equivalent for Table 5:

prov:hadRevision rdfs:subPropertyOf dct:hasVersion

22) dct:hasFormat is also subproperty of prov:wasDerivedFrom

dct:hasFormat is defined as:

> A related resource that is substantially the same as the pre-existing described resource, but in another format.

So the subject is pre-existing.

http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsFormatOf has more:

> This property describes the relationship between the described resource and another resource, that is a former version of the described resource with the same intellectual content but presented in another format (e.g.
> the described resource is the microfilm version of a printed book, or the pdf version of a doc document). For intellectual changes between resources use isVersonOf. For the reciprocal statement use hasFormat.

So this is implying that the object has somewhat been formed from the subject. Therefore dcterms:isFormatOf should be a subproperty of prov:wasDerivedFrom - in addition to being a subproperty of prov:alternateOf.

Equivalent for Table 5:

dcterms:hasFormat rdfs:subPropertyOf prov:hadDerivation

23) dct:references should be subproperty of prov:wasInfluencedBy

dct:references is made a subproperty of prov:wasDerivedFrom, which sounds very strong to me. I would use prov:wasInfluencedBy.

> Influence ◊ is the capacity of an entity, activity, or agent to have an effect on the character, development, or behavior of another by means of usage, start, end, generation, invalidation, communication, derivation, attribution, association, or delegation.

(We don't know the details of how the reference was used.)

Equivalent for Table 5:

dct:isReferencedBy rdfs:subPropertyOf prov:influenced

24) justification for dct:source

> dct:source rdfs:subPropertyOf prov:wasDerivedFrom dct:source is defined as a "related resource from which the described resource is derived", which matches the notion of derivation in PROV-DM ("a transformation of an entity in another").

You need to justify why this is NOT an equivalent property. In SKOS terms I would call them a skos:closeMatch rather than a skos:broadMatch, but in OWL/RDFS we don't have that luxury.

I do agree with the mapping you suggest, to make it consistent with the other mappings. (If the two were equivalent, dct:isFormatOf would effectively become a subproperty of dct:source, which might be odd in DCT.)

So the justification should be something like:

> However, prov:wasDerivedFrom also covers broader derivations such as "an update of an entity resulting in a new one" which is not covered by dct:source.
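
The parenthetical in point 24 (an equivalence would make dct:isFormatOf a subproperty of dct:source) can be checked with a small sketch. This is plain Python rather than an OWL reasoner, and it assumes the subproperty axioms suggested in points 21 to 24 of this review; the helper name `superproperties` and the modelling of owl:equivalentProperty as rdfs:subPropertyOf in both directions are illustrative assumptions, not something taken from the PROV-DC note.

```python
def superproperties(axioms, prop):
    """Return every property reachable from `prop` by following
    (sub, super) pairs, i.e. the transitive rdfs:subPropertyOf closure."""
    seen, stack = set(), [prop]
    while stack:
        current = stack.pop()
        for sub, sup in axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


# Subproperty axioms as suggested in this review: (sub, super).
SUGGESTED = {
    ("dct:isFormatOf", "prov:wasDerivedFrom"),
    ("dct:isFormatOf", "prov:alternateOf"),
    ("dct:source", "prov:wasDerivedFrom"),
    ("dct:references", "prov:wasInfluencedBy"),
    ("prov:wasRevisionOf", "dct:isVersionOf"),
}

# Hypothetical alternative: dct:source equivalent to prov:wasDerivedFrom,
# modelled here (an assumption) as subPropertyOf in both directions.
EQUIVALENT = SUGGESTED | {("prov:wasDerivedFrom", "dct:source")}

print("dct:source" in superproperties(SUGGESTED, "dct:isFormatOf"))   # False
print("dct:source" in superproperties(EQUIVALENT, "dct:isFormatOf"))  # True
```

With the subproperty mapping, dct:isFormatOf stays unrelated to dct:source; only the hypothetical equivalence pulls it underneath.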

25) PROV refinements does not include mapping for dct:rightsHolder

See #? above if this should be in or not.

26)
> Additional refinements of the PROV properties have been ommitted, since the direct mappings presented in Section 3.1 already define the relationship between both vocabularies.

What does this mean? Rephrase.

27)
> The mapping corresponds to the graph in Figure 1 (with small changes for creator and rightsHolder).

I don't understand this. Neither the mapping below nor Figure 1 describes rightsHolder. Figure 1 shows dct:publisher. Rephrase.

28)
> A creator is the agent in charge of the "Create" activity that generated a specialization of the entity ?document. The agent is assigned the role "creator".

Some use of <code> here would improve readability.

Note: I have not checked the syntax of the SPARQL CONSTRUCTs beyond reading them.

29)
> In case of publication, a second specialization representing the entity before the publication is necessary:

Why is this necessary? If I write a blog post using Wordpress.com, and I immediately click "Publish", then there is no "unpublished" entity.

Your argument would otherwise also potentially apply to contribution - if I contributed to the entity, it must have been created before! In both cases we would make unfounded assumptions about the contribution and publication activities.

Remove the need for _:used_entity - you might instead leave a note that "If it is known that the ?document existed before publication, for instance as a draft, you may also add:"

_:used_entity a prov:Entity;
    prov:specializationOf ?document.
_:activity prov:used _:used_entity .
_:resulting_entity prov:wasDerivedFrom _:used_entity .

This also applies to dct:issued.

30) dct:dateCopyrighted should NOT have a used_entity

Copyright is usually something you have immediately, or are you arguing there is always an uncopyrightable used-entity first? (Say an empty document)?
(Note that I'm fine with the used-entity for the remaining cases)

31) dct:isReplacedBy/dct:replaces should be subproperty of prov:alternateOf (and listed in Tables earlier)

32)
> However, the derivation relationship cannot always be applied between the original entities, because they could have existed before the replacement took place (for example, if a book replaces another in a catalog we cannot say that it was derived from it).

I agree - but then why does the query include:

_:new_entity prov:wasDerivedFrom _:old_entity .

33) reosource -> resource

> Property used to describe that the current resource is required for supporting the function of another resource. This is not related the provenance of the reosource

34) dct:date

I think this could be given a complex mapping. DCT says:

> A point or period of time associated with an event in the lifecycle of the resource.

So perhaps just saying there was an event:

CONSTRUCT{
    _:event a prov:InstantaneousEvent ;
        prov:atTime ?date .
} WHERE {
    ?document dct:date ?date.
}

However, as we don't know the nature of the association between the ?document and the ?date, this is a bit useless, and so if you think we should include this, it should have a note:

Note that the above inference would not generally be considered useful due to the ambiguity of dct:date (we don't know how the entity is related to the event), however the above rule is included here for completeness.
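
To make the limitation in point 34 concrete, here is a rough Python sketch of what the CONSTRUCT rule produces. It is not a SPARQL engine: triples are plain tuples, the function name `map_dct_date` is hypothetical, and the input record (ex:doc1 with a dct:date of "2012-02-28") is made-up illustration data.

```python
from itertools import count


def map_dct_date(triples):
    """Mimic the CONSTRUCT rule: emit one fresh event node per dct:date triple."""
    fresh = count(1)
    out = []
    for _subject, predicate, date in triples:
        if predicate != "dct:date":
            continue
        # A new blank-node label per match, as a CONSTRUCT template would create.
        event = f"_:event{next(fresh)}"
        out.append((event, "rdf:type", "prov:InstantaneousEvent"))
        out.append((event, "prov:atTime", date))
    return out


# Illustrative input only; not taken from the PROV-DC note.
record = [
    ("ex:doc1", "dct:title", "A mapping from Dublin Core..."),
    ("ex:doc1", "dct:date", "2012-02-28"),
]

for triple in map_dct_date(record):
    print(triple)
```

The output contains an event and its time but never mentions ex:doc1, which is exactly the ambiguity the proposed note warns about.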
Received on Tuesday, 16 April 2013 20:38:07 UTC