Re: Review of prov-dc

Thanks Stian for your review.
I have addressed most of your suggested changes and answered your comments here:
http://www.w3.org/2011/prov/wiki/Stian_Soiland-Reyes
The document has changed a lot since the last WD, so it would be great
if you could have a quick look before the next telecon.
Link:
https://dvcs.w3.org/hg/prov/raw-file/tip/dc-note/releases/NOTE-prov-dc-20130430/Overview.html#dct-rightsHolder
Best,
Daniel


2013/4/11 Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>

> Below is my review of
> https://dvcs.w3.org/hg/prov/raw-file/c6a741a9cdd8/dc-note/dc-note.html
> (last edited 2013-04-09). However, I have not properly checked whether your
> latest changes have fixed some of these issues, as I started the
> review around 2013-04-01.
>
>
> Apologies for the delay in returning this review. This was due to
> other, previously unknown, deadlines knocking on the door :) I hope
> it is not too late to include some of these revisions before we vote
> on the document next week, according to the plan.
>
>
>
> My comments are mainly editorial.
>
> Blocking issues:
>
> 21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.
>
> 23) dct:references should be subproperty of prov:wasInfluencedBy
>
>
>
>
> 1) Outdated citations:
> > [DCTERMS] Dublin Core Terms Vocabulary. 8 December 2010. URL:
> http://dublincore.org/documents/dcmi-terms/
>
> Should be:
>
> > Dublin Core Terms Vocabulary. 14 June 2012. URL:
> http://dublincore.org/documents/2012/06/14/dcmi-terms/
>
>
>
> > [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language:
> Overview. 27 October 2009. W3C Recommendation. URL:
> http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
>
> should be:
>
> > [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language:
> Overview. 11 December 2012. W3C Recommendation. URL:
> http://www.w3.org/TR/2012/REC-owl2-overview-20121211/
>
>
> 2) Links to mappings
> > The mapping is expressed partly by direct RDFS/OWL mappings between
> properties and classes, which can be found _here_.
> > Therefore, refinements of classes defined in PROV are needed to
> represent specific Dublin Core activities and roles. This set of PROV
> refinements can be accessed _here_.
>
> The use of "here" as hyperlink text is not good practice, because it does
> not mean anything, especially when scanning the page for links.
>
> Try:
>
> > The mapping is expressed partly by _direct RDFS/OWL mappings (Turtle
> format)_ between properties and classes.
>
> > Therefore, _refinements of classes defined in PROV (Turtle format)_ are
> needed to represent specific Dublin Core activities and roles.
>
>
> 3)
> > The use of DC terms is preferred and the DC elements have been
> depecreated.
>
> --> deprecated
>
> 4)
> Table 1 is meant to categorize terms into What/Who/When/How, but for
> "Descriptive metadata" the sub-category is "-" instead of "What".
>
>
> 5)
> >  but as ownership is considered the important provenance information for
> many resources
> "the" -> "to be"
>
> 6)
>
> > This leaves one very special term: provenance.(..) This term can be
> considered a link between the resource and any provenance statement about
> the resource, so it cannot be included in any of the aforementioned
> categories.
>
> Why is "provenance" not a "what"? How is it any different from, say,
> "abstract" or "tableOfContents"?
>
> I suggest just changing "cannot be" to "is not" - and we can get away with
> it.
>
>
> 7)
> > Example 1: a simple metadata record:
>
> Add "in Turtle format [Turtle]".
>
>
> 8)
>
> > ex:doc1 dct:title "A mapping from Dublin Core..." ;
> > dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
> > dct:created "2012-02-28" ;
> > (..)
>
> Could some indentation be used in the example for the continuation lines?
> ie:
>
> > ex:doc1 dct:title "A mapping from Dublin Core..." ;
> >     dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
> >     dct:created "2012-02-28" ;
> > (..)
>
> (check your tabs -> spaces)
>
>
> 9)
> > are descriptions of the resource ex:doc1
>
> italics on "descriptions"
>
>
> 10)
>
> > As a <code>dc</code> metadata
>
> dc -> "DC" and no <code>
>
>
> 11)
> > a different prov:specialization of the document
> --> prov:specializationOf
>
> 12)
> > which is a prov:sprecializationOf the resource
> --> prov:specializationOf
>
>
> 13)
>
> > Since we cannot ensure that the published resource has not suffered any
> further modifications, :_resultingEntity is also a prov:specializationOf
> the resource ex:doc1.
>
> I don't get this reasoning. I agree it is a specialization, as it is
> ex:doc1, but only in the published state. However, I don't understand
> the "cannot ensure" bit: it would be a specialization whether or not
> there were modifications. Perhaps the idea is that there could be two
> publications that both led to ex:doc1 at different points in time?
>
> Change to:
>
> "_:resultingEntity is also a prov:specializationOf the resource
> ex:doc1, as it describes the document after a particular publication"
>
>
> 14) (not important)
> Figure 1 and the following figures are blurry when zooming in or printing
> them out. Is it possible to include the images in a higher resolution or
> as SVG (but scaled down with CSS)? For example, see Figure 1 in
> http://www.w3.org/TR/prov-o/#starting-points-figure
>
>
> 15)
> Figure 1 and following use a notation like:
>
> prov:Entity
> ex:doc1
>
> it is not clear, beyond the capital letter, which is the identifier
> and which is the class. Could styling be used, such as italics on the
> class name? (UML uses «guillemets», but perhaps italics would work
> better.)
>
>
> 16)
>
> The figures use the style _:used_entity, but the text uses _:usedEntity.
> My suggestion is to unify them as _:usedEntity, to match the camelCase of
> PROV-O terms.
>
>
> 17) prov:Entities must exist before being used
>
> <code> style here is misleading -> "PROV entities" without <code>
>
>
> 18)
>
> > The mapping is divided in several subsections:
> > (..)
> > Section 3.4 : Strategies for cleaning up some of the blank nodes
> produced by the approach presented in Section 3.3.
>
> " :" ->":"
>
>
> 19)
> Table 3 includes dct:Agent and dct:ProvenanceStatement - but none of
> the DCT classes were introduced in Table 1.
>
> Many of the other DCT classes (BibliographicResource,
> LicenseDocument, PhysicalResource, etc.) are generally mappable as
> subclasses of prov:Entity. We should either provide those or say why
> we have not provided them (for instance a particular license document
> becomes also a prov:Entity as soon as you talk about its provenance
> with say prov:wasAttributedTo).
>
> dct:Location should be equivalentClass to prov:Location
> prov:Collection subclassOf dcmitype:Collection
> (note: dcmitype:Software is NOT a subclassOf prov:SoftwareAgent - as a
> script file, C source code etc. are (generally) different from the
> active agent of their execution)
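>
> A sketch of how the two class mappings suggested above might be written
> down in Turtle, assuming the usual DCMI and PROV namespace URIs. This only
> illustrates the suggestion; it is not text from the current draft:
>
> ```turtle
> @prefix owl:      <http://www.w3.org/2002/07/owl#> .
> @prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix prov:     <http://www.w3.org/ns/prov#> .
> @prefix dct:      <http://purl.org/dc/terms/> .
> @prefix dcmitype: <http://purl.org/dc/dcmitype/> .
>
> # Suggested: treat DCT and PROV locations as the same class.
> dct:Location owl:equivalentClass prov:Location .
>
> # Suggested: every PROV collection is also a DCMI Type collection
> # (one direction only).
> prov:Collection rdfs:subClassOf dcmitype:Collection .
>
> # Deliberately NOT asserted:
> #   dcmitype:Software rdfs:subClassOf prov:SoftwareAgent .
> # A script or source file is not the agent that executes it.
> ```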
>
>
>
> 20)
> I kind of doubt that dct:rightsHolder is about provenance (although
> rights could have interesting provenance!), as you could easily be a
> rights holder without having any part of creating the resource. For
> instance Michael Jackson at some point bought the rights to Beatles
> songs, but he later sold those to Sony in 1995 [1]. So does that mean
> that a Beatles song from 1967 is attributed to Sony in 1995, because
> they are the rights holder? Which activity did Sony participate in?
> (Buying the rights?). This is difficult with DCTerms because the
> entities are fully mutable.
>
> If this was expanded in section 3.3.1 (prov:RightsAssignment ?) it could
> be OK.
>
> [1] http://www.snopes.com/music/artists/jackson.asp
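>
> If rights holding were modeled through an activity, as hinted at above,
> the Sony case could look roughly like the sketch below. All ex: names are
> hypothetical, and prov:RightsAssignment is only the class name floated
> above, not a term defined in PROV-O or (yet) in the note. The point is
> that Sony is then associated with the rights transfer and with a
> specialization of the song that results from it, rather than with the
> 1967 song as a whole:
>
> ```turtle
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix ex:   <http://example.com/> .
>
> # Hypothetical activity: the 1995 transfer of rights to Sony.
> # prov:RightsAssignment is an assumed refinement, not an existing PROV term.
> ex:rightsTransfer a prov:Activity, prov:RightsAssignment ;
>     prov:qualifiedAssociation [
>         a prov:Association ;
>         prov:agent   ex:sony ;
>         prov:hadRole ex:rightsHolder    # hypothetical role
>     ] .
>
> # The song "with Sony as rights holder" is a specialization generated by
> # that activity; no prov:wasAttributedTo ex:sony is asserted on the song.
> ex:songHeldBySony a prov:Entity ;
>     prov:specializationOf ex:beatlesSong ;
>     prov:wasGeneratedBy   ex:rightsTransfer .
> ```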
>
>
> 21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.
>
> BLOCKING.
>
> dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.
>
> In DC Terms, isVersionOf is a hierarchical attribute, more along the
> lines of prov:specializationOf, and does not mandate any time
> directionality (thus it is not a subproperty of prov:wasDerivedFrom).
>
> Example of hierarchical use:
>
> https://metacpan.org/source/ASCOPE/Net-Flickr-API-1.7/Changes
>
> <http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.7.tar.gz>
>         dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
>         dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.69.tar.gz> .
>
> <http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.69.tar.gz>
>         dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
>         dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.68.tar.gz> .
>
>
> An example of its "inverse", dct:hasVersion, in use can be found in DCT
> itself:
>
> From http://dublincore.org/2012/06/14/dcterms.ttl:
>
>
> dcterms:hasPart
>     dcterms:hasVersion <http://dublincore.org/usage/terms/history/#hasPart-003> ;
>     dcterms:issued "2000-07-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
>     dcterms:modified "2008-01-14"^^<http://www.w3.org/2001/XMLSchema#date> ;
>     a rdf:Property ;
>     (..)
>
> And in http://dublincore.org/usage/terms/history/#hasPart-003 it says
> (in HTML) that:
>
>     <http://dublincore.org/usage/terms/history/#hasPart-003>
>         dcterms:replaces <http://dublincore.org/usage/terms/history/#hasPart-002> .
>
> So here dcterms:hasPart hasVersion both #hasPart-003 and #hasPart-002
> - but #hasPart-003 replaces #hasPart-002. This is the same as our
> example of specializationOf in the primer -
> http://www.w3.org/TR/prov-primer/#alternate-entities-and-specialization.
>
> It would be strange to enforce prov:wasDerivedFrom for such
> hierarchical relationships: the BBC frontpage today is not (necessarily)
> derived from the BBC frontpage in general.
>
>
>
> On http://dublincore.org/documents/usageguide/qualifiers.shtml we find:
>
> > isVersionOf
> >
> > Label: Is Version Of
> >
> > Term description: The described resource is a version, edition, or
> adaptation of the referenced resource. Changes in version imply substantive
> changes in content rather than differences in format.
> >
> > Guidelines for creation of content:
> >
> > Use only in cases where the relationship expressed is at the content
> level. Relationships need not be close for the relationship to be relevant.
> "West Side Story" is a version of "Romeo and Juliet" and that may be
> important enough in the context of the resource description to be expressed
> using isVersionOf. The Broadway Show and the movie of "West Side Story"
> also relate at a similar level, but the video and DVD of the movie are more
> usefully expressed at the level of format, the content being essentially
> the same.
> >
> > See also isFormatOf.
>
>
>
> However, not all dcterms:hasVersion / dcterms:isVersionOf
> relationships express hierarchical specialization, and so I don't
> recommend using prov:specializationOf as a superproperty of
> dct:isVersionOf.
>
>
> The more current usage guideline for isVersionOf is provenance-related:
>
>
> http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsVersionOf
>
> > This property describes the relationship between the described resource
> and another resource, that is a former version, edition or adaptation of
> the described resource (e.g. the described resource is the revision of a
> book, or another recording of a song, etc.). Another version implies
> changes in the content of a resource. For resources with different formats
> use isFormatOf. For the reciprocal statement use hasVersion.
>
> As a compromise, I therefore suggest instead saying that:
>
>   prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf
>
> And the equivalent for Table 5:
>
>   prov:hadRevision rdfs:subPropertyOf dct:hasVersion
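>
> A minimal sketch of what this compromise would entail under plain RDFS
> reasoning (the ex: resources are hypothetical): a PROV revision still
> counts as a version in DCT terms, but a hierarchical dct:hasVersion
> statement, like the ones from the Net-Flickr-API changes file or from
> dcterms.ttl, is no longer forced to be a revision or a derivation.
>
> ```turtle
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix dct:  <http://purl.org/dc/terms/> .
> @prefix ex:   <http://example.com/> .
>
> # Suggested mapping: revisions are a special kind of DCT version
> prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf .
> prov:hadRevision   rdfs:subPropertyOf dct:hasVersion .
>
> # Hypothetical data
> ex:doc-v2 prov:wasRevisionOf ex:doc-v1 .
> ex:api    dct:hasVersion     ex:api-release-1-69 .
>
> # Entailed (RDFS subproperty rule):
> #   ex:doc-v2 dct:isVersionOf ex:doc-v1 .
> # Not entailed: ex:api prov:hadRevision ex:api-release-1-69, nor any
> # prov:wasDerivedFrom link, so hierarchical DCT usage remains valid.
> ```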
>
>
>
>
> 22) dct:isFormatOf is also subproperty of prov:wasDerivedFrom
>
> dct:hasFormat is defined as:
> >  A related resource that is substantially the same as the pre-existing
> described resource, but in another format.
>
> So the subject is pre-existing.
>
>
> http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsFormatOf
>  has more:
>
> > This property describes the relationship between the described resource
> and another resource, that is a former version of the described resource
> with the same intellectual content but presented in another format (e.g.
> the described resource is the microfilm version of a printed book, or the
> pdf version of a doc document). For intellectual changes between resources
> use isVersionOf. For the reciprocal statement use hasFormat.
>
> So this implies that the subject (the described resource) has somehow
> been formed from the object.
>
> Therefore dcterms:isFormatOf should be a subproperty of
> prov:wasDerivedFrom - in addition to being a subproperty of
> prov:alternateOf.
>
> Equivalent for Table 5:
>
>   dcterms:hasFormat rdfs:subPropertyOf prov:hadDerivation
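>
> Spelled out with the microfilm example from the user guide (the ex:
> resources are hypothetical), this sketch shows what a reasoner would then
> derive from a single dct:isFormatOf statement:
>
> ```turtle
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix dct:  <http://purl.org/dc/terms/> .
> @prefix ex:   <http://example.com/> .
>
> # Suggested axioms (alternateOf as in the draft, wasDerivedFrom added)
> dct:isFormatOf rdfs:subPropertyOf prov:alternateOf, prov:wasDerivedFrom .
> dct:hasFormat  rdfs:subPropertyOf prov:hadDerivation .
>
> # Hypothetical data: the microfilm was made from the printed book
> ex:bookMicrofilm dct:isFormatOf ex:printedBook .
>
> # Entailed:
> #   ex:bookMicrofilm prov:alternateOf    ex:printedBook .
> #   ex:bookMicrofilm prov:wasDerivedFrom ex:printedBook .
> ```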
>
>
>
>
>
> 23) dct:references should be subproperty of prov:wasInfluencedBy
>
> dct:references is made a subproperty of prov:wasDerivedFrom, which
> sounds very strong to me. I would use prov:wasInfluencedBy.
>
> > Influence ◊ is the capacity of an entity, activity, or agent to have an
> effect on the character, development, or behavior of another by means of
> usage, start, end, generation, invalidation, communication, derivation,
> attribution, association, or delegation.
>
> (We don't know the details of how the reference was used).
>
> Equivalent for Table 5:
>
> dct:isReferencedBy rdfs:subPropertyOf prov:influenced
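>
> A small sketch of the difference, with hypothetical ex: resources: under
> the suggested axiom a citation only yields an influence, whereas the
> draft's mapping would also claim a derivation.
>
> ```turtle
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix dct:  <http://purl.org/dc/terms/> .
> @prefix ex:   <http://example.com/> .
>
> # Suggested axioms
> dct:references     rdfs:subPropertyOf prov:wasInfluencedBy .
> dct:isReferencedBy rdfs:subPropertyOf prov:influenced .
>
> # Hypothetical data: paper B cites paper A
> ex:paperB dct:references ex:paperA .
>
> # Entailed: ex:paperB prov:wasInfluencedBy ex:paperA .
> # With the draft's mapping (dct:references rdfs:subPropertyOf
> # prov:wasDerivedFrom) a reasoner would also conclude the stronger
> #   ex:paperB prov:wasDerivedFrom ex:paperA .
> ```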
>
>
>
> 24) justification for dct:source
>
> > dct:source    rdfs:subPropertyOf      prov:wasDerivedFrom     dct:source
> is defined as a "related resource from which the described resource is
> derived", which matches the notion of derivation in PROV-DM ("a
> transformation of an entity in another").
>
> You need to justify why this is NOT an equivalent property. In
> SKOS terms I would call them a skos:closeMatch rather than a
> skos:broadMatch, but in OWL/RDFS we don't have that luxury. I do agree
> with the mapping you suggest, as it keeps it consistent with the other
> mappings (with an equivalence, dct:isFormatOf would effectively become a
> subproperty of dct:source, which might be odd in DCT). So the
> justification should be something like:
>
> > However, prov:wasDerivedFrom also covers broader derivations such as "an
> update of an entity resulting in a new one" which is not covered by
> dct:source.
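>
> A sketch of the side effect mentioned above, assuming the subproperty
> axiom for dct:isFormatOf suggested in 22): if dct:source were declared
> equivalent to prov:wasDerivedFrom, an OWL reasoner would place
> dct:isFormatOf under dct:source.
>
> ```turtle
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix owl:  <http://www.w3.org/2002/07/owl#> .
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix dct:  <http://purl.org/dc/terms/> .
>
> # Mapping as proposed in the note: a one-way subproperty link
> dct:source rdfs:subPropertyOf prov:wasDerivedFrom .
>
> # Suggested in 22)
> dct:isFormatOf rdfs:subPropertyOf prov:wasDerivedFrom .
>
> # If the first axiom were instead an equivalence:
> #   dct:source owl:equivalentProperty prov:wasDerivedFrom .
> # a reasoner would also entail
> #   dct:isFormatOf rdfs:subPropertyOf dct:source .
> # which might be odd from a DCT point of view.
> ```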
>
>
>
> 25) PROV refinements does not include mapping for dct:rightsHolder
>
> See #? above on whether this should be included or not.
>
>
> 26)
>
> > Additional refinements of the PROV properties have been ommitted, since
> the direct mappings presented in Section 3.1 already define the
> relationship between both vocabularies.
>
> What does this mean? Rephrase.
>
>
> 27)
>
> > The mapping corresponds to the graph in Figure 1 (with small changes for
> creator and rightsHolder).
>
>
> I don't understand this. Neither the mapping below nor Figure 1
> describes rightsHolder. Figure 1 shows dct:publisher. Rephrase.
>
>
> 28)
>
> > A creator is the agent in charge of the "Create" activity that generated
> a specialization of the entity ?document. The agent is assigned the role
> "creator".
>
> Some use of <code> here would improve readability.
>
>
>
> Note: I have not checked the syntax of the SPARQL CONSTRUCTs beyond
> reading them.
>
>
> 29)
>
> > In case of publication, a second specialization representing the entity
> before the publication is necessary:
>
> Why is this necessary? If I write a blog post using WordPress.com, and
> I immediately click "Publish", then there is no "unpublished" entity.
> Your argument would otherwise also potentially apply for contribution
> - if I contributed to the entity, it must have been created before! In
> both cases we would make unfounded assumptions about the contribution
> and publication activities.
>
> Remove the need for _:used_entity. You might instead leave a note
> saying: "If it is known that the ?document existed before publication,
> for instance as a draft, you may also add:"
>
>         _:used_entity a prov:Entity;
>                         prov:specializationOf ?document.
>
>         _:activity      prov:used _:used_entity .
>
>         _:resulting_entity prov:wasDerivedFrom _:used_entity .
>
>
> This also applies to dct:issued.
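>
> To make the WordPress.com case concrete, a minimal sketch of what the
> publication pattern could yield without any _:used_entity. The ex: names
> are hypothetical, and only core PROV-O terms are used rather than the
> note's refinement classes and roles:
>
> ```turtle
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix ex:   <http://example.com/> .
>
> # Hypothetical blog post, published straight from the editor
> ex:blogPost a prov:Entity .
>
> # The publication activity and the agent who clicked "Publish"
> _:activity a prov:Activity ;
>     prov:wasAssociatedWith ex:stian .
>
> # The published state is the only specialization asserted; nothing was
> # used, so no earlier "unpublished" entity has to be invented.
> _:resulting_entity a prov:Entity ;
>     prov:specializationOf ex:blogPost ;
>     prov:wasGeneratedBy _:activity .
> ```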
>
>
> 30) dct:dateCopyrighted should NOT have a used_entity
>
> Copyright is usually something you have immediately. Or are you
> arguing there is always an uncopyrightable used entity first (say, an
> empty document)?
>
>
>
> (Note that I'm fine with the used-entity for the remaining cases)
>
>
> 31) dct:isReplacedBy/dct:replaces should be subproperty of prov:alternateOf
>
> (and listed in Tables earlier)
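>
> A sketch of the suggested axioms, using the book catalog example from
> the note with hypothetical ex: resources. Whether two catalog entries
> really are alternates is the judgment suggested above; the sketch only
> shows that the axioms stop short of a derivation:
>
> ```turtle
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix prov: <http://www.w3.org/ns/prov#> .
> @prefix dct:  <http://purl.org/dc/terms/> .
> @prefix ex:   <http://example.com/> .
>
> # Suggested axioms
> dct:replaces     rdfs:subPropertyOf prov:alternateOf .
> dct:isReplacedBy rdfs:subPropertyOf prov:alternateOf .
>
> # Hypothetical data: a new book replaces an old one in a catalog
> ex:newBook dct:replaces ex:oldBook .
>
> # Entailed:     ex:newBook prov:alternateOf ex:oldBook .
> # Not entailed: ex:newBook prov:wasDerivedFrom ex:oldBook .
> ```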
>
>
> 32)
>
> > However, the derivation relationship cannot always be applied between
> the original entities, because they could have existed before the
> replacement took place (for example, if a book replaces another in a
> catalog we cannot say that it was derived from it).
>
> I agree - but then why does the query include:
>
>  _:new_entity prov:wasDerivedFrom _:old_entity .
>
>
> 33) reosource -> resource
>
> > Property used to describe that the current resource is required for
> supporting the function of another resource. This is not related the
> provenance of the reosource
>
>
> 34) dct:date
>
> I think this could be given a complex mapping.
>
> DCT says:
>
> > A point or period of time associated with an event in the lifecycle of
> the resource.
>
> So perhaps just saying there was an event:
>
> PREFIX prov: <http://www.w3.org/ns/prov#>
> PREFIX dct:  <http://purl.org/dc/terms/>
>
> CONSTRUCT {
>     _:event a prov:InstantaneousEvent ;
>         prov:atTime ?date .
> } WHERE {
>     ?document dct:date ?date .
> }
>
> However, as we don't know the nature of the association between the
> ?document and the ?date, this is a bit useless, so if you think we
> should include this, it should have a note:
>
> Note that the above inference would not generally be considered useful
> due to the ambiguity of dct:date (we don't know how the entity is
> related to the event); however, the above rule is included here for
> completeness.
>
>
>
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>
>

Received on Thursday, 18 April 2013 01:28:38 UTC