Review of prov-dc

Below is my review of
https://dvcs.w3.org/hg/prov/raw-file/c6a741a9cdd8/dc-note/dc-note.html
(last edited 2013-04-09). However, I started the review around
2013-04-01 and have not properly checked whether your latest changes
have already fixed some of these issues.


Apologies for the delay in returning this review. This was due to
other, previously unknown, deadlines knocking on the door. :) I hope
it is not too late to include some of the revisions here before we vote
on the document next week according to the plan.



My comments are mainly editorial.

Blocking issues:

21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.

23) dct:references should be subproperty of prov:wasInfluencedBy




1) Outdated citations:
> [DCTERMS] Dublin Core Terms Vocabulary. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/

Should be:

> Dublin Core Terms Vocabulary. 14 June 2012. URL: http://dublincore.org/documents/2012/06/14/dcmi-terms/



> [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 27 October 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/

should be:

> [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 11 December 2012. W3C Recommendation. URL: http://www.w3.org/TR/2012/REC-owl2-overview-20121211/


2) Links to mappings
> The mapping is expressed partly by direct RDFS/OWL mappings between properties and classes, which can be found _here_.
> Therefore, refinements of classes defined in PROV are needed to represent specific Dublin Core activities and roles. This set of PROV refinements can be accessed _here_.

The use of "here" hyperlinks is not good practice because the link
text does not mean anything, especially when scanning the page for links.

Try:

> The mapping is expressed partly by _direct RDFS/OWL mappings (Turtle format)_ between properties and classes.

> Therefore, _refinements of classes defined in PROV (Turtle format)_ are needed to represent specific Dublin Core activities and roles.


3)
> The use of DC terms is preferred and the DC elements have been depecreated.

--> deprecated

4)
Table 1 is meant to categorize into What/Who/When/How - but for
"Descriptive metadata" the sub-category is "-" instead of "What".


5)
>  but as ownership is considered the important provenance information for many resources
"the" -> "to be"

6)

> This leaves one very special term: provenance.(..) This term can be considered a link between the resource and any provenance statement about the resource, so it cannot be included in any of the aforementioned categories.

Why is "provenance" not a "what"? How is it any different from, say,
"abstract" or "tableOfContents"?

I suggest just changing "cannot be" to "is not" - and we can get away with it.


7)
> Example 1: a simple metadata record:

Add "in Turtle format [Turtle]".


8)

> ex:doc1 dct:title "A mapping from Dublin Core..." ;
> dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
> dct:created "2012-02-28" ;
> (..)

Could some indentation be used in the example for the continuation lines? i.e.:

> ex:doc1 dct:title "A mapping from Dublin Core..." ;
>     dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
>     dct:created "2012-02-28" ;
> (..)

(check your tabs -> spaces)


9)
> are descriptions of the resource ex:doc1

italics on "descriptions"


10)

> As a <code>dc</code> metadata

dc -> "DC" and no <code>


11)
> a different prov:specialization of the document
--> prov:specializationOf

12)
> which is a prov:sprecializationOf the resource
--> prov:specializationOf


13)

> Since we cannot ensure that the published resource has not suffered any further modifications, :_resultingEntity is also a prov:specializationOf the resource ex:doc1.

I don't get this reasoning. I agree it is a specialization, as it is
the ex:doc1, but only in the published state - but I don't understand
the "cannot ensure" bit - it would be a specialization if there were
modifications or not. Perhaps the idea being that there could be two
publications that both led to ex:doc1 at different points in time?

 Change to:

" :_resultingEntity is also a prov:specializationOf the resource
ex:doc1, as it describes the document after a particular publication"


14) (not important)
Figure 1 and following are blurry when zooming in or printing out. Is
it possible to include the image in a higher resolution or as SVG (but
scale it down with CSS)? For example, see Figure 1 in
http://www.w3.org/TR/prov-o/#starting-points-figure


15)
Figure 1 and following use a notation like:

prov:Entity
ex:doc1

it is not clear - beyond the capital letter - what is the identifier
and what is the class. Could styling be used, such as italics on the
classname? (UML uses «guillemets» - but perhaps italics would work
better)


16)

The figure uses the style _:user_entity but the text uses _:usedEntity.
I suggest unifying them as _:usedEntity to match the camelCase of
prov-o terms.


17) prov:Entities must exist before being used

<code> style here is misleading -> "PROV entities" without <code>


18)

> The mapping is divided in several subsections:
> (..)
> Section 3.4 : Strategies for cleaning up some of the blank nodes produced by the approach presented in Section 3.3.

" :" ->":"


19)
Table 3 includes dct:Agent and dct:ProvenanceStatement - but none of
the DCT classes were introduced in Table 1.

Many of the other DCT classes (BibliographicResource,
LicenseDocument, PhysicalResource, etc) are generally mappable as
subclasses of prov:Entity. We should either provide those or say why
we have not provided them (for instance a particular license document
becomes also a prov:Entity as soon as you talk about its provenance
with say prov:wasAttributedTo).

dct:Location should be equivalentClass to prov:Location
prov:Collection subclassOf dcmitype:Collection
(note: dcmitype:Software is NOT a subclassOf prov:SoftwareAgent - as a
script file, C source code etc. are (generally) different from the
active agent of their execution)
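
As a rough sketch of what these class mappings could look like in
Turtle (the selection of DCT classes is only illustrative, and I have
not checked these axioms against the rest of the mapping):

```turtle
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:      <http://www.w3.org/2002/07/owl#> .
@prefix prov:     <http://www.w3.org/ns/prov#> .
@prefix dct:      <http://purl.org/dc/terms/> .
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .

# Suggested: DCT resource classes as subclasses of prov:Entity
dct:BibliographicResource rdfs:subClassOf prov:Entity .
dct:LicenseDocument       rdfs:subClassOf prov:Entity .
dct:PhysicalResource      rdfs:subClassOf prov:Entity .

# Suggested: location and collection mappings
dct:Location    owl:equivalentClass prov:Location .
prov:Collection rdfs:subClassOf     dcmitype:Collection .

# Deliberately NOT stated:
#   dcmitype:Software rdfs:subClassOf prov:SoftwareAgent .
```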



20)
I rather doubt that dct:rightsHolder is about provenance (although
rights could have interesting provenance!), as you could easily be a
rights holder without having had any part in creating the resource. For
instance, Michael Jackson at some point bought the rights to Beatles
songs, but he later sold those to Sony in 1995 [1]. So does that mean
that a Beatles song from 1967 is attributed to Sony in 1995, because
they are the rights holder? Which activity did Sony participate in?
(Buying the rights?) This is difficult with DCTerms because the
entities are fully mutable.

If this was expanded in section 3.3.1 (prov:RightsAssignment ?) it could be OK.

[1] http://www.snopes.com/music/artists/jackson.asp


21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.

BLOCKING.

dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.

In DC Terms, isVersionOf is a hierarchical attribute, more along the
lines of prov:specializationOf, and does not mandate any time
directionality (thus it is not a subproperty of prov:wasDerivedFrom).

Example of hierarchical use:

https://metacpan.org/source/ASCOPE/Net-Flickr-API-1.7/Changes

<http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.7.tar.gz>
        dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
        dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.69.tar.gz> ;

<http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.69.tar.gz>
        dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
        dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.68.tar.gz> ;


An example of its "inverse" dct:hasVersion in use can be found in DCT itself:

From http://dublincore.org/2012/06/14/dcterms.ttl


dcterms:hasPart
    dcterms:hasVersion <http://dublincore.org/usage/terms/history/#hasPart-003> ;
    dcterms:issued "2000-07-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
    dcterms:modified "2008-01-14"^^<http://www.w3.org/2001/XMLSchema#date> ;
    a rdf:Property ;

And in http://dublincore.org/usage/terms/history/#hasPart-003 it says
(in HTML) that

    <http://dublincore.org/usage/terms/history/#hasPart-003>
        dcterms:replaces <http://dublincore.org/usage/terms/history/#hasPart-002> .

So here dcterms:hasPart hasVersion both #hasPart-003 and #hasPart-002
- but #hasPart-003 replaces #hasPart-002. This is the same as our
example of specializationOf in the primer -
http://www.w3.org/TR/prov-primer/#alternate-entities-and-specialization.

It would be strange to enforce prov:wasDerivedFrom for such
hierarchical relationships; the BBC frontpage is not (necessarily)
derived from the BBC frontpage today.



On http://dublincore.org/documents/usageguide/qualifiers.shtml we find:

> isVersionOf
>
> Label: Is Version Of
>
> Term description: The described resource is a version, edition, or adaptation of the referenced resource. Changes in version imply substantive changes in content rather than differences in format.
>
> Guidelines for creation of content:
>
> Use only in cases where the relationship expressed is at the content level. Relationships need not be close for the relationship to be relevant. "West Side Story" is a version of "Romeo and Juliet" and that may be important enough in the context of the resource description to be expressed using isVersionOf. The Broadway Show and the movie of "West Side Story" also relate at a similar level, but the video and DVD of the movie are more usefully expressed at the level of format, the content being essentially the same.
>
> See also isFormatOf.



However, not all dcterms:hasVersion / dcterms:isVersionOf
relationships express hierarchical specialization, and so I don't
recommend using prov:specializationOf as a superproperty of
dct:isVersionOf.


The more current usage guideline for isVersionOf is provenance-related:

http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsVersionOf

> This property describes the relationship between the described resource and another resource, that is a former version, edition or adaptation of the described resource (e.g. the described resource is the revision of a book, or another recording of a song, etc.). Another version implies changes in the content of a resource. For resources with different formats use isFormatOf. For the reciprocal statement use hasVersion.

As a compromise I therefore suggest instead to say that:

  prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf

And equivalent for Table 5:

  prov:hadRevision rdfs:subPropertyOf dct:hasVersion
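
To illustrate the direction of the suggested axiom, here is a small
Turtle sketch. The ex: resources are made up for illustration only:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# The suggested compromise: subproperty rather than equivalence
prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf .

# A PROV revision is also a DCT version, so RDFS entails
#   ex:report-v2 dct:isVersionOf ex:report-v1 .
ex:report-v2 prov:wasRevisionOf ex:report-v1 .

# A DCT version need not be a PROV revision. With the subproperty axiom
# nothing PROV-related is entailed from this triple; with an equivalence
# it would entail prov:wasRevisionOf (and thus prov:wasDerivedFrom).
ex:westSideStory dct:isVersionOf ex:romeoAndJuliet .
```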




22) dct:isFormatOf should also be a subproperty of prov:wasDerivedFrom

dct:hasFormat is defined as:
>  A related resource that is substantially the same as the pre-existing described resource, but in another format.

So the subject is pre-existing.

http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsFormatOf
 has more:

> This property describes the relationship between the described resource and another resource, that is a former version of the described resource with the same intellectual content but presented in another format (e.g. the described resource is the microfilm version of a printed book, or the pdf version of a doc document). For intellectual changes between resources use isVersonOf. For the reciprocal statement use hasFormat.

So this implies that the described resource (the subject of
isFormatOf) has in some way been formed from the referenced resource.

Therefore dcterms:isFormatOf should be a subproperty of
prov:wasDerivedFrom - in addition to being a subproperty of
prov:alternateOf.

Equivalent for Table 5:

  dcterms:hasFormat rdfs:subPropertyOf prov:hadDerivation
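
Spelled out in Turtle, the suggestion would look roughly like the
sketch below. The ex: resources are made up, and prov:hadDerivation is
the inverse name PROV-O reserves for prov:wasDerivedFrom:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# Suggested: keep prov:alternateOf and add prov:wasDerivedFrom
dct:isFormatOf rdfs:subPropertyOf prov:alternateOf , prov:wasDerivedFrom .
dct:hasFormat  rdfs:subPropertyOf prov:hadDerivation .

# The microfilm is a later format of the pre-existing printed book, so
# RDFS would entail both
#   ex:book-microfilm prov:alternateOf    ex:book-print .
#   ex:book-microfilm prov:wasDerivedFrom ex:book-print .
ex:book-microfilm dct:isFormatOf ex:book-print .
```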





23) dct:references should be subproperty of prov:wasInfluencedBy

dct:references is made a subproperty of prov:wasDerivedFrom, which
sounds very strong to me. I would use prov:wasInfluencedBy.

> Influence is the capacity of an entity, activity, or agent to have an effect on the character, development, or behavior of another by means of usage, start, end, generation, invalidation, communication, derivation, attribution, association, or delegation.

(We don't know the details of how the reference was used).

Equivalent for Table 5:

dct:isReferencedBy rdfs:subPropertyOf prov:influenced
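
A sketch of the weaker mapping in Turtle, with made-up ex: resources:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# Suggested: influence instead of derivation
dct:references     rdfs:subPropertyOf prov:wasInfluencedBy .
dct:isReferencedBy rdfs:subPropertyOf prov:influenced .

# A citation then only entails
#   ex:paperB prov:wasInfluencedBy ex:paperA .
# and makes no claim that ex:paperB was derived from ex:paperA.
ex:paperB dct:references ex:paperA .
```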



24) justification for dct:source

> dct:source rdfs:subPropertyOf prov:wasDerivedFrom dct:source is defined as a "related resource from which the described resource is derived", which matches the notion of derivation in PROV-DM ("a transformation of an entity in another").

You need to justify why this is NOT an equivalent property. In
SKOS terms I would call them a skos:closeMatch rather than a
skos:broadMatch; but in OWL/RDFS we don't have that luxury. I do agree
with the mapping you suggest, as it is consistent with the other
mappings. (With an equivalence, dct:isFormatOf would effectively become
a subproperty of dct:source, which might be odd in DCT.) So the
justification should be something like:

> However, prov:wasDerivedFrom also covers broader derivations such as "an update of an entity resulting in a new one" which is not covered by dct:source.



25) The PROV refinements do not include a mapping for dct:rightsHolder

See #? above if this should be in or not.


26)

> Additional refinements of the PROV properties have been ommitted, since the direct mappings presented in Section 3.1 already define the relationship between both vocabularies.

What does this mean? Rephrase.


27)

> The mapping corresponds to the graph in Figure 1 (with small changes for creator and rightsHolder).


I don't understand this. Neither the mapping below nor Figure 1
describes rightsHolder. Figure 1 shows dct:publisher. Rephrase.


28)

> A creator is the agent in charge of the "Create" activity that generated a specialization of the entity ?document. The agent is assigned the role "creator".

Some use of <code> here would improve readability.



Note: I have not checked the syntax of the SPARQL CONSTRUCTs beyond
reading them.


29)

> In case of publication, a second specialization representing the entity before the publication is necessary:

Why is this necessary? If I write a blog post using Wordpress.com, and
I immediately click "Publish", then there is no "unpublished" entity.
Your argument would otherwise also potentially apply for contribution
- if I contributed to the entity, it must have been created before! In
both cases we would make unfounded assumptions about the contribution
and publication activities.

Remove the need for _:used_entity - you might instead leave a note
that "If it is known that the ?document existed before publication,
for instance as a draft, you may also add":

    _:used_entity a prov:Entity ;
        prov:specializationOf ?document .

    _:activity prov:used _:used_entity .

    _:resulting_entity prov:wasDerivedFrom _:used_entity .


This also applies to dct:issued.


30) dct:dateCopyrighted should NOT have a used_entity

Copyright is usually something you have immediately, or are you
arguing there is always an uncopyrightable used-entity first? (Say an
empty document)?



(Note that I'm fine with the used-entity for the remaining cases)


31) dct:isReplacedBy/dct:replaces should be subproperty of prov:alternateOf

(and listed in Tables earlier)
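
In Turtle this would be something like the sketch below (the ex:
resources are made up for illustration):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

# Suggested direct mappings
dct:replaces     rdfs:subPropertyOf prov:alternateOf .
dct:isReplacedBy rdfs:subPropertyOf prov:alternateOf .

# A replacement in a catalog then entails
#   ex:guide-2013 prov:alternateOf ex:guide-2012 .
# without claiming any derivation between the two.
ex:guide-2013 dct:replaces ex:guide-2012 .
```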


32)

> However, the derivation relationship cannot always be applied between the original entities, because they could have existed before the replacement took place (for example, if a book replaces another in a catalog we cannot say that it was derived from it).

I agree - but then why does the query include:

 _:new_entity prov:wasDerivedFrom _:old_entity .


33) reosource -> resource

> Property used to describe that the current resource is required for supporting the function of another resource. This is not related the provenance of the reosource


34) dct:date

I think this could be given a complex mapping.

DCT says:

> A point or period of time associated with an event in the lifecycle of the resource.

So perhaps just saying there was an event:

PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>

CONSTRUCT {
    _:event a prov:InstantaneousEvent ;
        prov:atTime ?date .
} WHERE {
    ?document dct:date ?date .
}
 
However, as we don't know the nature of the association between the
?document and the ?date, this is a bit useless, and so if you think we
should include this, it should have a note:

Note that the above inference would not generally be considered useful
due to the ambiguity of dct:date (we don't know how the entity is
related to the event); however, the above rule is included here for
completeness.





-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Thursday, 11 April 2013 14:56:57 UTC