Comments on "Dublin Core to PROV Mapping" - part 1 of 2 from Thomas Baker on 2013-03-30 (public-prov-comments@w3.org from March 2013)

From: Thomas Baker <tom@tombaker.org>
Date: Sat, 30 Mar 2013 14:03:19 -0400
To: public-prov-comments@w3.org
Cc: Provenance Task Group <dc-provenance@jiscmail.ac.uk>
Message-ID: <20130330180319.GA38340@julius>
Daniel, Kai, other contributors,

Below is my review of "Dublin Core to PROV Mapping: W3C Working Draft 12
March 2013" [1].  Bravo to the editors and contributors for a complex and solid
piece of work!

My comments are divided into two postings.  This posting addresses:

1. Status of the Turtle representations and the subclasses they declare
2. Various points of substance
3. Minor editorial points

The next posting will continue with:

4. Issues in the Introduction re: Dublin Core and "DC Terms"

I reviewed the Mapping primarily from the standpoint of Dublin Core.  Though I
am currently the CIO of DCMI, my review has not gone through a DCMI process, so
it should be considered my personal opinion.  I have also reviewed aspects of
the Mapping from the standpoint of someone who has been involved with W3C
process in various contexts (e.g., Point 1 below).

I am not qualified to comment in much detail on aspects related to the PROV
model, which I have not studied closely.  There were one or two places, flagged
below, where I thought that deeper knowledge of the model was really necessary
for understanding particular points.  However, it speaks well for the authors
that I could follow the Mapping without extensive knowledge of PROV.  I
particularly like the authors' suggestion that the Mapping could facilitate
PROV adoption by allowing users to take Dublin Core statements as a starting
point for generating more complex PROV representations.  This is a very good
idea, and one that could inform a very instructive tutorial or primer.

Tom

[1] http://www.w3.org/TR/2013/WD-prov-dc-20130312/ 

======================================================================
1. Status of the Turtle representations and the subclasses they declare

   The Turtle representations of the mappings [1, 2] are linked only from
   the word "here" in the Abstract and are not mentioned anywhere else.
   Generally speaking, using "here" as link text is not ideal in
   specifications such as this one, which many people may read as a
   printout, or offline, perhaps in Instapaper on an iPad.

   I suggest:
   --  Create entries for the Turtle representations in the References
       section [3], then cite them in the specification.

   --  Discuss the Turtle representations somewhere in the specification
       besides the Abstract, and add some explanation clarifying their
       status.  Do they fall under a W3C namespace policy?  Are they linked to
       WD-prov-dc such that any future revisions of the Turtle representations
       could only be undertaken in the context of a revision of WD-prov-dc?
       Are they provided merely as a convenience for readers, or do the editors
       intend them to be used (and if so, how)?  I do not think a long text is
       required, but it would be good to clarify for the reader what these are
       and how they fit into W3C publication and maintenance processes, and to
       make their URIs visible in References.

   --  In Section 3.2, I am puzzled about the status of "subclasses" such as
       prov:Publish.  I see that these subclass declarations in Turtle are
       mirrored in [2], but I see no reference to prov:Publish in PROV-O.
       In other words, it is unclear whether:

            To properly reflect the meaning of the Dublin Core terms, more specific
            subclasses are needed: 

       means

            more specific subclasses would be needed (but haven't been created)

       or 

            more specific subclasses have been created

       If the latter, then the text would need to point to PROV-O.  If the
       former, then it would be doubly important to clarify the status of the
       Turtle representations.  Is [2] intended to encourage people to use
       prov:Publish in their data?

    [1] http://www.w3.org/ns/prov-dc-directmappings.ttl
    [2] http://www.w3.org/ns/prov-dc-refinements.ttl
    [3] http://www.w3.org/TR/2013/WD-prov-dc-20130312/#informative-references

----------------------------------------------------------------------
2. Various points of substance

--  1.1 Namespaces (and the term "namespace")

    The term "namespace" is used a bit loosely here.  It is worth noting that
    the RDF 1.1 Concepts and Abstract Syntax specification, though still only
    a Working Draft, states [1]:
    
        The term "namespace" on its own does not have a well-defined meaning in
        the context of RDF, but is sometimes informally used to mean "namespace
        IRI" or "RDF vocabulary".

    I suggest changing the name of the section and tweaking a few things:
        
        1.1 Namespace URIs

        The namespace URIs used in this document can be seen in Table 2.

        Table 2: Namespace URIs used in the document 
        
        prefix   Namespace IRI                           Used for
        owl      <http://www.w3.org/2002/07/owl#>        The OWL vocabulary [OWL2-OVERVIEW].
        rdfs     <http://www.w3.org/2000/01/rdf-schema#> The RDFS vocabulary [RDFS].
        prov     <http://www.w3.org/ns/prov#>            The PROV vocabulary [PROV-DM].
        dct      <http://purl.org/dc/terms/>             The DCMI /terms/ vocabulary [DCTERMS].
        ex       <http://example.org>                    Application-dependent URIs. Used in examples.
    
    [1] https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies
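    To illustrate why "namespace URI" (or "namespace IRI") is the more
    precise term: in Turtle and SPARQL, a prefix simply stands for a string,
    and a prefixed name is expanded by concatenating that string with the
    local name.  The Python sketch below assumes the prefix table suggested
    above; the helper name `expand` is hypothetical and not part of the
    Mapping or of any RDF library.

```python
# Prefix-to-namespace-IRI mapping, copied from the suggested Table 2 above.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "prov": "http://www.w3.org/ns/prov#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'dct:creator' into a full IRI.

    Expansion is plain string concatenation of the namespace IRI and the
    local name; no separate "namespace" object is involved.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local
```

    For example, expand("dct:creator") yields
    http://purl.org/dc/terms/creator.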

--  3.3.2 
    The sentence:

        It is important to note that since the range for dates in Dublin Core is a
        rdfs:Literal and xsd:dateTime for the prov:atTime property, the mapping is
        only valid for those literals that are xsd:dateTime. 

    is not very precise.  Perhaps you mean something like:

        It is important to note that since the range for DC date properties is
        rdfs:Literal, and the range of the prov:atTime property is the class
        of literals with the datatype xsd:dateTime, the mapping is only valid
        for those literals that have (or could be assigned?) the datatype
        xsd:dateTime. 

    ...assuming that "range... is the class of literals with the datatype
    xsd:dateTime" is a correct interpretation (I haven't checked the other 
    specs).
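    A rough way to see which DC date values the prov:atTime mapping would
    cover is to test each literal's lexical form against the xsd:dateTime
    pattern.  The Python sketch below is based on the regular expression
    given for dateTime in XML Schema 1.1 Part 2; it checks lexical form only
    (it would accept, e.g., 2013-02-30T00:00:00), and the function names are
    hypothetical.

```python
import re

# Lexical pattern for xsd:dateTime, based on the regular expression in
# XML Schema 1.1 Part 2. Lexical form only: impossible calendar dates
# such as February 30 are not rejected.
XSD_DATETIME = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"                       # year
    r"-(0[1-9]|1[0-2])"                                   # month
    r"-(0[1-9]|[12][0-9]|3[01])"                          # day
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?"
    r"|(24:00:00(\.0+)?))"                                # time of day
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"      # optional timezone
)


def is_xsd_datetime(lexical: str) -> bool:
    """True if a DC date literal has a lexical form valid for xsd:dateTime."""
    return XSD_DATETIME.fullmatch(lexical) is not None


def mappable_dates(values):
    """Keep only the DC date values for which a prov:atTime mapping is valid."""
    return [v for v in values if is_xsd_datetime(v)]
```

    Under these assumptions, a bare date such as "2013-03-12", which is
    common in DC metadata, would not qualify, while "2013-03-12T00:00:00Z"
    would.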

--  3.3.3
    The sentence:

        In Dublin Core, most of the properties relating entities to other entities
        don't describe the involvement of a specific activity (e.g., dct:format,
        dct:source or isVersionOf).

    is awkwardly worded.  Do you perhaps mean:

        In Dublin Core, most of the properties relating entities to other entities
        do not imply activities related to provenance (e.g., dct:format,
        dct:source or dct:isVersionOf).

--  3.3.3.1
    I found the following sentence hard to understand:

        The replacement is the result of a "search and replace" Activity, which
        used a specialization of the replaced entity (_:old_entity) and produced a
        specialization of the replacement (_:new_entity). 

    ...but I do not know the PROV model well enough to propose a clearer
    text.

--  3.4 Cleanup

    I wonder if "cleanup" is the best heading for this section.  After running
    the SPARQL queries described in the previous sections, one ends up with a
    PROV graph that has blank nodes for entities, and the process of assigning
    identifiers to those blank nodes could be thought of as "cleanup".  So far,
    so good.

    What the "suggestions" then discuss, however, are not methods for cleaning
    up an existing generated graph, but different templates for generating
    _new_ and _different_ PROV graphs from the same DC statements.  As I read
    it, this section has more to do with different possible ways to generate
    graphs, starting with somewhat different assumptions (related to different
    possible ways to model things using PROV), and resulting in different
    patterns.  If my reading is correct, then I would suggest saying this more
    clearly in the introduction to the section and giving the section a more
    specific name, such as "Generating PROV graphs using different templates".
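    For comparison, "cleanup" in the narrow sense (assigning identifiers to
    the blank nodes of an already generated graph) could be as simple as the
    Python sketch below.  It assumes triples are tuples of strings, with
    blank nodes labelled "_:..."; the function `assign_identifiers` and the
    base IRI http://example.org/entity/ are hypothetical.  It changes only
    node names, not the shape of the graph.

```python
from itertools import count


def assign_identifiers(triples, base="http://example.org/entity/"):
    """Replace each blank node label (e.g. '_:e1') with a minted IRI.

    The same blank node always receives the same IRI, so the structure of
    the graph is preserved; only the node names change.
    """
    minted = {}          # blank node label -> minted IRI
    counter = count(1)   # sequence numbers for new IRIs

    def name(term):
        if term.startswith("_:"):
            if term not in minted:
                minted[term] = f"{base}{next(counter)}"
            return minted[term]
        return term

    # Predicates cannot be blank nodes in RDF, so only subjects and objects change.
    return [(name(s), p, name(o)) for s, p, o in triples]
```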

--  Table 6 - dct:references

    For most properties, the commentary says they have been "excluded"
    or "left out" of the mapping.  For dct:references, however, the text says
    that dct:references "has been dropped from the mapping".  This wording 
    makes it sound like there was an earlier, published mapping from which
    this was dropped -- more like a change note for a specification than part 
    of the specification itself.  I suggest using "excluded" or "left out".

--  Reference in "References" section
    Currently reads:
        [DCTERMS]
            Dublin Core Terms Vocabulary. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/

    Should read:
        [DCTERMS]
            DCMI Metadata Terms. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/

--  In the sentence:

        For example, when mapping dates only unqualified properties can be extracted,

    I was unsure what you mean by "unqualified".

======================================================================
3. Minor editorial points

--  s/don't/do not/ (3.3.3), also search/replace "couldn't", "doesn't", and other contractions

--  "cleanup" and "clean-up" are used inconsistently

--  s/refering/referring/

--  2.1 Provenance in Dublin Core: Section "Descriptive Terms": replace ", etc."
    with a full stop because the sentence already starts with "Some examples".

--  3.3.  Change "We divide the queries in different categories" => "into different
    categories".

-- 
Tom Baker <tom@tombaker.org>
Received on Saturday, 30 March 2013 18:03:58 UTC