Data Cube validation: Issue and proposed change

# Summary

The Candidate Recommendation for the RDF Data Cube Vocabulary [1] 
defines a normalization algorithm and a set of integrity checking rules. 
These rules are intended to guide Data Cube implementers and enable 
mechanical checking for well-formed cubes. They are marked as "At Risk" 
in the Candidate Recommendation.

A case has been noted [2] where a data set can pass the defined 
integrity checks but where information that would normally be expected 
is missing. An integrity checker that implemented just the minimal 
published algorithm would miss this error case.

The specification states that "processors MAY apply full RDFS closure in 
place of the update operation defined [in the spec]". An integrity 
checker which implemented this MAY clause would detect this error case.

The WG proposes to modify the normalization algorithm so that this case 
is detected.

# Details

Each value (instance of qb:Observation) in a cube should identify the cube 
(instance of qb:DataSet) that it is a part of. This is checked by IC-1 [3].

Each cube (instance of qb:DataSet) should define its structure 
(qb:structure value). This is checked by IC-2 [4].

However, IC-2 will only detect cubes which have been explicitly declared 
as instances of qb:DataSet. So for example:

ex:obs1 a qb:Observation;
     # useful data omitted
     qb:dataSet ex:qb-mistake .

ex:qb  a qb:DataSet;
     rdfs:label "my intended data set";
     qb:structure ex:dsd .

will validate, even though ex:qb-mistake is used as a data set and has no 
qb:structure value.

The omission is that the closure rules included in the normalization 
algorithm [5] fail to infer that ex:qb-mistake has rdf:type qb:DataSet, 
even though full RDFS inference would infer this from the rdfs:range of 
qb:dataSet.
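
To make the gap concrete, here is a minimal Python sketch (not the 
specification's SPARQL). It assumes the example can be modelled as a set of 
(subject, predicate, object) string triples and reduces IC-2 to its 
"data set has no qb:structure" branch. The name `ic2_missing_structure` and 
the tuple representation are illustrative assumptions only.

```python
# Triples from the example above, as (subject, predicate, object) strings.
TRIPLES = {
    ("ex:obs1", "rdf:type", "qb:Observation"),
    ("ex:obs1", "qb:dataSet", "ex:qb-mistake"),
    ("ex:qb", "rdf:type", "qb:DataSet"),
    ("ex:qb", "rdfs:label", "my intended data set"),
    ("ex:qb", "qb:structure", "ex:dsd"),
}


def ic2_missing_structure(triples):
    """Return explicitly typed qb:DataSet resources with no qb:structure.

    Simplified stand-in for IC-2: only the "no structure" branch is modelled.
    """
    datasets = {s for s, p, o in triples if p == "rdf:type" and o == "qb:DataSet"}
    structured = {s for s, p, o in triples if p == "qb:structure"}
    return sorted(datasets - structured)


# ex:qb-mistake is never typed as qb:DataSet, so the check has nothing to flag.
violations = ic2_missing_structure(TRIPLES)
print(violations)
```

Because the check only looks at resources already typed as qb:DataSet, the 
mistyped reference goes unreported in this sketch.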

# Proposed resolution

The WG proposes to modify the specification to address this oversight.

This would be done by replacing the closure rule

   INSERT {
       ?o  rdf:type qb:Observation .
   } WHERE {
       ?o qb:dataSet [] .
   }

with

   INSERT {
       ?o  rdf:type qb:Observation .
       ?ds rdf:type qb:DataSet .
   } WHERE {
       ?o qb:dataSet ?ds .
   }
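
The effect of the revised rule can be sketched in Python under the same 
kind of simplification: triples are plain string tuples, and the function 
name `apply_revised_rule` is hypothetical. This is an illustration of the 
intent, not a replacement for the SPARQL Update form in the specification.

```python
def apply_revised_rule(triples):
    """Mimic the revised closure rule: for every `?o qb:dataSet ?ds`,
    add `?o rdf:type qb:Observation` and `?ds rdf:type qb:DataSet`."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "qb:dataSet":
            inferred.add((s, "rdf:type", "qb:Observation"))
            inferred.add((o, "rdf:type", "qb:DataSet"))
    return inferred


# Reduced version of the example: the observation points at the wrong data set.
data = {
    ("ex:obs1", "qb:dataSet", "ex:qb-mistake"),
    ("ex:qb", "rdf:type", "qb:DataSet"),
    ("ex:qb", "qb:structure", "ex:dsd"),
}

normalized = apply_revised_rule(data)

# Data sets (now including inferred ones) that lack a qb:structure value.
datasets = {s for s, p, o in normalized if p == "rdf:type" and o == "qb:DataSet"}
structured = {s for s, p, o in normalized if p == "qb:structure"}
unstructured = sorted(datasets - structured)
print(unstructured)
```

Once ex:qb-mistake is typed as a qb:DataSet, a structure check of the IC-2 
kind has something to report.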

A diff of an editor's draft showing the proposed change is at [6]. The 
only change is to section 10.1 and to the change history.

This change does not affect the intent of the integrity checking, does 
not invalidate any data publications and is unlikely to affect any data 
cube validator implementations. For this reason the WG believes such a 
change could be carried out as part of requesting transition to PR and 
does not require a restart of a LC/CR cycle.

We are posting this note on the public comments group to enable Data 
Cube implementers to note the proposal and comment if appropriate.



Received on Thursday, 18 July 2013 16:03:35 UTC