Re: DQV, DAQ and Data Cube graphs


On Wed, Sep 2, 2015 at 7:23 PM, Riccardo Albertoni wrote:
> You are right,  we don't need to keep the daq:QualityGraph as a
> rdfg:Graph, but we still need a subclass of qb:DataSet as  range of the
> property  qb:dataSet
> to facilitate the visualisation of the data as RDF cube.

I think the alignment with RDF Data Cube was one of the really good features
of DAQ, and it would be good to see it preserved in DQV. It will be
very helpful for visualizing how the quality of a dataset evolves
over time, or for comparing the effects of different repair techniques
on quality, etc., as Jeremy explains in the use cases section of their paper.

> Concerning  dqv:QualityMetadata, the  outer RDFG:graph in the DQV schema,
> it basically groups all the quality statements for a datasets in a graph,
> so that  we can easily track the  provenance of quality statements;
> as far as I remember, tracking the provenance of the quality statement is
> one of the requirement on which we agree,  and I like that DQV is providing
> an explicit guidance on it, but   by deleting the outer  graph, I think it
>  gets lost.
>  Before deleting the outer rdfg:graph,   I wonder if you/we  have an
> alternative design pattern, not relying  on RDFG:Graph but  keeping track
> of the "quality statement"  provenance.  Any proposal?

I tend to agree with Christophe about having the flexibility to track
provenance at different granularity levels. I also agree with Riccardo that
it will be useful for readers if DQV provides guidance on how
provenance information can be / should be published with quality metadata
rather than leaving it completely up to the reader.

About the granularity, I might want to track the provenance of the
dqv:QualityMetadata graph as a whole:

:myQualityMetadata a dqv:QualityMetadata, prov:Entity;
     prov:wasAttributedTo :qualityCheckerAgent ;
     prov:generatedAtTime "2015-05-27T02:52:02Z"^^xsd:dateTime ;
     prov:wasGeneratedBy :qualityCheckingActivity .

or of each individual dqv:QualityMeasure or dqv:QualityAnnotation (or maybe
a subset of them) in a more fine-grained manner, as they might have been
generated by different activities and by different agents:

:measure1 a dqv:QualityMeasure ;
  daq:computedOn :myDatasetDistribution ;
  daq:metric :cvsAvailabilityMetric ;
  daq:value "1.0"^^xsd:double ;
  prov:wasAttributedTo :availabilityMonitorAgent ;
  prov:generatedAtTime "2015-05-27T02:52:02Z"^^xsd:dateTime ;
  prov:wasGeneratedBy :availabilityMonitoringActivity .

So in my opinion dqv:QualityMeasure, dqv:QualityAnnotation, and
dqv:QualityMetadata instances are all PROV entities that can carry their
own provenance information. Though this is not explicitly mentioned in the
document, I don't think it conflicts with anything either.

If we don't want to rely on rdfg:Graph, so as to allow implementers more
flexibility, another option may be to explicitly relate a set of quality
measures and annotations to a dqv:QualityMetadata using some RDF property
defined in DQV (similar to dqv:hasQualityMeasure or qb:dataSet):

dqv:contains ( Domain: dqv:QualityMetadata -> Range: [ owl:unionOf (
dqv:QualityMeasure  dqv:QualityAnnotation) ] )
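
As a rough sketch only: dqv:contains is just a name I am proposing here, not
an existing DQV term, and the prefixes are omitted as in the snippets above.
In Turtle, the definition and its use might look something like this:

# Hypothetical property; the name dqv:contains is a proposal only.
dqv:contains a owl:ObjectProperty ;
     rdfs:domain dqv:QualityMetadata ;
     rdfs:range [ a owl:Class ;
                  owl:unionOf ( dqv:QualityMeasure dqv:QualityAnnotation ) ] .

# The metadata resource points at its members explicitly, no named graph needed.
:myQualityMetadata a dqv:QualityMetadata, prov:Entity ;
     dqv:contains :measure1 ;
     prov:wasAttributedTo :qualityCheckerAgent .

With this pattern the provenance statements can still be attached either to
:myQualityMetadata as a whole or to :measure1 individually.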

Best Regards,

Received on Thursday, 3 September 2015 09:30:52 UTC