RE: Note on caveats in statistical data

It seems to me that the mention of “an anomalous data point” in the transcript implies that they are interested in annotating down to the level of individual observations, for example, qb:Observation.

So they may need to look at a vocabulary like Data Cube to see how such annotations could be included. Maybe dqv:QualityAnnotation (https://www.w3.org/TR/vocab-dqv/#dqv:QualityAnnotation) could help, but, if I read it right, that is defined at the level of the dataset, not of individual observations.
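
To make the dataset-versus-observation distinction concrete, here is a minimal Python sketch of what observation-level annotation with typed caveats could look like. It is not Data Cube, DQV, or Full Fact's actual code list: CaveatType, Caveat, Observation and observations_with are hypothetical names, the figures and notes are made up, and only the two caveat kinds are taken from the transcript.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CaveatType(Enum):
    """Hypothetical code list; the two entries echo the transcript's examples."""
    ANOMALOUS_DATA_POINT = "anomalous-data-point"
    METHODOLOGY_CHANGE = "methodology-change"


@dataclass
class Caveat:
    caveat_type: CaveatType  # machine-readable classification
    note: str                # the human-readable footnote text


@dataclass
class Observation:
    """Stand-in for a single observation (one value for one period)."""
    period: str
    value: float
    caveats: List[Caveat] = field(default_factory=list)


def observations_with(observations, caveat_type):
    """Return the observations carrying at least one caveat of the given type."""
    return [
        obs for obs in observations
        if any(c.caveat_type is caveat_type for c in obs.caveats)
    ]


# Made-up series: the caveats travel with each observation, not with the dataset.
series = [
    Observation("2015", 101.2),
    Observation("2016", 240.0, [
        Caveat(CaveatType.ANOMALOUS_DATA_POINT, "Made-up note: value under review."),
    ]),
    Observation("2017", 104.9, [
        Caveat(CaveatType.METHODOLOGY_CHANGE, "Made-up note: new survey method."),
    ]),
]

flagged = observations_with(series, CaveatType.ANOMALOUS_DATA_POINT)
print([obs.period for obs in flagged])
```

Because each caveat carries a code from a fixed list rather than only free text, a consumer can filter or exclude affected observations without parsing footnotes.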

The statisticians themselves are working on this in XKOS, with its explanatory notes; see http://www.ddialliance.org/Specification/XKOS/1.0/OWL/xkos.html#note-ext.

Makx.

From: Dan Brickley [mailto:danbri@google.com] 
Sent: 14 July 2017 01:05
To: public-dxwg-wg@w3.org
Subject: Note on caveats in statistical data

Hi. I thought https://www.youtube.com/watch?v=cLMbrzI5p6s might be of interest to the WG. It's a 30-second video from a chat today at Full Fact (a UK fact-checking charity) with Andy Dudfield from the UK's Office for National Statistics. Andy, Will Moy, Mevan Babakar and I discussed the importance of making sure that caveats of various kinds travel along with the different data format representations of statistical data. Full Fact have done some work in this direction and would be interested in conversations on how it might plug into standards (e.g. CSVW, DCAT, Schema.org).

I've also just transcribed the video, so here's the text version:

(Will Moy) "[re statistical data]... full of numbers, ... what I want to go along with that is a list of things I need to know about those numbers in order to be able to re-use them. And I want those to be organized so instead of just getting a long list of footnotes, those footnotes are classified into the type of caveat it is. So we did a piece of work which is what kind of caveats exist. So - is it an anomalous data point or is it that we changed the methodology or whatever, ... classify it that way, in a machine readable way using a standardized code list so a computer has a reasonable chance of being able to reason about what those numbers can do."

I'll share more details of this work as I find out more, but it seemed worth making a quick note first.

cheers,

Dan

Received on Friday, 14 July 2017 07:14:04 UTC