RE: Note on caveats in statistical data from Makx Dekkers on 2017-07-15 (public-dxwg-wg@w3.org from July 2017)

From: Makx Dekkers <mail@makxdekkers.com>
Date: Sat, 15 Jul 2017 09:37:00 +0200
To: "'Dan Brickley'" <danbri@google.com>, "'Riccardo Albertoni'" <albertoni@ge.imati.cnr.it>
Cc: "'Will Moy'" <william.moy@fullfact.org>, "'Antoine Isaac'" <aisaac@few.vu.nl>, "'Dataset Exchange Working Group'" <public-dxwg-wg@w3.org>
Message-ID: <000001d2fd3d$26ca4bf0$745ee3d0$@makxdekkers.com>
Dan,

 

This is indeed interesting. 

 

However, I don’t feel anywhere near competent to review a list of types of caveats for domain-specific applications. 

 

Would a good way forward be for us as a group to think about how to satisfy the general need to identify additional characteristics of individual data points, slices and datasets? One way could be to define one or more properties (probably in DQV, as Riccardo suggests) that link to controlled vocabularies developed and maintained within application domains. 
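
To make the idea a little more concrete, here is a minimal sketch in Python, under stated assumptions. The `Caveat` record, the `group_by_type` helper, the `CAVEAT_TYPES` code list and all example.org IRIs are hypothetical and for illustration only; they are not part of DQV or of the Full Fact typology. The sketch only shows a single generic link from a data point, slice or dataset to a term in a vocabulary that an application domain would maintain.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical code list; a real one would be developed and maintained
# within an application domain, not defined by the working group.
CAVEAT_TYPES = {
    "http://example.org/caveat-type/AnomalousDataPoint": "Anomalous data point",
    "http://example.org/caveat-type/MethodologyChange": "Change of methodology",
}


@dataclass(frozen=True)
class Caveat:
    """Links one resource (data point, slice or dataset) to one caveat type."""
    target: str        # IRI of the annotated resource
    caveat_type: str   # IRI of a term in the controlled vocabulary
    note: str = ""     # optional free-text footnote


def group_by_type(caveats):
    """Group annotated resources by the label of their caveat type."""
    grouped = defaultdict(list)
    for caveat in caveats:
        if caveat.caveat_type not in CAVEAT_TYPES:
            raise ValueError(f"unknown caveat type: {caveat.caveat_type}")
        grouped[CAVEAT_TYPES[caveat.caveat_type]].append(caveat.target)
    return dict(grouped)


caveats = [
    Caveat("http://example.org/dataset/unemployment",
           "http://example.org/caveat-type/MethodologyChange",
           "Survey method revised."),
    Caveat("http://example.org/dataset/unemployment/obs/2016-Q3",
           "http://example.org/caveat-type/AnomalousDataPoint"),
]
grouped = group_by_type(caveats)
```

The same record is used for a whole dataset and for a single observation, so the granularity of the target places no constraint on the property itself.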

 

Makx.

 

 

From: Dan Brickley [mailto:danbri@google.com] 
Sent: 14 July 2017 15:51
To: Riccardo Albertoni <albertoni@ge.imati.cnr.it>
Cc: Will Moy <william.moy@fullfact.org>; Antoine Isaac <aisaac@few.vu.nl>; Makx Dekkers <mail@makxdekkers.com>; Dataset Exchange Working Group <public-dxwg-wg@w3.org>
Subject: Re: Note on caveats in statistical data

 

 

 

On 14 Jul 2017 11:53 am, "Riccardo Albertoni" <albertoni@ge.imati.cnr.it> wrote:

Dear Makx, Will, Dan and All

 

I agree with Makx: DQV [1] could be handy for adding caveats in such a context.

I think that the granularity of the described resources is no real barrier to DQV adoption, provided DQV makes sense in Will Moy's scenario.

The textual definition of dqv:QualityAnnotation [2] refers to datasets and distributions because they were the primary targets in the W3C DWBP group. However, we deliberately left DQV open to reuse with anything else (e.g., we haven't imposed any formal constraint requiring annotated objects to be instances of dcat:Dataset/Distribution). 

 

Interesting! Here are some more details from Full Fact, which might be enough to try out that idea.

 

https://fullfact.org/blog/2015/aug/typology-caveats/

 

Details are in a (pretty SKOS-like) spreadsheet.

 

https://fullfact.org/media/redactor/Typology.xlsx

 

...including partial SDMX mappings - I haven't figured out yet what that might mean for a W3C Data Cube representation

 

Dan

 

 

 

Cheers, 

Riccardo 

 

[1] https://www.w3.org/TR/vocab-dqv/

[2] https://www.w3.org/TR/vocab-dqv/#dqv:QualityAnnotation

[3] https://www.w3.org/TR/annotation-vocab/#annotation

 

On 14 July 2017 at 09:13, Makx Dekkers <mail@makxdekkers.com> wrote:

It seems to me that the mention of “an anomalous data point” in the transcript implies that they are interested to annotate down to the level of individual observations, for example, qb:Observation. 

 

So, they may need to look at a vocabulary like Data Cube to see how such annotations could be included. Maybe dqv:QualityAnnotation (https://www.w3.org/TR/vocab-dqv/#dqv:QualityAnnotation) could help, but that is defined at the level of the dataset, not for individual observations, if I read it right.
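
As an illustration of annotating at the level of a single observation, here is a minimal sketch in Python that builds the triples for one dqv:QualityAnnotation whose target is a qb:Observation. It assumes the Web Annotation pattern that DQV builds on (oa:hasTarget, oa:hasBody, oa:motivatedBy). The example.org IRIs, the caveat term and the helper names `annotate_observation` and `to_ntriples` are hypothetical, and whether a caveat type belongs in the annotation body is a modelling assumption rather than something the thread settles.

```python
# Namespaces of the vocabularies mentioned in the thread.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DQV = "http://www.w3.org/ns/dqv#"
OA = "http://www.w3.org/ns/oa#"
QB = "http://purl.org/linked-data/cube#"
# Hypothetical namespace for the example data and the caveat code list.
EX = "http://example.org/"


def annotate_observation(annotation_iri, observation_iri, caveat_iri):
    """Return (subject, predicate, object) triples for one quality annotation."""
    return [
        (observation_iri, RDF + "type", QB + "Observation"),
        (annotation_iri, RDF + "type", DQV + "QualityAnnotation"),
        (annotation_iri, OA + "hasTarget", observation_iri),
        # Assumption: the caveat type from a code list is used as the body.
        (annotation_iri, OA + "hasBody", caveat_iri),
        (annotation_iri, OA + "motivatedBy", DQV + "qualityAssessment"),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = annotate_observation(
    EX + "annotation/1",
    EX + "obs/2016-Q3",
    EX + "caveat-type/AnomalousDataPoint",
)
print(to_ntriples(triples))
```

Nothing in the sketch depends on the target being a dataset or a distribution; only the IRI passed as the target changes.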

 

The statistical people themselves are doing stuff around XKOS with Explanatory notes, see http://www.ddialliance.org/Specification/XKOS/1.0/OWL/xkos.html#note-ext. 

 

Makx.

 

 

 

From: Dan Brickley [mailto:danbri@google.com] 
Sent: 14 July 2017 01:05
To: public-dxwg-wg@w3.org
Subject: Note on caveats in statistical data

 

Hi. I thought https://www.youtube.com/watch?v=cLMbrzI5p6s might be of interest to the WG. It's a 30-second video from a chat today at Full Fact (a UK fact-checking charity), with Andy Dudfield from the UK's Office for National Statistics. Andy, Will Moy, Mevan Babakar and I discussed the importance of making sure that caveats of various kinds travel along with the different data format representations of statistical data. Full Fact have done some work in this direction and would be interested in conversations on how it might plug into standards (e.g. CSVW, DCAT, Schema.org etc). 

 

I've also just transcribed the video, so here's the text version:

 

(Will Moy) "[re statistical data]... full of numbers, ... what I want to go along with that is a list of things I need to know about those numbers in order to be able to re-use them. And I want those to be organized so instead of just getting a long list of footnotes, those footnotes are classified into the type of caveat it is. So we did a piece of work which is what kind of caveats exist. So - is it an anomalous data point or is it that we changed the methodology or whatever, ... classify it that way, in a machine readable way using a standardized code list so a computer has a reasonable chance of being able to reason about what those numbers can do."

 

I'll share more details of this work as I find out more but it seemed worth making a quick note first.

 

cheers,

 

Dan





 

-- 

----------------------------------------------------------------------------
Riccardo Albertoni
Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
Consiglio Nazionale delle Ricerche
via de Marini 6 - 16149 GENOVA - ITALIA
tel. +39-010-6475624 - fax +39-010-6475660
e-mail: Riccardo.Albertoni@ge.imati.cnr.it
Skype: callto://riccardoalbertoni/
LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
www: http://www.imati.cnr.it/

http://pers.ge.imati.cnr.it/albertoni/PersonalPage/albertoni.html

FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf
Received on Saturday, 15 July 2017 07:37:33 UTC