- From: Phil Archer <phila@w3.org>
- Date: Fri, 14 Jul 2017 09:18:54 +0100
- To: Dan Brickley <danbri@google.com>, Makx Dekkers <mail@makxdekkers.com>
- Cc: public-dxwg-wg@w3.org, Bill Roberts <bill@swirrl.com>
+ Bill Roberts

With my faded W3C hat:

Adding Bill to this thread. All being well, he'll be working on a Statistical Data on the Web BP doc later this year, I believe working with ONS through an EU project. See the proposed charter for the continuation of the W3C/OGC collaboration [1].

Without any hat:

The idea of crowd-sourced data scares people who sit on authoritative data since:

- it's a threat to their business of selling authoritative data;
- it might actually be better than theirs, which makes them look bad;

and - the one relevant here:

- they want a way to say "hang on, that's not right, this is, and here's how we know."

So this discussion is relevant to that. And yes, a way to point to data points at the item level and offer a correction or at least an annotation would be important.

Phil

[1] http://w3c.github.io/sdw/jwoc/

On 14/07/2017 08:31, Dan Brickley wrote:
> +cc Will fyi (who may not be able to post to this list but saves me
> relaying in one direction)
>
> On 14 July 2017 at 08:13, Makx Dekkers <mail@makxdekkers.com> wrote:
>
>> It seems to me that the mention of “an anomalous data point” in the
>> transcript implies that they are interested to annotate down to the level
>> of individual observations, for example, qb:Observation.
>>
>
> Yes
>
>
>> So, they may need to look at a vocabulary like Data Cube to see how such
>> annotations could be included. Maybe dqv:QualityAnnotation
>> https://www.w3.org/TR/vocab-dqv/#dqv:QualityAnnotation could help, but
>> that is defined on the level of dataset, not for individual observations,
>> if I read it right.
>>
>
> Yes - fine grained but also I believe sometimes applicable across an entire
> time series especially when measuring methodologies, rules or associated
> technology/instrumentation shift with time. The example from Will that
> stuck with me was real world events such as
> https://en.wikipedia.org/wiki/The_Shipman_Inquiry can, especially when
> considering in aggregate e.g.
> mean, look beautiful in an interactive data
> visualization but tell a fundamentally misleading story unless there is a
> caveat/footnote. Previously I had tended to think of caveats in terms of
> the more intrinsic properties of the dataset and workflow and had missed
> the (rather open-ended) importance of also noting relevant real world
> aspects. Specialist journalists and researchers may be aware of these
> "blips" and historical events but the desire is for that knowledge to be
> surfaced and travel along with the raw data, building confidence in its
> reusability and in people's ability to draw and defend actionable
> conclusions from it. (Will I hope will correct me if I'm putting words into
> his mouth).
>
>
>> The statistical people themselves are doing stuff around XKOS with
>> Explanatory notes, see
>> http://www.ddialliance.org/Specification/XKOS/1.0/OWL/xkos.html#note-ext.
>>
>
> Yes, that looks like a possible carrier for this sort of information. I
> don't see there a specific code list for the distinctions Will alludes to
> (real world anomalies versus data recording anomalies etc etc.) but it
> provides a sensible SKOS-based representation that we could use to capture
> the list in a way that would be re-usable in Data Cube, CSVW et al. Would
> this WG or a Community Group be a good place to turn such a list into this
> kind of representation?
>
> Dan
>
>
>> Makx.
>>
>>
>> *From:* Dan Brickley [mailto:danbri@google.com]
>> *Sent:* 14 July 2017 01:05
>> *To:* public-dxwg-wg@w3.org
>> *Subject:* Note on caveats in statistical data
>>
>>
>> Hi. I thought https://www.youtube.com/watch?v=cLMbrzI5p6s might be of
>> interest to the WG. It's a 30 second video from a chat today at Full Fact
>> (UK fact checking charity), with Andy Dudfield from the UK's Office for
>> National Statistics.
>> Andy, Will Moy, Mevan Babakar and I discussed the
>> importance of making sure that caveats of various kinds travel along with
>> the different data format representations of statistical data. Full Fact
>> have done some work in this direction and would be interested in
>> conversations on how it might plug into standards (e.g. CSVW, DCAT,
>> Schema.org etc).
>>
>> I've also just transcribed the video, so here's the text version:
>>
>> (Will Moy) "[re statistical data]... full of numbers, ... what I want to
>> go along with that is a list of things I need to know about those numbers
>> in order to be able to re-use them. And I want those to be organized so
>> instead of just getting a long list of footnotes, those footnotes are
>> classified into the type of caveat it is. So we did a piece of work which
>> is what kind of caveats exist. So - is it an anomalous data point or is
>> it that we changed the methodology or whatever, ... classify it that way,
>> in a machine readable way using a standardized code list so a computer has
>> a reasonable chance of being able to reason about what those numbers can
>> do."
>>
>> I'll share more details of this work as I find out more but it seemed
>> worth making a quick note first.
>>
>> cheers,
>>
>> Dan
>>
Received on Friday, 14 July 2017 08:19:08 UTC