
Re: CSV on the web: question re null / missing values

From: Dan Brickley <danbri@google.com>
Date: Wed, 6 Aug 2014 11:25:25 +0100
Message-ID: <CAK-qy=6DGKY5-3ng3gw6WHtTVzz3sSC3ZSThWiu42DqQYmibAQ@mail.gmail.com>
To: Peter Parslow <Peter.Parslow@ordnancesurvey.co.uk>
Cc: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
On 6 August 2014 11:16, Peter Parslow
<Peter.Parslow@ordnancesurvey.co.uk> wrote:
> Does this working group intend to publish anything (e.g. advice) on how to handle null values in CSV data? Perhaps as part of the metadata work, given that current usage varies. I would like to see some guidance covering:
>
> * 'the meaning of null' - i.e. recognition of the range of possibilities. OGC's gml:nilReasonType (http://www.opengeospatial.org/standards/gml) extends the idea of xsi:nil (http://www.w3.org/TR/xmlschema-1/#xsi_nil; http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/#Nils)
>
> * Giving null values in columns of different data types
>
> * Any interaction with whether strings are quoted or not

Hi Peter,

Interesting point. I think this will come to the fore as we go deeper
into templates and mappings. Perhaps there are use cases that could
capture more detailed requirements than we now have. But
http://www.w3.org/TR/2014/WD-csvw-ucr-20140701/#R-MissingValueDefinition
does touch on the issue already:

"R-MissingValueDefinition: Ability to declare a "missing value" token
and, optionally, a reason for the value to be missing

Significant amounts of existing tabular text data include values such
as -999. Typically, these are outside the normal expected range of
values and are meant to infer that the value for that cell is missing.
Automated parsing of CSV files needs to recognise such missing value
tokens and behave accordingly. Furthermore, it is often useful for a
data publisher to declare why a value is missing; e.g. withheld or
aboveMeasurementRange

Motivation: SurfaceTemperatureDatabank, OrganogramData,
OpenSpendingData, NetCdFcDl, PaloAltoTreeData and
PlatformIntegrationUsingSTDF."
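
To make the requirement concrete, here is a rough sketch of what "recognise
such missing value tokens and behave accordingly" might look like in a
parser. It is only an illustration, not anything the group has agreed: the
MISSING_TOKENS table, the read_with_missing function, the sample data and the
pairing of each token with a reason are all made up for the example (only the
reason names withheld and aboveMeasurementRange come from the text above).

```python
import csv
import io

# Hypothetical declaration: missing-value token -> optional reason.
# In practice this would come from metadata published alongside the CSV.
MISSING_TOKENS = {
    "-999": "aboveMeasurementRange",
    "NA": "withheld",
    "": None,  # empty cell: missing, no reason given
}


def read_with_missing(text, missing_tokens=MISSING_TOKENS):
    """Parse CSV text into rows of {column: (value, reason)}.

    A cell whose raw text matches a declared token becomes
    (None, reason); any other cell becomes (raw_text, None).
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for column, raw in record.items():
            if raw in missing_tokens:
                row[column] = (None, missing_tokens[raw])
            else:
                row[column] = (raw, None)
        rows.append(row)
    return rows


# Made-up sample data for illustration only.
sample = "station,temp\nA,12.5\nB,-999\nC,NA\n"
rows = read_with_missing(sample)
```

Note that the sketch compares raw cell text before any datatype conversion,
so a token like -999 is caught as a string rather than parsed as a number
first. Whether that is the right order is one of the things that would need
to be decided.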

At a minimum it could be useful to add the links you provide to any
future updates on the csvw-ucr doc. Do you have scenarios in mind that
are not captured in the above list of use cases?

cheers,

Dan
Received on Wednesday, 6 August 2014 10:25:52 UTC
