
RE: CSV on the web: question re null / missing values

From: Peter Parslow <Peter.Parslow@ordnancesurvey.co.uk>
Date: Wed, 6 Aug 2014 12:45:52 +0000
To: Dan Brickley <danbri@google.com>
CC: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <0461228843A12A4DBE1DC4E9B451A9AB65F3FDD8@WP113.ordsvy.gov.uk>
That looks like a 'yes'.

I can add some scenarios, from our own CSV data products.

This morning I was looking at "AddressBase Plus", so the specific use case is a missing string value (although the string in question is actually an identifier in a different dataset). Many fields in an address are not always populated. For example, many (UK) addresses lie within an electoral ward, but not all do, so sometimes the electoral ward column is empty, indicating that this particular address does not lie in a ward. We would like to be explicit about that, rather than risk the empty cell being interpreted as 'missing by accident'.

The obvious solution would be for us to create an 'out of range' value to populate the field with ('none' springs to mind). The slight difficulty is that the electoral ward codes are created by another agency, so we run the risk of them creating a value of 'none' for some (no doubt good) reason! The risk grows if we chose the same 'none' across all the potentially missing string values (house name, street name, etc.).
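To make the trade-off concrete, here is a minimal Python sketch contrasting an in-band 'none' sentinel with recording the reason for absence in a separate column. The column names (`uprn`, `ward_code`, `ward_nil_reason`), the codes and the reason labels are all hypothetical, not the real AddressBase Plus schema; the labels 'inapplicable' and 'unknown' merely echo words from GML's nil-reason vocabulary.

```python
import csv
import io

# Hypothetical extracts: column names (uprn, ward_code, ward_nil_reason)
# and codes are illustrative, not the real AddressBase Plus schema.
IN_BAND = "uprn,ward_code\n1,E05000001\n2,none\n3,\n"

OUT_OF_BAND = (
    "uprn,ward_code,ward_nil_reason\n"
    "1,E05000001,\n"
    "2,,inapplicable\n"
    "3,,unknown\n"
    "4,none,\n"  # suppose the other agency later issues a real code 'none'
)


def read_in_band(text, sentinel="none"):
    """Treat an in-band sentinel as 'not in a ward' and an empty cell as unknown."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        code = row["ward_code"]
        if code == sentinel:
            result[row["uprn"]] = ("absent", "inapplicable")
        elif code == "":
            result[row["uprn"]] = ("absent", "unknown")
        else:
            result[row["uprn"]] = ("present", code)
    return result


def read_out_of_band(text):
    """Take the reason for absence from its own column, so every code value stays usable."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        reason = row["ward_nil_reason"]
        if reason:
            result[row["uprn"]] = ("absent", reason)
        elif row["ward_code"] == "":
            # Neither a value nor a reason was recorded.
            result[row["uprn"]] = ("absent", "unspecified")
        else:
            result[row["uprn"]] = ("present", row["ward_code"])
    return result


# A genuine code 'none' is misread by the sentinel scheme ...
print(read_in_band("uprn,ward_code\n4,none\n")["4"])  # ('absent', 'inapplicable')
# ... but read correctly when the reason lives in a separate column.
print(read_out_of_band(OUT_OF_BAND)["4"])  # ('present', 'none')
```

The separate column costs an extra field per nullable column, but it never competes with values issued by another agency.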

My reference to GML is more generic: we now have a European standard for expressing addresses in GML, and GML includes a reasonably good model for being explicit about what nil and absent values mean.

-----Original Message-----
From: Dan Brickley [mailto:danbri@google.com]
Sent: 06 August 2014 11:25
To: Peter Parslow
Cc: public-csv-wg@w3.org
Subject: Re: CSV on the web: question re null / missing values

On 6 August 2014 11:16, Peter Parslow <Peter.Parslow@ordnancesurvey.co.uk> wrote:
> Does this working group intend to publish anything (e.g. advice) on how to handle null values in CSV data? Perhaps as part of the metadata work, given that current usage varies. I would like to see some guidance covering:
> * 'the meaning of null', i.e. recognition of the range of possibilities. OGC's gml:nilReasonType (http://www.opengeospatial.org/standards/gml) extends the idea of xsi:nil (http://www.w3.org/TR/xmlschema-1/#xsi_nil; http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/#Nils)
> * Giving null values in columns of different data types
> * Any interaction with whether strings are quoted or not
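The last two bullets can be illustrated with Python's standard `csv` module. This is a minimal sketch, assuming hypothetical column names and null tokens: it first shows that the reader does not distinguish a quoted empty field from an unquoted one, then applies per-column null tokens before type conversion. The `NULL_TOKENS` and `CONVERTERS` tables are illustrative, not any proposed metadata syntax.

```python
import csv
import io

# The standard csv module returns '' for both an unquoted empty field (,,)
# and a quoted empty field (,"",), so any distinction carried by quoting is lost.
fields = next(csv.reader(io.StringIO('a,,"",b\r\n')))
print(fields)  # ['a', '', '', 'b']

# Hypothetical per-column rules: which raw strings count as null, and how to
# convert everything else. This is not a CSVW metadata syntax.
NULL_TOKENS = {"name": {""}, "height_m": {"", "NA"}, "surveyed": {"", "NA"}}
CONVERTERS = {
    "name": str,
    "height_m": float,
    "surveyed": {"true": True, "false": False}.__getitem__,  # KeyError on anything else
}


def convert_row(row):
    """Map each raw cell to None if it is a declared null token, else to a typed value."""
    return {
        col: None if raw in NULL_TOKENS[col] else CONVERTERS[col](raw)
        for col, raw in row.items()
    }


print(convert_row({"name": "Oak", "height_m": "NA", "surveyed": "true"}))
# {'name': 'Oak', 'height_m': None, 'surveyed': True}
```

Declaring "" as null for the string column `name` means a genuinely empty string can no longer be represented; that is exactly the case where quoting might otherwise have carried the distinction.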

Hi Peter,

Interesting point. I think this will come to the fore as we go deeper into templates and mappings. Perhaps there are use cases that could capture more detailed requirements than we now have. But http://www.w3.org/TR/2014/WD-csvw-ucr-20140701/#R-MissingValueDefinition does touch on the issue already:

"R-MissingValueDefinition: Ability to declare a "missing value" token and, optionally, a reason for the value to be missing

Significant amounts of existing tabular text data include values such as -999. Typically, these are outside the normal expected range of values and are meant to infer that the value for that cell is missing.
Automated parsing of CSV files needs to recognise such missing value tokens and behave accordingly. Furthermore, it is often useful for a data publisher to declare why a value is missing; e.g. withheld or aboveMeasurementRange

Motivation: SurfaceTemperatureDatabank, OrganogramData, OpenSpendingData, NetCdFcDl, PaloAltoTreeData and PlatformIntegrationUsingSTDF."
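A minimal Python sketch of what such a declaration might enable, assuming a hypothetical per-column mapping from token to reason. The `Missing` class, the `MISSING_TOKENS` table and the pairing of -999 and -888 with particular reasons are illustrative only, not part of any CSVW specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Missing:
    """A cell that is deliberately empty, with the publisher's stated reason."""
    reason: str


# Hypothetical declaration of missing-value tokens per column; the pairing of
# -999 and -888 with these reasons is illustrative only.
MISSING_TOKENS = {
    "temperature": {"-999": "withheld", "-888": "aboveMeasurementRange"},
}


def parse_cell(column, raw, convert=float):
    """Return Missing(reason) for a declared token, otherwise the converted value."""
    reasons = MISSING_TOKENS.get(column, {})
    # Match the raw text before conversion, so only the exact declared token counts.
    if raw in reasons:
        return Missing(reasons[raw])
    return convert(raw)


print([parse_cell("temperature", s) for s in ["12.5", "-999", "-888"]])
# [12.5, Missing(reason='withheld'), Missing(reason='aboveMeasurementRange')]
```

Matching on the raw text before conversion means `-999.0` is read as an ordinary number rather than the declared token; whether that is desirable is a choice for whoever writes the declaration.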

At a minimum it could be useful to add the links you provide to any future updates on the csvw-ucr doc. Do you have scenarios in mind that are not captured in the above list of use cases?



This email is only intended for the person to whom it is addressed and may contain confidential information. If you have received this email in error, please notify the sender and delete this email which must not be copied, distributed or disclosed to any other person.

Unless stated otherwise, the contents of this email are personal to the writer and do not represent the official view of Ordnance Survey. Nor can any contract be formed on Ordnance Survey's behalf via email. We reserve the right to monitor emails and attachments without prior notice.

Thank you for your cooperation.

Ordnance Survey
Adanac Drive
Southampton SO16 0AS
Tel: 08456 050505

Received on Wednesday, 6 August 2014 12:46:23 UTC
