- From: Rufus Pollock <rufus.pollock@okfn.org>
- Date: Wed, 6 Aug 2014 16:00:49 +0100
- To: Peter Parslow <Peter.Parslow@ordnancesurvey.co.uk>
- Cc: Dan Brickley <danbri@google.com>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
- Message-ID: <CAKssCpPtwyi8nNvjpfO_6ofbLF9XxqfLpfzOqjfZjmC5qoOUmw@mail.gmail.com>
This is a really interesting question. I note this arose with respect to
Tabular Data Package and JSON Table Schema, and someone opened this specific
issue:

https://github.com/dataprotocols/dataprotocols/issues/97

The suggestion there was to add a specific field named "missing_value" which
would define the value or symbol used to indicate a missing value.

Rufus

On 6 August 2014 13:45, Peter Parslow <Peter.Parslow@ordnancesurvey.co.uk> wrote:
> Dan,
> That looks like a 'yes'.
>
> I can add some scenarios, from our own CSV data products.
>
> This morning I was looking at "AddressBase Plus", so the specific use case
> is a missing string value (although the string in question is actually an
> identifier in a different dataset). There are many fields in an address
> which are not always populated. For example, many (UK) addresses lie within
> an electoral ward, but not all do - so sometimes the electoral ward column
> is empty, indicating that this particular address does not lie in a ward.
> The use case is that we would like to be explicit about that, rather than
> risk it being interpreted as 'missing by accident'. The solution would
> appear to be for us to create an 'out of range' value to populate the field
> with ('none' springs to mind); the slight difficulty being that the
> electoral ward codes are created by another agency - so we run the risk of
> them creating a value of 'none' for some (no doubt good) reason! Especially
> if we chose the same 'none' across all the potentially missing string
> values (house name, street name, etc.).
>
> My reference to GML is more generic, because we have a European standard
> now for expressing addresses in GML, and GML includes a reasonably good
> model for being explicit about what nil & absent values mean.
>
> Peter
> -----Original Message-----
> From: Dan Brickley [mailto:danbri@google.com]
> Sent: 06 August 2014 11:25
> To: Peter Parslow
> Cc: public-csv-wg@w3.org
> Subject: Re: CSV on the web: question re null / missing values
>
> On 6 August 2014 11:16, Peter Parslow
> <Peter.Parslow@ordnancesurvey.co.uk> wrote:
> > Does this working group intend to publish anything (e.g. advice) on how
> > to handle null values in CSV data? Perhaps as part of the metadata work,
> > given that current usage varies. I would like to see some guidance covering:
> >
> > * 'the meaning of null' - i.e. recognition of the range of
> > possibilities. OGC's gml:nilReasonType
> > (http://www.opengeospatial.org/standards/gml) extends the idea of
> > xsi:nil (http://www.w3.org/TR/xmlschema-1/#xsi_nil;
> > http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/#Nils)
> >
> > * Giving null values in columns of different data types
> >
> > * Any interaction with whether strings are quoted or not
>
> Hi Peter,
>
> Interesting point. I think this will come to the fore as we go deeper into
> templates and mappings. Perhaps there are use cases that could capture more
> detailed requirements than we now have. But
> http://www.w3.org/TR/2014/WD-csvw-ucr-20140701/#R-MissingValueDefinition
> does touch on the issue already:
>
> "R-MissingValueDefinition: Ability to declare a "missing value" token and,
> optionally, a reason for the value to be missing
>
> Significant amounts of existing tabular text data include values such as
> -999. Typically, these are outside the normal expected range of values and
> are meant to infer that the value for that cell is missing.
> Automated parsing of CSV files needs to recognise such missing value
> tokens and behave accordingly. Furthermore, it is often useful for a data
> publisher to declare why a value is missing; e.g. withheld or
> aboveMeasurementRange
>
> Motivation: SurfaceTemperatureDatabank, OrganogramData, OpenSpendingData,
> NetCdFcDl, PaloAltoTreeData and PlatformIntegrationUsingSTDF."
>
> At a minimum it could be useful to add the links you provide to any future
> updates on the csvw-ucr doc. Do you have scenarios in mind that are not
> captured in the above list of use cases?
>
> cheers,
>
> Dan

--
Rufus Pollock
Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
Open Knowledge <http://okfn.org/> - see how data can change the world
Received on Wednesday, 6 August 2014 15:01:19 UTC
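The missing-value declaration discussed in this thread (a per-column token, as in the proposed "missing_value" field, plus the optional reason from R-MissingValueDefinition) can be sketched in Python. This is a minimal illustration under stated assumptions, not the JSON Table Schema or CSVW syntax: the `SPEC` mapping of column name to {token: reason}, the `Missing` class, the sample columns and data, and the reason string "notInWard" are all hypothetical ("withheld" comes from the requirement text). `sentinel_collisions` shows the check Peter's scenario calls for: whether a chosen out-of-range token such as 'none' also occurs among legitimate values issued by another agency.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Union


@dataclass(frozen=True)
class Missing:
    """Marks a cell whose value is absent; `reason` may be None (unspecified)."""
    reason: Optional[str] = None


Cell = Union[str, Missing]
Spec = Dict[str, Dict[str, Optional[str]]]

# Hypothetical declaration format: column -> {missing-value token: reason}.
# Illustrative only; not the JSON Table Schema "missing_value" field or CSVW syntax.
SPEC: Spec = {
    "ward": {"": None, "none": "notInWard"},  # empty = unexplained, 'none' = explicit
    "floor_area": {"-999": "withheld"},
}

# Hypothetical sample data.
DATA = (
    "address,ward,floor_area\n"
    "1 High St,W1,120\n"
    "2 Low Rd,none,-999\n"
    "3 Mid Ln,,85\n"
)


def read_with_missing(text: str, spec: Spec) -> List[Dict[str, Cell]]:
    """Parse CSV text, replacing each declared token with Missing(reason)."""
    rows: List[Dict[str, Cell]] = []
    for record in csv.DictReader(io.StringIO(text)):
        row: Dict[str, Cell] = {}
        for column, raw in record.items():
            tokens = spec.get(column, {})
            # Only an exact match of a declared token counts as missing;
            # every other string is kept unchanged.
            row[column] = Missing(tokens[raw]) if raw in tokens else raw
        rows.append(row)
    return rows


def sentinel_collisions(tokens: Iterable[str], valid_values: Iterable[str]) -> Set[str]:
    """Return declared tokens that also occur as legitimate values (e.g. in a code list)."""
    return set(tokens) & set(valid_values)


if __name__ == "__main__":
    for row in read_with_missing(DATA, SPEC):
        print(row)
    # Would 'none' clash with ward codes issued by another agency?
    print(sentinel_collisions(SPEC["ward"], ["W1", "W2", "none"]))
```

Keeping the token-to-reason mapping per column, rather than one global sentinel, lets 'none' mean "not in a ward" in the ward column without reserving it in, say, a house-name column.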