
Re: What to do when "primary key" cell values are blank

From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Date: Fri, 06 Jun 2014 10:53:51 +0100
Message-ID: <EMEW3|aef47df880963d74608f748e94dec22eq55Arr03cjg|ecs.soton.ac.uk|53918FAF.9040108@ecs.soton.ac.uk>
To: Andy Seaborne <andy@apache.org>, public-csv-wg@w3.org
A blank/empty cell could also be interpreted as a value of "" rather 
than NULL, but that's probably asking for bother.

What I have encountered in Excel reports from various business systems 
is for an ID to be listed only on the first row of its group, e.g.

123    foo    1999
        bar    2000
        baz    2001
124    xyzzy  1975
        plugh  1669


For the toolchain we built for ourselves, we had the option to set a flag 
on a column that indicated that nulls should be interpreted as "ditto", 
i.e. as repeating the value from the row above.
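
As a rough illustration of that "ditto" behaviour, here is a minimal Python sketch. It is not the code from our toolchain: the function name fill_ditto and the ditto_columns parameter are made up for this example, and it assumes rows have already been parsed into lists of strings, with a blank cell arriving as "" or None.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set


def fill_ditto(
    rows: Iterable[List[Optional[str]]],
    ditto_columns: Set[int],
) -> Iterator[List[Optional[str]]]:
    """Yield copies of rows with blank cells in the flagged columns
    replaced by the most recent non-blank value seen in that column.

    Hypothetical helper: name and signature are assumptions for this sketch.
    """
    previous: Dict[int, str] = {}  # last non-blank value per flagged column
    for row in rows:
        filled = list(row)  # copy, so the caller's row is not modified
        for i in ditto_columns:
            if i >= len(filled):
                continue  # short row: nothing to fill in this column
            cell = filled[i]
            if cell is None or cell.strip() == "":
                # Blank cell: reuse the value from above, if there is one.
                if i in previous:
                    filled[i] = previous[i]
            else:
                previous[i] = cell
        yield filled


# The sample report from this message, with column 0 flagged as "ditto".
rows = [
    ["123", "foo", "1999"],
    ["", "bar", "2000"],
    ["", "baz", "2001"],
    ["124", "xyzzy", "1975"],
    ["", "plugh", "1669"],
]
filled = list(fill_ditto(rows, {0}))
for r in filled:
    print("\t".join(str(c) for c in r))
```

A blank in a flagged column before any value has been seen is left blank here; whether that should instead be an error is a separate decision.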

On 06/06/2014 10:23, Andy Seaborne wrote:
> On 06/06/14 09:53, Tandy, Jeremy wrote:
>> Hi - when putting together Use Case #24 - Expressing a hierarchy 
>> within occupational listings [1] I was considering how primary key 
>> behaviour might work. In this use case, there are four different 
>> types of entity described in a single CSV file. I inferred that we 
>> might apply four different templates to pull out the relevant 
>> contents and transform into RDF. A given row describes _one of_ the 
>> types of entity, meaning that the primary key column asserted, say, 
>> for extracting "SOC Major Group" concepts will often be blank.
>> I have stated in the use case that:
>>> Where the value in the designated primary key column is blank, the 
>>> row is ignored.
>> I have also added this constraint to the primary key requirement [2].
>> Please advise if this is inappropriate!
> We use template conversion - we often run multiple templates on the 
> same CSV, essentially extracting different kinds of entity on each 
> pass. "The" primary key is different in each pass.  The note in 
> R-PrimaryKey does not meet our experiences.
> JeniT's condition extract is an example where it might be done as a 
> pass to generate the skos:broader separately from the "code rdfs:label 
> ....".
> "Primary" is being overloaded between uniquely identifying a row 
> (structural to CSV files), and uniquely identifying an entity 
> (modelling).  In denormalised data, entities might get repeated on 
> different rows.
>     Andy
>> Regards, Jeremy
>> [1] 
>> http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
>> [2] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey

Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/
Received on Friday, 6 June 2014 09:54:46 UTC
