RE: What to do when "primary key" cell values are blank from Tandy, Jeremy on 2014-06-06 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Fri, 6 Jun 2014 10:43:51 +0000
To: Christopher Gutteridge <cjg@ecs.soton.ac.uk>, Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE208845581@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>

Hi - I've seen this "blank" behaviour too. In fact, you can see it in the Standard Occupational Classification (SOC) referenced within 2.24 Use Case #24 - Expressing a hierarchy within occupational listings [1].

I concluded that "ditto" wasn't always a safe interpretation in the case where blanks occur in multiple columns. It's probably best for people to build a "custom" workflow, like yourselves, to "fill in the blanks" rather than us trying to generalize how this might work.
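
For illustration only, here is a minimal Python sketch of the per-column "ditto" flag that Christopher describes below. The names `fill_ditto` and `ditto_columns` are hypothetical and not taken from any toolchain mentioned in this thread. It assumes rows are already parsed into lists of strings and that a blank cell is an empty or whitespace-only string.

```python
from typing import Dict, Iterable, List, Sequence, Set


def fill_ditto(rows: Iterable[Sequence[str]], ditto_columns: Set[int]) -> List[List[str]]:
    """Copy the last non-blank value down into blank cells.

    Only the column indexes listed in ditto_columns are filled; blanks in
    every other column are left untouched. (Hypothetical helper.)
    """
    last_seen: Dict[int, str] = {}  # column index -> most recent non-blank value
    filled: List[List[str]] = []
    for row in rows:
        new_row = list(row)  # copy, so the caller's rows are not modified
        for col in ditto_columns:
            if col >= len(new_row):
                continue  # short row: nothing to fill in this column
            if new_row[col].strip() == "":
                # Blank cell: reuse the previous value, if one has been seen.
                if col in last_seen:
                    new_row[col] = last_seen[col]
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


# The example rows from Christopher's message, with the ID only on the first row.
rows = [
    ["123", "foo", "1999"],
    ["", "bar", "2000"],
    ["", "baz", "2001"],
    ["124", "xyzzy", "1975"],
    ["", "plugh", "1669"],
]
filled = fill_ditto(rows, ditto_columns={0})
```

Because the flag is set per column, a blank in an unflagged column stays blank, which is the main reason to keep this an explicit opt-in rather than a default interpretation.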

Jeremy


[1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings

> -----Original Message-----
> From: Christopher Gutteridge [mailto:cjg@ecs.soton.ac.uk]
> Sent: 06 June 2014 10:54
> To: Andy Seaborne; public-csv-wg@w3.org
> Subject: Re: What to do when "primary key" cell values are blank
> 
> A blank/empty cell could also be interpreted as a value of "" rather
> than NULL, but that's probably asking for bother.
> 
> What I have encountered in Excel reports from various business systems
> is for an ID to be listed only on the first row, e.g.
> 
> 123    foo    1999
>         bar    2000
>         baz    2001
> 124    xyzzy  1975
>         plugh  1669
> 
> etc.
> 
> For the toolchain we built for ourselves, we had the option to set a flag
> on a column that indicated that nulls should be interpreted as "ditto".
> 
> 
> 
> 
> On 06/06/2014 10:23, Andy Seaborne wrote:
> > On 06/06/14 09:53, Tandy, Jeremy wrote:
> >> Hi - when putting together Use Case #24 - Expressing a hierarchy
> >> within occupational listings [1] I was considering how primary key
> >> behaviour might work. In this use case, there are four different
> >> types of entity described in a single CSV file. I inferred that we
> >> might apply four different templates to pull out the relevant
> >> contents and transform into RDF. A given row describes _one of_ the
> >> types of entity, meaning that the primary key column asserted, say,
> >> for extracting "SOC Major Group" concepts will often be blank.
> >>
> >> I have stated in the use case that:
> >>> Where the value in the designated primary key column is blank, the
> >>> row is ignored.
> >>
> >> I have also added this constraint to the primary key requirement [2].
> >>
> >> Please advise if this is inappropriate!
> >
> > We use template conversion - we often run multiple templates on the
> > same CSV, essentially extracting different kinds of entity on each
> > pass. "The" primary key is different in each pass.  The note in
> > R-PrimaryKey does not match our experience.
> >
> > JeniT's condition extract is an example where it might be done as a
> > pass to generate the skos:broader separately from the "code rdfs:label
> > ....".
> >
> > "Primary" is being overloaded between uniquely identifying a row
> > (structural to CSV files), and uniquely identifying an entity
> > (modelling).  In denormalised data, entities might get repeated on
> > different rows.
> >
> >     Andy
> >
> >>
> >> Regards, Jeremy
> >>
> >>
> >> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >> [2] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey
> >>
> >
> >
> 
> --
> Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
> 
> University of Southampton Open Data Service:
> http://data.southampton.ac.uk/
> You should read the ECS Web Team blog:
> http://blogs.ecs.soton.ac.uk/webteam/
>

Received on Friday, 6 June 2014 10:44:23 UTC