- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Fri, 6 Jun 2014 10:43:51 +0000
- To: Christopher Gutteridge <cjg@ecs.soton.ac.uk>, Andy Seaborne <andy@apache.org>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Hi - I've seen this "blank" behaviour too. In fact, you can see it in the Standard Occupational Classification (SOC) referenced within 2.24 Use Case #24 - Expressing a hierarchy within occupational listings [1].

I concluded that "ditto" wasn't always a safe interpretation in the case where blanks occur in multiple columns. It's probably best for people to build a "custom" workflow, like yourselves, to "fill in the blanks" rather than us trying to generalize how this might work.

Jeremy

[1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings

> -----Original Message-----
> From: Christopher Gutteridge [mailto:cjg@ecs.soton.ac.uk]
> Sent: 06 June 2014 10:54
> To: Andy Seaborne; public-csv-wg@w3.org
> Subject: Re: What to do when "primary key" cell values are blank
>
> A blank/empty cell could also be interpreted as a value of "" rather
> than NULL, but that's probably asking for bother.
>
> What I have encountered in Excel reports from various business systems
> is for an ID to be only listed on the first row, e.g.
>
> 123 foo 1999
>     bar 2000
>     baz 2001
> 124 xyzzy 1975
>     plugh 1669
>
> etc.
>
> For the toolchain we built for ourselves, we had the option to set a flag
> on a column that indicated that nulls should be interpreted as "ditto".
>
>
> On 06/06/2014 10:23, Andy Seaborne wrote:
> > On 06/06/14 09:53, Tandy, Jeremy wrote:
> >> Hi - when putting together Use Case #24 - Expressing a hierarchy
> >> within occupational listings [1] I was considering how primary key
> >> behaviour might work. In this use case, there are four different
> >> types of entity described in a single CSV file. I inferred that we
> >> might apply four different templates to pull out the relevant
> >> contents and transform into RDF. A given row describes _one of_ the
> >> types of entity, meaning that the primary key column asserted, say,
> >> for extracting "SOC Major Group" concepts will often be blank.
> >>
> >> I have stated in the use case that:
> >>> Where the value in the designated primary key column is blank, the
> >>> row is ignored.
> >>
> >> I have also added this constraint to the primary key requirement [2].
> >>
> >> Please advise if this is inappropriate!
> >
> > We use template conversion - we often run multiple templates on the
> > same CSV, essentially extracting different kinds of entity on each
> > pass. "The" primary key is different in each pass. The note in
> > R-PrimaryKey does not meet our experiences.
> >
> > JeniT's condition extract is an example where it might be done as a
> > pass to generate the skos:broader separately from the "code rdfs:label
> > ....".
> >
> > "Primary" is being overloaded between uniquely identifying a row
> > (structural to CSV files), and uniquely identifying an entity
> > (modelling). In denormalised data, entities might get repeated on
> > different rows.
> >
> > Andy
> >
> >>
> >> Regards, Jeremy
> >>
> >> [1] http://w3c.github.io/csvw/use-cases-and-requirements/#UC-ExpressingHierarchyWithinOccupationalListings
> >> [2] http://w3c.github.io/csvw/use-cases-and-requirements/#R-PrimaryKey
>
> --
> Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg
>
> University of Southampton Open Data Service:
> http://data.southampton.ac.uk/
> You should read the ECS Web Team blog:
> http://blogs.ecs.soton.ac.uk/webteam/
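The two behaviours discussed in this thread, Christopher's per-column "ditto" flag and the R-PrimaryKey note that a row is ignored when the designated key column is blank, can be sketched in Python as follows. This is a hypothetical illustration, not the CSVW specification or anyone's actual toolchain: the column names (`id`, `name`, `year`) and the helpers `is_blank`, `fill_down` and `extract_pass` are invented here, and each template pass is assumed to name its own key column, as Andy describes.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Optional, Set

# Hypothetical sample shaped like Christopher's Excel report: the ID is
# only given on the first row of each group. Column names are invented.
SAMPLE = (
    "id,name,year\n"
    "123,foo,1999\n"
    ",bar,2000\n"
    ",baz,2001\n"
    "124,xyzzy,1975\n"
    ",plugh,1669\n"
)


def is_blank(value: Optional[str]) -> bool:
    """A cell counts as blank if it is missing or contains only whitespace."""
    return value is None or value.strip() == ""


def fill_down(rows: Iterable[Dict[str, str]],
              ditto_columns: Set[str]) -> Iterator[Dict[str, str]]:
    """Copy the last non-blank value down into blank cells, but only for
    columns explicitly flagged as "ditto"; other columns are left alone."""
    last_seen: Dict[str, str] = {}
    for row in rows:
        out = dict(row)  # copy so the caller's rows are not mutated
        for col in ditto_columns:
            if not is_blank(out.get(col)):
                last_seen[col] = out[col]
            elif col in last_seen:
                out[col] = last_seen[col]
        yield out


def extract_pass(rows: Iterable[Dict[str, str]],
                 key_column: str) -> Iterator[Dict[str, str]]:
    """One template pass: rows whose key column for *this* pass is blank
    are skipped, following the R-PrimaryKey note quoted above."""
    for row in rows:
        if not is_blank(row.get(key_column)):
            yield row


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
without_fill = [r["name"] for r in extract_pass(rows, "id")]
with_fill = [(r["id"], r["name"])
             for r in extract_pass(fill_down(rows, {"id"}), "id")]
print(without_fill)  # ['foo', 'xyzzy']
print(with_fill)
```

Without the fill, the pass keyed on `id` keeps only the `foo` and `xyzzy` rows; with `id` flagged as "ditto", all five rows are kept. Making the fill an explicit per-column opt-in, rather than a default, reflects Jeremy's caution that "ditto" is not always safe when blanks occur in multiple columns.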
Received on Friday, 6 June 2014 10:44:23 UTC