Re: A draft outline for the CSV2RDF document from Christopher Gutteridge on 2014-05-22 (public-csv-wg@w3.org from May 2014)

From: Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Date: Thu, 22 May 2014 14:33:07 +0100
To: Andy Seaborne <andy@apache.org>, Gregg Kellogg <gregg@greggkellogg.net>
CC: Ivan Herman <ivan@w3.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <EMEW3|d781bf5fa63bd199040625e8cfcf3431q4LEXE03cjg|ecs.soton.ac.uk|537DFC93.2060>

On 22/05/2014 12:53, Andy Seaborne wrote:
> On 22/05/14 12:13, Christopher Gutteridge wrote:
>> There's also the issue of a repeated heading. I've encountered that. eg.
>>
>> ID, Title, Contact 1, Email, Contact 2, Email
>
> And indeed no headings and RTL with repeated headings.
>
> Do you think more needs to be said in, say,
>
> http://w3c.github.io/csvw/syntax/index.html#core-tabular-data-model
> or
> http://w3c.github.io/csvw/syntax/index.html#headers
>
> ?
>
> (Looking at that, I was expecting #core-tabular-data-model to say that 
> "columns MAY have titles")
>
The idea of specifying headers or not in the mimetype is all very well,
but that rather assumes we have a mimetype and that our data started as
CSV (most of mine starts life as Excel or Sharepoint exports).

I think repeated headers are an edge case, but it would be helpful to
define a default behaviour. Some options:
- Fall back to treating it as an unheaded column.
- Append -2, -3 etc. based on repeats. This could cause an issue if
someone maliciously made the headings X, X, X-2, as you'd then end up
with X, X-2, X-2 and still have a clash.
- Append the column number to the repeated heading.
- Ignore data from that column entirely.
- Throw an error and refuse to process it.
The default should also cover empty headings, e.g. if there was a
heading row and column 7 had no heading but did contain data.
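
To make the options above concrete, here is a rough Python sketch of
three of them (append -2/-3, append the column number, throw an error),
plus a fallback for empty headings. Everything in it is an assumption
for illustration only: the function names, the "col-N" name for empty
headings, and the rule that a generated name skips any name already
present in the heading row (which is one way round the X, X, X-2 clash).
It is not a proposal for what the spec should say.

```python
def _first_free(base, taken):
    """Return base if unused, else base-2, base-3, ... (first free one)."""
    if base not in taken:
        return base
    n = 2
    while f"{base}-{n}" in taken:
        n += 1
    return f"{base}-{n}"


def resolve_headings(headings, strategy="suffix"):
    """Return unique column names for one heading row.

    strategy "suffix": append -2, -3, ... to a repeated heading.
    strategy "column": append the 1-based column number to a repeat.
    strategy "error":  raise ValueError on the first repeat.
    Empty headings become col-N (hypothetical naming convention).
    """
    cleaned = [h.strip() for h in headings]
    # Names literally present in the row are reserved, so a generated
    # "X-2" can never collide with a real heading "X-2".
    literal = {h for h in cleaned if h}
    used = set()
    result = []
    for number, name in enumerate(cleaned, start=1):
        taken = literal | used
        if not name:
            unique = _first_free(f"col-{number}", taken)
        elif name not in used:
            unique = name  # first occurrence keeps its own heading
        elif strategy == "error":
            raise ValueError(f"repeated heading {name!r} in column {number}")
        elif strategy == "column":
            unique = _first_free(f"{name}-{number}", taken)
        else:
            unique = _first_free(name, taken)
        used.add(unique)
        result.append(unique)
    return result


row = ["ID", "Title", "Contact 1", "Email", "Contact 2", "Email"]
print(resolve_headings(row))            # second Email becomes Email-2
print(resolve_headings(row, "column"))  # second Email becomes Email-6
print(resolve_headings(["X", "X", "X-2"]))  # repeat becomes X-3
```

The "unheaded column" and "ignore the column" options are left out, as
they depend on how unheaded columns end up being modelled.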



-- 
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/

Received on Thursday, 22 May 2014 13:34:14 UTC