Re: A draft outline for the CSV2RDF document

On 22/05/2014 12:53, Andy Seaborne wrote:
> On 22/05/14 12:13, Christopher Gutteridge wrote:
>> There's also the issue of a repeated heading. I've encountered that, e.g.
>>
>> ID, Title, Contact 1, Email, Contact 2, Email
>
> And indeed no headings and RTL with repeated headings.
>
> Do you think more needs to be said in, say,
>
> http://w3c.github.io/csvw/syntax/index.html#core-tabular-data-model
> or
> http://w3c.github.io/csvw/syntax/index.html#headers
>
> ?
>
> (Looking at that, I was expecting #core-tabular-data-model to say that 
> "columns MAY have titles")
>
The idea of specifying in the mimetype whether there are headers is all
very well, but that rather assumes we have a mimetype and that our data
started as CSV (most of mine starts life as Excel or SharePoint exports).
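For what it's worth, RFC 4180 does define an optional "header" parameter
(present or absent) on text/csv. Here is a minimal Python sketch of a
consumer that honours it when a media type is available and otherwise falls
back to a default it has to choose for itself. The function name
header_row_present and the default of True are my assumptions, not anything
the CSVW drafts specify.

```python
from email.message import Message


def header_row_present(content_type, default=True):
    """Decide whether the first CSV row should be read as a heading row.

    Uses the optional "header" parameter of text/csv (RFC 4180), e.g.
    "text/csv; charset=utf-8; header=absent". If there is no media type at
    all (a file exported from Excel or SharePoint and read from disk), or the
    parameter is missing or unrecognised, return `default` instead. The
    default is an assumption the consumer has to make.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    value = msg.get_param("header")  # None when the parameter is absent
    if isinstance(value, str):
        value = value.lower()
        if value == "present":
            return True
        if value == "absent":
            return False
    return default
```

The point is that the fallback branch is the common case for files that never
travelled over HTTP, so whatever default we pick ends up doing most of the work.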

I think that repeated headers are an edge case, but it would be helpful to
define a default behaviour.
- One option would be to fall back to treating it as an unheaded column.
- Another would be to append -2, -3, etc. based on repeats. This could
cause an issue if someone maliciously made the headings X, X, X-2, as
you'd then end up with X, X-2, X-2 and still have a clash.
- Another would be to append the column number to the repeated heading.
- Another would be to ignore data from that column entirely.
- Or we could throw an error and refuse to process it.
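To make the -2/-3 clash concrete, here is a rough Python sketch of three of
those policies. The function names are hypothetical. suffix_safely is one way
around the malicious X, X, X-2 case: reserve every original heading before
generating any suffix, then skip suffixes that are already taken.

```python
from collections import Counter


class DuplicateHeadingError(ValueError):
    """Raised by the strict policy when a heading occurs more than once."""


def suffix_naively(headings):
    """Append -2, -3, ... to each repeat of a heading.

    Does not check whether the generated name already exists, so crafted
    headings such as X, X, X-2 still produce a clash.
    """
    seen = Counter()
    out = []
    for h in headings:
        seen[h] += 1
        out.append(h if seen[h] == 1 else f"{h}-{seen[h]}")
    return out


def suffix_safely(headings):
    """Like suffix_naively, but skip any suffix that is already taken.

    Every original heading is reserved up front, so a generated name can
    never collide with a real heading that appears later in the row.
    """
    taken = set(headings)
    seen = set()
    out = []
    for h in headings:
        if h not in seen:
            seen.add(h)  # first occurrence keeps its heading as-is
            out.append(h)
            continue
        n = 2
        while f"{h}-{n}" in taken:
            n += 1  # step past suffixes that are already in use
        name = f"{h}-{n}"
        taken.add(name)
        out.append(name)
    return out


def reject_duplicates(headings):
    """Strict policy: refuse to process a heading row with repeats."""
    repeats = sorted(h for h, n in Counter(headings).items() if n > 1)
    if repeats:
        raise DuplicateHeadingError(f"repeated headings: {repeats}")
    return list(headings)
```

With X, X, X-2 the naive version gives X, X-2, X-2, while suffix_safely gives
X, X-3, X-2. The column-number option has the same weakness if done naively
(X, X, X-2 numbered from 1 becomes X-1, X-2, X-2), so reserving the original
headings first matters there too.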
This should also include a way to address empty headings, e.g. if there
was a heading row but column 7 had no heading yet did contain data.
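One way to handle the empty-heading case is to fall back to treating it as an
unheaded column, keeping the column number alongside the (possibly missing)
title so that the data stays addressable by position. A minimal Python sketch,
assuming the first row is the heading row; the names read_with_titles and
cells are made up for illustration.

```python
import csv
import io


def read_with_titles(text):
    """Parse CSV text whose first row is a heading row.

    Returns (titles, rows) where titles[i] is the heading of column i+1, or
    None when that heading is blank or missing (an unheaded column). The
    titles list is padded to the widest row, so data that runs past the
    heading row is still covered.
    """
    records = list(csv.reader(io.StringIO(text)))
    if not records:
        return [], []
    header, data = records[0], records[1:]
    width = max([len(header)] + [len(r) for r in data])
    titles = [(h.strip() or None) for h in header]
    titles += [None] * (width - len(header))
    return titles, data


def cells(row, titles):
    """Yield (column_number, title_or_None, value) for each non-empty cell.

    Keying on the 1-based column number as well as the title means an
    unheaded column is never dropped, and no synthetic name is invented
    that could clash with a real heading.
    """
    for i, value in enumerate(row, start=1):
        if value != "":
            yield i, titles[i - 1], value
```

For "ID,Title,,Email" followed by "1,Foo,bar,a@b", the third cell comes out as
(3, None, "bar") rather than being lost or given a made-up name.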



-- 
Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/
You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/

Received on Thursday, 22 May 2014 13:34:14 UTC