Tableau tools [was: CSV2RDF and R2RML]

Hi James - sounds like we should add Tableau to the list of typical tools people use with CSV data <> :-)

-----Original Message-----
From: James McKinney [] 
Sent: 19 February 2014 16:09
To: Ivan Herman
Cc: Andy Seaborne;
Subject: Re: CSV2RDF and R2RML

> What this tells me, though, is that there is only so much we can do 
> to provide clean data. At this moment we are talking about the 
> conversion to JSON, RDF, or XML or whatever: in all cases there is a 
> level of cleanup that _will_ be in the realm of the data consumer, no 
> matter what. We should not try to cover all the pathological cases...
> To take the example above with
> Country,Population,2010,2011,2012,2013
> if the generated JSON is simply a copy of that, i.e.,
> {
> "2000" : "true",
> "2010" : "false",
> ...
> }
> one can easily write a post-processing program that transforms this 
> data into a more appropriate shape for that specific case, but I find 
> it difficult to imagine how we would define some sort of generic, 
> almost Turing-complete language for defining transformations in 
> general... For this case even the @context of JSON-LD would not help.
> I guess what we can do is analyze the use cases to see how frequent 
> the various pathological cases are; we may then be able to add 
> metadata signaling those. But we will not cover them all.

I agree that covering all cases is out of scope :) I can see how pathological CSV might be converted to JSON or XML. Would the RDF then have a bunch of invented terms like ex:2000, ex:2010?
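A minimal sketch of the kind of case-specific post-processing Ivan describes, in Python: a naive converter copies each row into a dict keyed by the header, and a follow-up step reshapes the year columns into one record per year (a wide-to-long "unpivot"). The sample data, the `YEAR_COLUMNS` list, and the helpers `naive_rows` and `unpivot` are hypothetical and only for illustration. The year columns are hard-coded, which is exactly what makes this easy for one file and hard to generalize.

```python
import csv
import io

# Hypothetical sample in the shape discussed above; the values are placeholders.
SAMPLE = """Country,Population,2010,2011,2012,2013
Atlantis,100,true,false,true,true
"""

# For this one file we *know* these headers are values of a "year" variable.
YEAR_COLUMNS = ["2010", "2011", "2012", "2013"]


def naive_rows(text):
    """Copy each CSV row verbatim into a dict keyed by header (a generic conversion)."""
    return list(csv.DictReader(io.StringIO(text)))


def unpivot(row, year_columns):
    """Reshape one wide row into one record per year column (wide-to-long)."""
    # Keep the non-year columns (Country, Population) on every output record.
    fixed = {k: v for k, v in row.items() if k not in year_columns}
    return [dict(fixed, year=int(col), value=row[col]) for col in year_columns]


if __name__ == "__main__":
    for row in naive_rows(SAMPLE):
        for record in unpivot(row, YEAR_COLUMNS):
            print(record)
```

Each output record carries the fixed columns plus explicit `year` and `value` fields, so a later conversion would not need to invent one property per year.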

> As for the multiple tables within the same file: do you mean that the 
> data is such that its structure is not homogeneous, i.e., that it is as 
> if several CSV files, with different structures, were concatenated 
> together? Now *that* is really messy :-(
> Ivan
> B.t.w., my original remark referred to the 'foreign key' issue, 
> i.e., that we can forget about those RDB terms for CSV... I hope that 
> still holds, although your remark about several tables within the same CSV file made me scared.

Re: multiple tables within a single CSV: it's not uncommon for an Excel user to start a table at cell (0,0) (perhaps containing the "raw" data they are dealing with), and to then start another table (maybe one that summarizes or categorizes the information in the first table) somewhere to the right at cell (20,0). That way, they just need to scroll over to switch between the two tables, instead of reaching down to Excel's tabs and having to refer to cells across sheets when building the second table.

In other words, the Excel sheet is used as a canvas, on which the user puts a bunch of tables (not necessarily starting in the first row).
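One way a consumer might recover side-by-side tables from such a sheet once it is saved as CSV, sketched in Python under the assumption that the tables are separated by at least one column that is blank in every row. The sample sheet and the `split_side_by_side` helper are hypothetical; blank rows inside a block are dropped, so a table that does not start in the first row is still picked up, but tables stacked vertically in the same columns would be merged.

```python
import csv
import io

# Hypothetical sheet saved as CSV: a "raw" table in the first two columns and a
# summary table further right, separated by one empty column.
SHEET = """name,score,,category,count
alice,3,,high,1
bob,1,,low,1
"""


def split_side_by_side(rows):
    """Split a grid into tables separated by columns that are blank in every row."""
    if not rows:
        return []
    width = max(len(r) for r in rows)
    grid = [r + [""] * (width - len(r)) for r in rows]  # pad ragged rows
    empty = [all(not row[c].strip() for row in grid) for c in range(width)]

    # Group consecutive non-empty columns into blocks of column indices.
    blocks, current = [], []
    for c in range(width):
        if empty[c]:
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(c)
    if current:
        blocks.append(current)

    # Slice each block out of the grid and drop rows that are blank within it.
    tables = []
    for cols in blocks:
        block = [[row[c] for c in cols] for row in grid]
        tables.append([r for r in block if any(cell.strip() for cell in r)])
    return tables


if __name__ == "__main__":
    for table in split_side_by_side(list(csv.reader(io.StringIO(SHEET)))):
        print(table)
```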

In my experience, most individuals create, open, and work with CSV in spreadsheet programs like Excel (users of LibreOffice, etc. exhibit the same behavior described above). When those users then upload their data to Tableau or similar tools to visualize it, they are frequently disappointed that Tableau, for example, does not understand that the header "2010" is a value of the variable "year" and not the name of a variable.
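A sketch of the kind of heuristic a tool could use to notice that a header such as "2010" looks like a value rather than a variable name. The rule (a bare four-digit number between 1800 and 2099 is treated as a year) and the `classify_headers` helper are assumptions for illustration, not a description of how Tableau or any other tool behaves.

```python
import re

# Assumed rule: a bare four-digit number from 1800 to 2099 is probably a year.
YEAR_PATTERN = re.compile(r"(1[89]|20)\d{2}")


def classify_headers(headers):
    """Split headers into likely variable names and header cells that look like years."""
    variables, years = [], []
    for h in headers:
        if YEAR_PATTERN.fullmatch(h.strip()):
            years.append(h)
        else:
            variables.append(h)
    return variables, years


if __name__ == "__main__":
    print(classify_headers(["Country", "Population", "2010", "2011", "2012", "2013"]))
```

The headers it reports as years are the columns a reshaping step would turn into values of a single "year" column; a real tool would need more signals than this one pattern.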


Received on Wednesday, 19 February 2014 17:06:34 UTC