Re: CSV+ Direct Mapping candidate? from David Booth on 2014-03-03 (public-csv-wg@w3.org from March 2014)

From: David Booth <david@dbooth.org>
Date: Mon, 03 Mar 2014 12:54:15 -0500
To: Richard Cyganiak <richard@cyganiak.de>
CC: Niklas Lindström <lindstream@gmail.com>, Gregg Kellogg <gregg@greggkellogg.net>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <5314C1C7.9000208@dbooth.org>

On 03/01/2014 09:59 AM, Richard Cyganiak wrote:
> David,
>
> Let me first add one more clarification. I don't think of a Tarql
> mapping as a CSV-to-RDF mapping. I think of it as a
> logical-table-to-RDF mapping. Whether the table comes from CSV, TSV,
> SAS, SPSS or relational doesn't matter, as long as we define a
> sensible mapping from each of these syntaxes to a table of RDF terms
> with named columns. These mappings are generally easy to define,
> lossless, and don't add much arbitrary extra information.
>
> It's worth pointing out that such a table of RDF terms with named
> columns is the same logical structure as a SPARQL result, so it's
> something that is already supported to various degrees in most RDF
> toolkits.
>
> Now, you said:
>
>> The overall goal is to be able to predictably view all data
>> uniformly as RDF.
>
> I ask: why? What is the reason for wanting to do that?

Obviously the goal is information integration -- to make use of 
RDF's value proposition.

> Especially,
> why would you dissolve a cleanly structured table into a soup of
> triples by inserting meaningless and arbitrary extra elements? What
> do you win by doing that?

Quite the opposite.  The goal is to expose the table's intended 
information as RDF -- no more and no less.  It is not to insert 
meaningless or arbitrary extra elements, nor to discard potentially 
meaningful information.

>
> I observe that if your toolkit supports SPARQL results and Tarql,
> then it can already display tables of RDF terms, query them, and
> transform them to arbitrary graphs according to a mapping. What else
> would you want to do with them that requires the
> direct-mapping-to-triple-soup?

Semantic transformations: transforming from one data model to another. 
This is almost always needed when integrating data.

>
> I think I'd rather like to see an RDF ecosystem with excellent
> support for tables as well as graphs, while you'd rather like to see
> an RDF ecosystem where everyone treats everything as a graph, even if
> it's not a graph.

I want to decouple syntactic lift from semantic transformations, so that 
all semantic transformations can be uniformly performed in RDF.  I would 
rather not have to deal with a different semantic transformation 
language for each data format.  I prefer to factor out the task of 
syntactic lift from the task of semantic transformation, so that: (a) 
the same semantic transformation language and tools can be used 
regardless of the data's source format; and (b) model bias will not be 
introduced into the RDF that is exposed.  I'll explain what I mean in 
more detail in a separate reply to Gregg and Andy.
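To make the split between syntactic lift and semantic transformation concrete, here is a minimal Python sketch. It is not a specification of any proposed direct mapping: the row and column IRI scheme under the hypothetical base `http://example.org/data.csv`, and the helper names `direct_map` and `rename_predicates`, are illustrative assumptions. It also assumes well-formed CSV in which every row has as many cells as the header. In practice the transformation step would be written in an RDF-level language such as SPARQL CONSTRUCT; the Python function only stands in for it.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base IRI: a real direct mapping would define its own naming rules.
BASE = "http://example.org/data.csv"


def direct_map(csv_text, base=BASE):
    """Syntactic lift (illustrative): one triple per cell, nothing more.

    Each data row becomes a subject IRI <base#row=N> and each column header
    becomes a predicate IRI <base#header>. Cell values are kept as plain
    strings; no datatypes or vocabulary terms are guessed.
    Assumes every row has exactly as many cells as the header.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for n, row in enumerate(reader, start=1):
        subject = f"{base}#row={n}"
        for column, value in row.items():
            # Percent-encode the header so it can sit in an IRI fragment.
            triples.append((subject, f"{base}#{quote(column)}", value))
    return triples


def rename_predicates(triples, mapping):
    """Semantic transformation (illustrative): operates on triples only.

    It has no knowledge of CSV, so it applies equally to triples lifted
    from any other source format.
    """
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


if __name__ == "__main__":
    data = "name,age\nAlice,30\nBob,25\n"
    lifted = direct_map(data)
    # Hypothetical mapping of one column onto the FOAF vocabulary.
    vocab = {f"{BASE}#name": "http://xmlns.com/foaf/0.1/name"}
    for triple in rename_predicates(lifted, vocab):
        print(triple)
```

Because `rename_predicates` sees only triples, the same step would work unchanged on triples lifted from TSV, a spreadsheet or a relational database, which is the point of factoring the two tasks apart.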

As you know, RDF can perfectly well represent tabular data, hierarchical 
data and any other form of data, so to my mind there need not be any 
conflict between providing excellent support for tabular data AND being 
able to uniformly operate at the RDF level -- provided that tabular data 
can be predictably viewed as RDF.  Again, I'll try to explain what I 
mean in more detail in my reply to Gregg and Andy.

Thanks,
David Booth

Received on Monday, 3 March 2014 17:54:44 UTC