Re: CSV+ Direct Mapping candidate?

On 02/28/2014 02:44 PM, Richard Cyganiak wrote:
> [ . . . ]
>> On 28 Feb 2014, at 17:10, David Booth <david@dbooth.org> wrote:
>> [ . . . ]  what I meant was that tarql could be
>> viewed as a shortcut over an implicit two step process: instead of
>> going in two steps from A to B to C, tarql takes a shortcut from A
>> to C, where A is the CSV, B is direct-mapped RDF, and C is the
>> transformed RDF produced by the CONSTRUCT clause.
>
> The interesting question—and one that I myself cannot answer with
> absolute certainty—is whether there is any value in having B.

That's a great question.  I'm sure some will want to go directly from A 
to C -- and that's fine too -- but here's a brief explanation (in the 
form of a use case) for why I prefer to cleanly separate syntactic from 
semantic transformations:
[[
Title: CSV+ Direct Mapping

The overall goal is to be able to predictably view all data uniformly as 
RDF, even though it may be serialized in a wide variety of formats, 
including those that are not natively RDF serializations, such as CSV+. 
Independent parties using different tools should see the same
information when viewing CSV+ data as RDF.

A key reason for using a direct mapping -- whether for CSV+, relational 
data or anything else -- is to cleanly separate the *syntactic* issue of 
mapping non-native-RDF formats to RDF, from the *semantic* issue of 
model alignment, in which model transformations are routinely needed. By 
standardizing the direct mappings, RDF-to-RDF transformations can be 
more readily shared, because they can assume a predictable starting 
point based on the direct mapping.

For example, Alice, using the W3C-compliant AlphaWorks Uniform RDF 
Deserializer, reads a CSV+ spreadsheet from http://example/table.tsv . 
The AlphaWorks Deserializer, following W3C standards, finds no 
information about the spreadsheet beyond what is contained in table.tsv 
itself.  Thus, it uses the default **CSV+ Direct Mapping** to interpret 
the spreadsheet as RDF.

Bob, using the W3C-compliant BetaTech Universal RDF Slurper, reads the 
same CSV+ spreadsheet from http://example/table.tsv .  The BetaTech 
Slurper, following W3C standards, also finds no information about the 
spreadsheet beyond what is contained in table.tsv itself.  Thus, it too 
uses the default **CSV+ Direct Mapping** to interpret the spreadsheet as 
RDF.

Alice and Bob then talk by phone about the RDF content of the 
spreadsheet.  Because they know that their software follows the W3C 
standards, they are assured that they are talking about the *same* RDF 
graph (or at least isomorphic graphs).  This makes them much happier than
they were before standardization, because it allows them to share 
RDF-to-RDF transformations that can be applied to the resulting RDF 
graph.  Prior to standardization, they were unable to share RDF-to-RDF 
transformations, because their RDF deserializers produced different RDF 
from the same CSV+ data.
]]
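
To make the A/B/C distinction concrete, here is a sketch of what A and 
B might look like for a tiny table.tsv.  This is purely illustrative -- 
no CSV+ Direct Mapping has been standardized yet, so the header-derived 
predicate URIs and blank-node-per-row structure below are just my guess 
at one plausible design.

Suppose table.tsv (A) contains two tab-separated columns:

    name    age
    alice   30

A direct mapping might deterministically produce something like the 
following RDF (B), shown in Turtle:

    @prefix : <http://example/table.tsv#> .

    # One blank node per row; one predicate per column header;
    # all cell values as plain literals.
    [] :name "alice" ;
       :age  "30" .

The particular design doesn't matter here; what matters is that any two 
compliant tools would produce the same graph (up to isomorphism) from 
the same input.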

In other words, rather than having lots of special-purpose languages 
(R2RML, a new one for CSV+, etc.) for transforming lots of non-RDF 
formats into target RDF models -- thus combining syntactic lifting to 
RDF *and* semantic alignment in a single step -- I would rather have 
lots of simple, standard direct mappings that lift non-RDF formats to 
RDF, and then use one general-purpose semantic transformation language, 
such as SPARQL rules, for doing semantic alignment.  I think this gives 
a cleaner, more maintainable architecture, as it factors out all the 
semantic transformations.
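
For instance, continuing the illustrative direct mapping sketched 
above, the semantic alignment step from B to C could be an ordinary 
SPARQL CONSTRUCT.  (The FOAF target vocabulary and the predicate URIs 
here are, again, just assumptions for the sketch.)

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    PREFIX :     <http://example/table.tsv#>

    # Align the direct-mapped row data (B) to a target model (C):
    CONSTRUCT {
      ?row a foaf:Person ;
           foaf:name ?name ;
           foaf:age  ?age .
    }
    WHERE {
      ?row :name ?name ;
           :age  ?ageStr .
      # Cast the string cell value to a typed literal:
      BIND (xsd:integer(?ageStr) AS ?age)
    }

Because the query runs over RDF rather than over the raw CSV+, the same 
CONSTRUCT can be shared and applied no matter which deserializer 
produced B.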

I'm not against taking shortcuts -- sometimes they're expedient -- but 
I want to make sure we don't neglect the need for a direct mapping.
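
For comparison -- and hedging here, because I may have details of 
tarql's syntax wrong -- my understanding is that the tarql shortcut 
from A straight to C would look nearly identical, except that the 
variables are pre-bound per row from the CSV+ column headers instead of 
being matched against direct-mapped triples:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    # ?name and ?age come straight from the column headers:
    CONSTRUCT {
      [] a foaf:Person ;
         foaf:name ?name ;
         foaf:age  ?ageInt .
    }
    FROM <file:table.tsv>
    WHERE {
      BIND (xsd:integer(?age) AS ?ageInt)
    }

The CONSTRUCT template is the same either way; the difference is 
whether the variable bindings come from a standardized intermediate 
graph (B) or directly from the table.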

David

Received on Friday, 28 February 2014 21:48:49 UTC