Re: CSV+ Direct Mapping candidate?

David,

Let me first add one more clarification. I don't think of a Tarql mapping as a CSV-to-RDF mapping. I think of it as a logical-table-to-RDF mapping. Whether the table comes from CSV, TSV, SAS, SPSS or a relational database doesn't matter, as long as we define a sensible mapping from each of these syntaxes to a table of RDF terms with named columns. These mappings are generally easy to define, lossless, and don't add much arbitrary extra information.
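
To make that concrete, here is a minimal sketch of a Tarql mapping (the columns, vocabulary and mailto scheme are illustrative, not prescribed). A Tarql mapping is just a SPARQL 1.1 CONSTRUCT query; each row of the table binds one variable per named column. Given

    name,email
    Alice,alice@example.com

the mapping

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    CONSTRUCT {
      ?person a foaf:Person ;
              foaf:name ?name ;
              foaf:mbox ?mbox .
    }
    WHERE {
      # ?name and ?email are bound per row from the named columns
      BIND (BNODE() AS ?person)
      BIND (URI(CONCAT("mailto:", ?email)) AS ?mbox)
    }

produces one foaf:Person per row.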

It's worth pointing out that such a table of RDF terms with named columns is the same logical structure as a SPARQL result, so it's something that is already supported to various degrees in most RDF toolkits.
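
For instance, the one-row CSV above, read as a table of RDF terms, has the same shape as a SELECT result (a sketch; how cell values become typed literals depends on the table-to-term mapping one adopts):

    -------------------------------------
    | name    | email                   |
    =====================================
    | "Alice" | "alice@example.com"     |
    -------------------------------------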

Now, you said:

> The overall goal is to be able to predictably view all data uniformly as RDF.

I ask: why? What is the reason for wanting to do that? In particular, why would you dissolve a cleanly structured table into a soup of triples by inserting meaningless and arbitrary extra elements? What do you gain by doing that?

I observe that if your toolkit supports SPARQL results and Tarql, then it can already display tables of RDF terms, query them, and transform them to arbitrary graphs according to a mapping. What else would you want to do with them that requires the direct-mapping-to-triple-soup?
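
To be concrete about the querying part: with Tarql, filtering the table is already an ordinary SELECT over the named columns (a sketch, reusing the illustrative columns from above):

    SELECT ?name ?email
    WHERE {
      # each row binds ?name and ?email; keep only example.com addresses
      FILTER (STRENDS(?email, "@example.com"))
    }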

It seems I'd rather see an RDF ecosystem with excellent support for tables as well as graphs, while you'd rather see one where everyone treats everything as a graph, even when it isn't one.

Best,
Richard


> On 28 Feb 2014, at 21:48, David Booth <david@dbooth.org> wrote:
> 
>> On 02/28/2014 02:44 PM, Richard Cyganiak wrote:
>> [ . . . ]
>>> On 28 Feb 2014, at 17:10, David Booth <david@dbooth.org> wrote:
>>> [ . . . ]  what I meant was that tarql could be
>>> viewed as a shortcut over an implicit two step process: instead of
>>> going in two steps from A to B to C, tarql takes a shortcut from A
>>> to C, where A is the CSV, B is direct-mapped RDF, and C is the
>>> transformed RDF produced by the CONSTRUCT clause.
>> 
>> The interesting question—and one that I myself cannot answer with
>> absolute certainty—is whether there is any value in having B.
> 
> That's a great question.  I'm sure some will want to go directly from A to C -- and that's fine too -- but here's a brief explanation (in the form of a use case) for why I prefer to cleanly separate syntactic from semantic transformations:
> [[
> Title: CSV+ Direct Mapping
> 
> The overall goal is to be able to predictably view all data uniformly as RDF, even though it may be serialized in a wide variety of formats, including those that are not natively RDF serializations, such as CSV+.  Independent parties using different tools should see the same information when viewing CSV+ data as RDF.
> 
> A key reason for using a direct mapping -- whether for CSV+, relational data or anything else -- is to cleanly separate the *syntactic* issue of mapping non-native-RDF formats to RDF, from the *semantic* issue of model alignment, in which model transformations are routinely needed. By standardizing the direct mappings, RDF-to-RDF transformations can be more readily shared, because they can assume a predictable starting point based on the direct mapping.
> 
> For example, Alice, using the W3C-compliant AlphaWorks Uniform RDF Deserializer, reads a CSV+ spreadsheet from http://example/table.tsv . The AlphaWorks Deserializer, following W3C standards, finds no information about the spreadsheet beyond what is contained in table.tsv itself.  Thus, it uses the default **CSV+ Direct Mapping** to interpret the spreadsheet as RDF.
> 
> Bob, using the W3C-compliant BetaTech Universal RDF Slurper, reads the same CSV+ spreadsheet from http://example/table.tsv .  The BetaTech Slurper, following W3C standards, also finds no information about the spreadsheet beyond what is contained in table.tsv itself.  Thus, it too uses the default **CSV+ Direct Mapping** to interpret the spreadsheet as RDF.
> 
> Alice and Bob then talk by phone about the RDF content of the spreadsheet.  Because they know that their software follows the W3C standards, they are assured that they are talking about the *same* RDF graph (or at least isomorphic graphs).  This makes them much happier than they were before standardization, because it allows them to share RDF-to-RDF transformations that can be applied to the resulting RDF graph.  Prior to standardization, they were unable to share RDF-to-RDF transformations, because their RDF deserializers produced different RDF from the same CSV+ data.
> ]]
> 
> In other words, rather than having lots of special-purpose languages (R2RML, one for CSV+, etc.) for transforming from lots of non-RDF formats into target RDF models, thus combining syntactic lifting to RDF *and* semantic alignment, I would rather have lots of simple, standard direct mappings that lift non-RDF formats to RDF, and then use one general-purpose semantic transformation language, such as SPARQL rules, for doing the semantic alignment, as sketched below.  I think this gives a cleaner, more maintainable architecture, as it factors out all the semantic transformations.
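> 
> To sketch the two steps (the direct-mapping vocabulary below is purely illustrative -- defining it is exactly what standardization would do): a row such as
> 
>     name,email
>     Alice,alice@example.com
> 
> would first be lifted syntactically to a predictable graph, perhaps
> 
>     _:row1 <http://example/table.tsv#name>  "Alice" ;
>            <http://example/table.tsv#email> "alice@example.com" .
> 
> and the shared semantic alignment would then be an ordinary SPARQL CONSTRUCT over that predictable graph:
> 
>     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>     CONSTRUCT {
>       _:p a foaf:Person ;
>           foaf:name ?n ;
>           foaf:mbox ?m .
>     }
>     WHERE {
>       ?row <http://example/table.tsv#name>  ?n ;
>            <http://example/table.tsv#email> ?e .
>       BIND (URI(CONCAT("mailto:", ?e)) AS ?m)
>     }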
> 
> I'm not against taking shortcuts sometimes -- sometimes they're expedient -- but I want to make sure we don't neglect the need for a direct mapping.
> 
> David

Received on Saturday, 1 March 2014 15:00:10 UTC