- From: David Booth <david@dbooth.org>
- Date: Fri, 28 Feb 2014 16:48:20 -0500
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: Niklas Lindström <lindstream@gmail.com>, Gregg Kellogg <gregg@greggkellogg.net>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
On 02/28/2014 02:44 PM, Richard Cyganiak wrote:
> [ . . . ]
>> On 28 Feb 2014, at 17:10, David Booth <david@dbooth.org> wrote:
>> [ . . . ] what I meant was that tarql could be viewed as a
>> shortcut over an implicit two-step process: instead of going in
>> two steps from A to B to C, tarql takes a shortcut from A to C,
>> where A is the CSV, B is direct-mapped RDF, and C is the
>> transformed RDF produced by the CONSTRUCT clause.
>
> The interesting question—and one that I myself cannot answer with
> absolute certainty—is whether there is any value in having B.

That's a great question. I'm sure some will want to go directly from
A to C -- and that's fine too -- but here's a brief explanation (in
the form of a use case) for why I prefer to cleanly separate
syntactic from semantic transformations:

[[
Title: CSV+ Direct Mapping

The overall goal is to be able to predictably view all data uniformly
as RDF, even though it may be serialized in a wide variety of
formats, including those that are not natively RDF serializations,
such as CSV+. Independent parties using different tools should see
the same information when viewing CSV+ data as RDF.

A key reason for using a direct mapping -- whether for CSV+,
relational data or anything else -- is to cleanly separate the
*syntactic* issue of mapping non-native-RDF formats to RDF from the
*semantic* issue of model alignment, in which model transformations
are routinely needed. By standardizing the direct mappings,
RDF-to-RDF transformations can be more readily shared, because they
can assume a predictable starting point based on the direct mapping.

For example, Alice, using the W3C-compliant AlphaWorks Uniform RDF
Deserializer, reads a CSV+ spreadsheet from http://example/table.tsv .
The AlphaWorks Deserializer, following W3C standards, finds no
information about the spreadsheet beyond what is contained in
table.tsv itself. Thus, it uses the default **CSV+ Direct Mapping**
to interpret the spreadsheet as RDF.

Bob, using the W3C-compliant BetaTech Universal RDF Slurper, reads
the same CSV+ spreadsheet from http://example/table.tsv . The
BetaTech Slurper, following W3C standards, also finds no information
about the spreadsheet beyond what is contained in table.tsv itself.
Thus, it too uses the default **CSV+ Direct Mapping** to interpret
the spreadsheet as RDF.

Alice and Bob then talk by phone about the RDF content of the
spreadsheet. Because they know that their software has followed the
W3C standards, they are assured that they are talking about the
*same* RDF graph (or at least isomorphic graphs). This makes them
much happier than they were before standardization, because it allows
them to share RDF-to-RDF transformations that can be applied to the
resulting RDF graph. Prior to standardization, they were unable to
share RDF-to-RDF transformations, because their RDF deserializers
produced different RDF from the same CSV+ data.
]]

In other words, rather than having lots of special-purpose languages
(R2RML, one for CSV+, etc.) for transforming from lots of non-RDF
formats into target RDF models, thus combining syntactic lifting to
RDF *and* semantic alignment, I would rather have lots of simple,
standard direct mappings that lift non-RDF formats to RDF, and then
use one general-purpose semantic transformation language, such as
SPARQL rules, for doing the semantic alignment. I think this gives a
cleaner, more maintainable architecture, as it factors out all the
semantic transformations.
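To make the two steps concrete, here is a rough sketch. All of the
details -- the row URIs, the column-based predicates, and the target
FOAF vocabulary -- are purely illustrative, since the direct-mapping
conventions are exactly what would need to be standardized. Suppose
table.tsv has columns "name" and "phone". A direct mapping (step A to
B) might mechanically produce something like:

  @prefix t: <http://example/table.tsv#> .

  t:row1 t:name  "Alice Smith" ;
         t:phone "555-0100" .

Semantic alignment (step B to C) is then an ordinary RDF-to-RDF
transformation that anyone can write and share, e.g. a SPARQL
CONSTRUCT query:

  PREFIX t:    <http://example/table.tsv#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  CONSTRUCT { ?row a foaf:Person ;
                   foaf:name ?name . }
  WHERE     { ?row t:name ?name . }

Because step A-to-B would be fixed by the standard, Alice and Bob
could exchange the CONSTRUCT query above and know that it produces
the same result for both of them.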
I'm not against taking shortcuts -- sometimes they're expedient --
but I want to make sure we don't neglect the need for a direct
mapping.

David
Received on Friday, 28 February 2014 21:48:49 UTC