The syntax issue

Harry correctly urges us to press forward with turning the SQL-Based  
Approach into a FPWD.

There is one major obstacle, though, that needs to be tackled before
work on a FPWD can be started: the question of what syntax the language
should use.

Ashok has stated that we should talk about syntax “later”, but this  
discussion has to happen before serious work on an official draft  
starts. Once a draft is out, the public will assume that the syntax  
used in the draft is the official and canonical syntax for the  
language, and it is key to send the right signal there.

Which means, the discussion has to happen now.


I can understand Souri's decision to base his initial work on XML. But  
I believe that XML is not the best choice of syntax for R2RML. I  
instead propose that R2RML mappings should themselves be RDF graphs,  
with Turtle or RDF/XML as the default syntax for writing R2RML files.

Here is why.


1. CHARTER REQUIREMENTS

The RDB2RDF charter states: “The mapping language SHOULD have a
human-readable syntax as well as XML and RDF representations of the
syntax for purposes of discovery and machine generation.” [1] Using RDF
kills three birds with one stone: It ticks the RDF box, it ticks the
“human-readable syntax” box (Turtle), and it ticks the XML box (RDF/XML).


2. PREFIX HANDLING

The language needs to refer to RDF vocabulary terms, which are  
identified by URIs, and are conventionally represented as QNames or  
CURIEs (that is, http://xmlns.com/foaf/0.1/Person is represented as  
foaf:Person). This means that the language needs features for
establishing prefix mappings (associate "foaf" with
"http://xmlns.com/foaf/0.1/"). This is a source of pain in XML-based
languages (cf. ongoing tensions over RDFa vs HTML5, and RIF's XML
syntax). Using a language that has a built-in mechanism for
establishing prefix mappings and expanding QNames/CURIEs would avoid
this problem.
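To make the point concrete, here is a minimal sketch of how prefix handling looks in Turtle, where declaring prefixes is part of the syntax itself. Only the foaf: namespace is taken from the text above; the ex: namespace and the ex:alice resource are made up for illustration.

```turtle
# The prefix mapping is declared once, using Turtle's built-in @prefix directive.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# Hypothetical namespace, for illustration only.
@prefix ex:   <http://example.com/> .

# foaf:Person expands to <http://xmlns.com/foaf/0.1/Person>.
ex:alice a foaf:Person .
```

Any Turtle parser performs this expansion, so the mapping language would not need to define a prefix mechanism of its own.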


3. EXTENSIBILITY

RDF gives us various ways of annotating mappings (e.g., providing  
additional documentation, versioning-related annotations, cross-links  
to other software artifacts etc) for free. For example, I could attach
rdfs:comment, dc:modified, dc:creator and similar properties to any
part of a mapping. It also provides a clear syntactical framework for  
vendor-specific extensions.
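As a rough sketch of what such annotations might look like in Turtle: the ex: namespace, the ex:EmpMapping node, and all literal values below are placeholders invented for this example, not part of any proposed vocabulary. Binding dc: to the DCMI Terms namespace is also an assumption, chosen because that namespace defines both "modified" and "creator".

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
# Hypothetical namespace for the mapping's own nodes.
@prefix ex:   <http://example.com/mapping#> .

# ex:EmpMapping stands in for some part of a mapping; the annotations
# are ordinary RDF properties attached to that node.
ex:EmpMapping
    rdfs:comment "Placeholder description of what this mapping does." ;
    dc:creator   "Jane Doe" ;
    dc:modified  "2010-08-25"^^xsd:date .
```

A processor that does not know these properties can simply ignore the extra triples, which is the same mechanism a vendor-specific extension would rely on.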


4. COMMUNITY EXPECTATIONS

R2RML is a language for mapping databases to RDF. Hence, it bridges a
world that speaks SQL to a world that speaks the RDF technology stack
(RDF, SPARQL, RIF etc). So arguments can be made for basing R2RML
syntax on RDF (as in D2RQ or SquirrelRDF), or on SQL (like Virtuoso
RDF Views), or on SPARQL (like Eric's approach), or on RIF. Basing
R2RML on XML drags an unrelated third technology stack into the mix.


5. SUITABILITY OF XML FOR CONFIGURATION

XML is a good syntax for text markup (cf. XHTML, DocBook, TEI). It  
works ok for transmitting structured data (cf. SOAP, Atom) albeit  
facing increasing competition from JSON. But I think it is now evident  
that using XML for configuration files that are edited and read  
directly by users is not a good idea. The most obvious drawback in  
this context is that you have to type everything <twice>...</twice>!



To show what Souri's approach could look like if rendered in RDF  
(specifically Turtle), I took his example [2] and re-wrote it in  
Turtle [3] syntax. You can find it here:

http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_in_Turtle

A raw version of just the file [4] and an auto-generated graph view [5]
are also available.

I propose to proceed based on the concepts of Souri's approach, but  
with an RDF serialization instead of XML as the surface syntax.

Opinions?

Best,
Richard


[1] http://www.w3.org/2009/08/rdb2rdf-charter.html
[2] http://www.w3.org/2001/sw/rdb2rdf/wiki/Example_of_SQL-based_RDB2RDF_Mapping:_Revision_1
[3] http://www.w3.org/TeamSubmission/turtle/
[4] http://github.com/cygri/r2rml/raw/master/examples/emp-dept.ttl
[5] http://bit.ly/asIik4

Received on Wednesday, 25 August 2010 11:42:49 UTC