rdb rdf vocabulary

Consider, as a modest goal, creating an RDF vocabulary to describe the 
relational data model: tuples, aggregates and components thereof, and 
metadata (header).

This would provide a simple standard for putting an SQL result set into 
SPARQLable form, or into a form suitable for any other semantic 
processing.  It would be useful not just for SQL result sets, but also 
for any syntactic format that resembles tuples--CSV or spreadsheet, for 
example.
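
To make the idea concrete, here is a minimal sketch of what a late-bound rendering might look like.  Everything named here is an assumption for illustration only: the `rdb:` terms (`rdb:Relation`, `rdb:Tuple`, `rdb:heading`, `rdb:tuple`, `rdb:component`, `rdb:attribute`, `rdb:name`, `rdb:value`) are hypothetical, not part of any published vocabulary, and triples are modelled as plain Python tuples rather than with an RDF library.

```python
import csv
import io


def rows_to_late_bound(header, rows, base="_:"):
    """Describe a header and its rows with a generic (late-bound) vocabulary.

    All rdb: terms are hypothetical.  Literals are written as quoted strings,
    blank nodes as "_:" labels.  Returns a list of (s, p, o) tuples.
    """
    rel = f"{base}rel"
    triples = [(rel, "rdf:type", "rdb:Relation")]

    # Metadata (header): one attribute node per column name.
    for i, name in enumerate(header):
        attr = f"{base}attr{i}"
        triples.append((rel, "rdb:heading", attr))
        triples.append((attr, "rdb:name", f'"{name}"'))

    # Tuples and their components; each component points back at its attribute.
    for r, row in enumerate(rows):
        tup = f"{base}t{r}"
        triples.append((rel, "rdb:tuple", tup))
        triples.append((tup, "rdf:type", "rdb:Tuple"))
        for i, value in enumerate(row):
            comp = f"{base}t{r}c{i}"
            triples.append((tup, "rdb:component", comp))
            triples.append((comp, "rdb:attribute", f"{base}attr{i}"))
            triples.append((comp, "rdb:value", f'"{value}"'))
    return triples


# The same function serves CSV input, since CSV rows resemble tuples.
sample = io.StringIO("id,name\n1,Ann\n2,Bob\n")
reader = csv.reader(sample)
header = next(reader)
triples = rows_to_late_bound(header, list(reader))
```

Note that the column names appear only as literal values of `rdb:name`, never as predicates; that is what makes the form late-bound, and why one fixed vocabulary can describe any table.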

After you have a standard late-bound schema describing tuples and the 
rest, it would be helpful to standardize a mapping from the late-bound 
form to an early-bound form.  This mapping would not add any semantic 
information, but it would make the instance datasets easier to process 
for some purposes.
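
One way such a mapping could work is sketched below, under the same assumptions as before: the `rdb:` terms are hypothetical, the `ex:` prefix for generated predicates is a placeholder, and attribute names are assumed to be usable as local names without escaping.  Each component is collapsed into a single triple whose predicate is derived from the attribute name.

```python
# A small late-bound sample: one tuple with two components.
LATE = [
    ("_:attr0", "rdb:name", '"id"'),
    ("_:attr1", "rdb:name", '"name"'),
    ("_:t0", "rdb:component", "_:t0c0"),
    ("_:t0c0", "rdb:attribute", "_:attr0"),
    ("_:t0c0", "rdb:value", '"1"'),
    ("_:t0", "rdb:component", "_:t0c1"),
    ("_:t0c1", "rdb:attribute", "_:attr1"),
    ("_:t0c1", "rdb:value", '"Ann"'),
]


def to_early_bound(triples, ns="ex:"):
    """Collapse each component into one triple: (tuple, ns + name, value)."""
    names = {s: o.strip('"') for s, p, o in triples if p == "rdb:name"}
    attr_of = {s: o for s, p, o in triples if p == "rdb:attribute"}
    value_of = {s: o for s, p, o in triples if p == "rdb:value"}

    early = []
    for s, p, o in triples:
        if p == "rdb:component":
            # The attribute name becomes the predicate (early binding).
            early.append((s, ns + names[attr_of[o]], value_of[o]))
    return early


EARLY = to_early_bound(LATE)
```

The output carries no information that was not already in the late-bound input; the names have only moved from literal values to predicates.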

It is unfortunate that no standard XML schema for relational data was 
developed in the early days of XML.  Although seemingly trivial, it 
would have eliminated a lot of db-to-xml mapping effort and allowed 
developers to focus on putting the data to use instead of worrying about 
what tags to wrap it in.  It would also have allowed more options for 
merging data from disparate databases.  Once the data is in XML, you have 
all sorts of options and tools to add value; until it's in XML, you are 
limited.

Just so with RDF: once the data is in an RDF graph, you have many ways 
of enriching it semantically, or making it available for semantic 
queries; so the first problem is getting it into RDF.  And the quickest, 
easiest, most reliable way to do this is to use a late-bound vocabulary 
that mirrors the relational data model.

To make another comparison with XML: the gap between XML and RDF is 
often mentioned, and the GRDDL standard is one attempt to bridge it. 
But another option, apparently neglected, is to use the XML infoset RDF 
vocabulary.  A trivial XSLT stylesheet will turn any XML document into 
RDF/XML which can be submitted to a SPARQL engine or any other RDF 
processor.  The return trip could be made via the SPARQL XML results 
schema and a stock XSLT that requires no knowledge of the original XML 
schema.  There is therefore no "gap": an XML document is simply an 
early-bound rendition of an infoset RDF graph.
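
The XML-to-RDF direction can be illustrated without XSLT.  The sketch below uses Python's standard `xml.etree.ElementTree` instead of a stylesheet, and the `info:` property names are simplified stand-ins chosen for illustration, not the terms of the actual infoset RDF vocabulary.  It also ignores namespaces, tail text, and sibling order, all of which a faithful infoset rendering would need to keep.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_infoset_triples(xml_text):
    """Walk an XML document and describe it with generic, schema-agnostic triples.

    The info: terms are hypothetical stand-ins.  Returns (s, p, o) tuples.
    """
    ids = count()
    triples = []

    def visit(elem):
        node = f"_:e{next(ids)}"
        triples.append((node, "rdf:type", "info:Element"))
        triples.append((node, "info:localName", f'"{elem.tag}"'))
        for name, value in elem.attrib.items():
            attr = f"_:a{next(ids)}"
            triples.append((node, "info:attribute", attr))
            triples.append((attr, "info:localName", f'"{name}"'))
            triples.append((attr, "info:normalizedValue", f'"{value}"'))
        if elem.text and elem.text.strip():
            triples.append((node, "info:characters", f'"{elem.text.strip()}"'))
        for child in elem:
            # Recurse first, then link parent to the child's node label.
            triples.append((node, "info:child", visit(child)))
        return node

    visit(ET.fromstring(xml_text))
    return triples


doc_triples = xml_to_infoset_triples('<row id="1"><name>Ann</name></row>')
```

As with the relational case, element and attribute names show up only as literal values, so the same code handles any XML document regardless of its schema.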

Likewise, an SQL table is an early-bound rendition of an rdb RDF graph.

Admittedly, writing queries for late-bound instances--whether in XPath 
or SPARQL--can be tedious.  Large applications would probably convert to 
an early-bound form somewhere along the way.  A good architecture might 
use XProc to define a pipeline of incremental enrichments and 
transformations.

Thanks for considering this suggestion.  I look forward to watching the 
progress of the working group.

--Paul Tyson

Received on Thursday, 27 March 2008 01:59:44 UTC