- From: Paul Tyson <phtyson@sbcglobal.net>
- Date: Wed, 26 Mar 2008 21:04:30 -0500
- To: public-xg-rdb2rdf@w3.org
Consider, as a modest goal, creating an RDF vocabulary to describe the relational data model: tuples, aggregates and components thereof, and metadata (header). This would provide a simple standard for putting an SQL result set into SPARQLable form--or for any other semantic processing. It would be useful not just for SQL result sets, but also for any syntactic format that resembles tuples--CSV or spreadsheet data, for example.

After you have a standard late-bound schema describing tuples and the rest, it would be helpful to standardize a mapping from the late-bound form to an early-bound form. This mapping would not add any semantic information, but it would make the instance datasets easier to process for some purposes.

It is unfortunate that no standard XML schema for relational data was developed in the early days of XML. Although seemingly trivial, it would have eliminated a lot of db-to-xml mapping effort and allowed developers to focus on putting the data to use instead of worrying about what tags to wrap it in. It would also have allowed more options for merging data from disparate databases. Once the data is in XML, you have all sorts of options and tools to add value; until it's in XML, you are limited.

Just so with RDF: once the data is in an RDF graph, you have many ways of enriching it semantically, or making it available for semantic queries; so the first problem is getting it into RDF. And the quickest, easiest, most reliable way to do this is to use a late-bound vocabulary that mirrors the relational data model.

To make another comparison with XML: the gap between XML and RDF is often mentioned, and the GRDDL standard is one attempt to fill it. But another option, apparently neglected, is to use the XML infoset RDF vocabulary. A trivial XSLT stylesheet will turn any XML document into RDF/XML, which can be submitted to a SPARQL engine or any other RDF processor.
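
To illustrate the XML-to-RDF step just described, here is a minimal sketch written in Python rather than XSLT. It walks an XML document and emits late-bound triples that describe elements and attributes generically, with no knowledge of the document's schema. The `ex:` property names and the `xml_to_triples` function are placeholders invented for this example; they are not the actual terms of the infoset RDF vocabulary, and the sketch ignores namespaces, comments, and text that follows a child element.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_triples(xml_text):
    """Return late-bound (subject, predicate, object) triples for an XML document.

    The ex: terms are hypothetical stand-ins for an infoset-style vocabulary.
    """
    ids = count(1)
    triples = []

    def visit(element, parent_id, position):
        node_id = f"_:n{next(ids)}"
        triples.append((node_id, "rdf:type", "ex:Element"))
        triples.append((node_id, "ex:localName", element.tag))
        if parent_id is not None:
            triples.append((node_id, "ex:parent", parent_id))
            triples.append((node_id, "ex:position", position))
        # Each attribute becomes its own node pointing back at its element.
        for name, value in element.attrib.items():
            attr_id = f"_:n{next(ids)}"
            triples.append((attr_id, "rdf:type", "ex:Attribute"))
            triples.append((attr_id, "ex:ownerElement", node_id))
            triples.append((attr_id, "ex:localName", name))
            triples.append((attr_id, "ex:value", value))
        # Only the text before the first child is kept in this sketch.
        text = (element.text or "").strip()
        if text:
            triples.append((node_id, "ex:text", text))
        for index, child in enumerate(element, start=1):
            visit(child, node_id, index)

    visit(ET.fromstring(xml_text), None, 0)
    return triples


if __name__ == "__main__":
    for triple in xml_to_triples('<order id="7"><item>pen</item></order>'):
        print(triple)
```

Because the element name appears only as the object of `ex:localName`, the same code handles any document; that is what makes the form late-bound.
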
The return trip could be made via the SPARQL XML results schema and a stock XSLT that requires no knowledge of the original XML schema. There is therefore no "gap": an XML document is simply an early-bound rendition of an infoset RDF graph. Likewise, an SQL table is an early-bound rendition of an rdb RDF graph.

Admittedly, writing queries for late-bound instances--whether in XPath or SPARQL--can be tedious. Large applications would probably convert to an early-bound form somewhere along the way. A good architecture might use XProc to define a pipeline of incremental enrichments and transformations.

Thanks for considering this suggestion. I look forward to watching the progress of the working group.

--Paul Tyson
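
As an illustration of the late-bound relational vocabulary suggested in this message, the following Python sketch turns an SQL result set into generic tuple triples and then maps them to an early-bound form in which each column name becomes a predicate. The `rdb:` and `col:` terms and both function names are hypothetical, chosen only for this example; no such vocabulary is defined in the message. The mapping adds no information: it only rearranges what the late-bound triples already say.

```python
import sqlite3


def rows_to_late_bound(columns, rows, base="_:t"):
    """Describe each row as an rdb:Tuple with one component per column."""
    triples = []
    for row_number, row in enumerate(rows, start=1):
        tuple_id = f"{base}{row_number}"
        triples.append((tuple_id, "rdf:type", "rdb:Tuple"))
        for col_number, (name, value) in enumerate(zip(columns, row), start=1):
            comp_id = f"{tuple_id}c{col_number}"
            triples.append((tuple_id, "rdb:component", comp_id))
            triples.append((comp_id, "rdb:attribute", name))
            triples.append((comp_id, "rdb:value", value))
    return triples


def late_to_early_bound(triples, prefix="col:"):
    """Collapse each component into a single (tuple, column, value) triple."""
    attribute = {s: o for s, p, o in triples if p == "rdb:attribute"}
    value = {s: o for s, p, o in triples if p == "rdb:value"}
    return [
        (s, prefix + attribute[o], value[o])
        for s, p, o in triples
        if p == "rdb:component"
    ]


# Usage: an in-memory table stands in for any SQL result set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
cursor = conn.execute("SELECT id, name FROM person ORDER BY id")
columns = [d[0] for d in cursor.description]  # column names from the header
late = rows_to_late_bound(columns, cursor.fetchall())
early = late_to_early_bound(late)
conn.close()
```

A query against `late` has to join through `rdb:component`, `rdb:attribute`, and `rdb:value`, while a query against `early` can name `col:name` directly, which is the trade-off between the two forms described above.
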
Received on Thursday, 27 March 2008 01:59:44 UTC