Re: The syntax issue from Ivan Mikhailov on 2010-08-25 (public-rdb2rdf-wg@w3.org from August 2010)

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Wed, 25 Aug 2010 21:12:31 +0700
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <1282745551.13507.5453.camel@octo.iv.dev.null>

Richard,

> I can understand Souri's decision to base his initial work on XML. But  
> I believe that XML is not the best choice of syntax for R2RML. I  
> instead propose that R2RML mappings should themselves be RDF graphs,  
> with Turtle or RDF/XML as the default syntax for writing R2RML files.
>
> Here is why.
> 1. CHARTER REQUIREMENTS
> 2. PREFIX HANDLING
> 3. EXTENSIBILITY
> 4. COMMUNITY EXPECTATIONS
> 5. SUITABILITY OF XML FOR CONFIGURATION


I already use RDF for mapping metadata in Virtuoso,
so I vote for RDF, even though my reasons are different (so I will
continue the list).

6. INCREMENTAL COMPOSING

I don't have to fill a whole RDF graph at once, whereas an XML document
has to be composed as, well, one document. Yes, I can split it into
general entities, but that is convenient only if these entities form
some sequence. I can append something to an XML document by adding an
external entity and a reference to it, but I cannot add subsections to
existing entities that way. By contrast, inserting a new subgraph into
an existing graph is no more than a LOAD or a few INSERTs.
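A minimal sketch of why incremental composition is cheap, assuming an RDF graph is modeled as a plain Python set of (subject, predicate, object) strings (blank nodes, datatypes and named graphs are ignored). The `ex:` vocabulary and the `load` helper are hypothetical, not part of Virtuoso or any R2RML draft. Adding a rule to an existing mapping is just one more triple, and independent upgrades can be applied in any order:

```python
# Sketch: an RDF graph as a set of ground (s, p, o) triples.
# The "ex:" names are hypothetical, invented for illustration only.
Triple = tuple[str, str, str]

def load(graph: set[Triple], fragment: set[Triple]) -> set[Triple]:
    """Merge a fragment into a graph, roughly what LOAD / INSERT DATA does."""
    return graph | fragment

base = {
    ("ex:PersonMap", "rdf:type", "ex:TableMapping"),
    ("ex:PersonMap", "ex:table", "PERSON"),
}
# One upgrade adds a "subsection" to an existing mapping: just another triple.
upgrade_a = {("ex:PersonMap", "ex:column", "EMAIL")}
# An independent application adds a mapping of its own.
upgrade_b = {("ex:OrderMap", "rdf:type", "ex:TableMapping")}

g1 = load(load(base, upgrade_a), upgrade_b)
g2 = load(load(base, upgrade_b), upgrade_a)

# Set union is commutative, so the order of independent upgrades is irrelevant.
print(g1 == g2, len(g1))  # True 4
```

The set model is what makes this work: there is no document order to respect, so an upgrade never has to locate an insertion point inside an existing structure.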

7. AUTOMATIC AUDIT, RECOVERY, DIFF+PATCH

With metadata stored as RDF, I can check integrity with a set of
simple SPARQL queries. If something is screwed up, I can cure the
problem by simply deleting the suspicious metadata. This is extremely
important if independent RDB applications share the same RDF storage
and even map their data to "shared" graphs of that storage. I can back
up and restore metadata with SPARUL, I can make diffs and apply patches,
I can do garbage collection; and as long as all these administrative
routines are based on SPARQL, they can be made reusable at least across
successive versions of one RDB2RDF product, if not across products of
different vendors.
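A minimal sketch of diff and patch over mapping metadata, again assuming each version is a set of ground triples (graphs with blank nodes would need canonicalization first, which is out of scope here). The `diff` and `patch` helpers and the `ex:` vocabulary are hypothetical; a patch corresponds roughly to a DELETE DATA followed by an INSERT DATA:

```python
# Sketch: diff/patch of two versions of mapping metadata as triple sets.
# Helper names and the "ex:" vocabulary are hypothetical.
Triple = tuple[str, str, str]

def diff(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (deletions, insertions) that turn `old` into `new`."""
    return old - new, new - old

def patch(graph: set[Triple], deletions: set[Triple],
          insertions: set[Triple]) -> set[Triple]:
    """Delete first, then insert (like DELETE DATA followed by INSERT DATA)."""
    return (graph - deletions) | insertions

v1 = {
    ("ex:PersonMap", "ex:table", "PERSON"),
    ("ex:PersonMap", "ex:column", "PHONE"),
}
v2 = {
    ("ex:PersonMap", "ex:table", "PERSON"),
    ("ex:PersonMap", "ex:column", "EMAIL"),
}

dels, ins = diff(v1, v2)
print(sorted(dels))  # [('ex:PersonMap', 'ex:column', 'PHONE')]
print(sorted(ins))   # [('ex:PersonMap', 'ex:column', 'EMAIL')]

upgraded = patch(v1, dels, ins)   # apply the upgrade
reverted = patch(v2, ins, dels)   # roll it back with the inverse patch
print(upgraded == v2, reverted == v1)  # True True
```

Because the patch is itself just two triple sets, it can be stored, audited and reversed with the same tools as the metadata it modifies.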

8. CHEAP TESTING

With mappings described in RDF, a testing tool can be flexible: it can
query both the metadata about the mappings under test and the actual
data produced by those mappings in a single SPARQL query, or at least do
both sorts of operations in one language, SPARQL. With XML it would
require a weird mix of XQuery and SPARQL.
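A minimal sketch of such a combined check, assuming metadata and mapped data sit side by side in one quad store. The graph names, the `ex:emits` property and the `missing_predicates` helper are all hypothetical. The check finds predicates a mapping is declared to emit that never occur in the produced data; in SPARQL 1.1 it could be written roughly as `SELECT ?p WHERE { GRAPH ex:meta { ?m ex:emits ?p } FILTER NOT EXISTS { GRAPH ex:data { ?s ?p ?o } } }`:

```python
# Sketch: one pass over a single quad store checks data against metadata.
# Graph names and the "ex:emits" property are hypothetical.
Quad = tuple[str, str, str, str]  # (graph, subject, predicate, object)

store: set[Quad] = {
    # mapping metadata: predicates each mapping is supposed to emit
    ("ex:meta", "ex:PersonMap", "ex:emits", "foaf:name"),
    ("ex:meta", "ex:PersonMap", "ex:emits", "foaf:mbox"),
    # data actually produced by the mapping
    ("ex:data", "ex:person/1", "foaf:name", '"Alice"'),
}

def missing_predicates(store: set[Quad], meta: str, data: str) -> set[str]:
    """Predicates declared in the metadata graph that never occur in the data graph."""
    declared = {o for g, s, p, o in store if g == meta and p == "ex:emits"}
    produced = {p for g, s, p, o in store if g == data}
    return declared - produced

print(missing_predicates(store, "ex:meta", "ex:data"))  # {'foaf:mbox'}
```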



A typical Virtuoso installation with RDB2RDF mapping in use has 10-20
applications with 100-2000 RDB2RDF mapping rules each; each application
is upgraded 3-4 times per year, and the upgrades are independent of each
other. That means roughly one new mapping rule per working hour, on
average. Then mix the mapped relational data with "native" RDF quads.
Add security restrictions. The configuration of this nightmare is not a
stable document to be read from beginning to end, because it will have
changed before the end is reached. Metadata about a knowledge base is a
knowledge base in itself, so RDF is a natural choice.

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

Received on Wednesday, 25 August 2010 15:11:22 UTC