Re: Strawman for a custom R2RML syntax from Richard Cyganiak on 2010-09-07 (public-rdb2rdf-wg@w3.org from September 2010)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 7 Sep 2010 07:34:42 +0100
To: Ivan Mikhailov <imikhailov@openlinksw.com>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <2DEB17DD-11D8-41EA-ADCE-712D8EF3DDF6@cyganiak.de>
Hi Ivan,

Thanks for these insightful comments and for sharing your experience.  
You mention six features that would need to be added to this draft to  
make it work on par with RDF Views. I don't agree with all of them, but  
you are definitely right about some. So let's see ...

On 30 Aug 2010, at 09:59, Ivan Mikhailov wrote:
>> The strawman is here:
>>
>> http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_in_a_custom_syntax
>
> Now let's extend the draft with options.
>
> Let us assign an (optional) name for every triple pattern of these
> templates, for diagnostic purposes.

I'm not convinced that this is really necessary. For diagnostic  
purposes, seeing a string representation of the triple pattern is good  
enough, such as this:

     ?emp biz:fullname ?name .

Is much gained by assigning a name to the pattern so that  
diagnostic output can show this

     :pattern_emp_fullname

instead?

> Let's cut the SQL select in parts and combine them automatically,
> otherwise it is impossible to compose an adequate SQL join for a given
> basic graph pattern.

That's what D2RQ does -- basically the mapping author has to cut the  
SQL into bits and pieces, and state explicitly which of them are  
conditions, joins, expressions for property values and so on.

It took me a while, but since joining this WG I've come around to  
believing that this is not necessary.

If the SQL query is simple, then the RDB2RDF engine can parse the SQL  
query, and cut it into the required parts automatically. So you get  
your optimized joins.
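
As a rough illustration of what "cutting into parts" could mean for a simple query, here is a toy sketch. The helper name and the regular expression are made up for this example and only cope with the plain SELECT ... FROM ... [WHERE ...] shape; a real engine would need a proper SQL parser.

```python
import re

# Hypothetical helper, not taken from any existing engine.
# Only recognises: SELECT <cols> FROM <tables> [WHERE <c1> AND <c2> ...]
SIMPLE_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<tables>.+?)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_simple_select(sql):
    """Return the columns, tables and AND-ed conditions of a simple query,
    or None if the query does not have the simple shape."""
    m = SIMPLE_SELECT.match(sql)
    if m is None:
        return None
    conditions = []
    if m.group("where"):
        # Naive split: assumes no AND inside string literals or BETWEEN.
        conditions = [
            c.strip()
            for c in re.split(r"\s+AND\s+", m.group("where"), flags=re.IGNORECASE)
        ]
    return {
        # Naive split: assumes no commas inside function calls.
        "columns": [c.strip() for c in m.group("cols").split(",")],
        "tables": [t.strip() for t in m.group("tables").split(",")],
        "conditions": conditions,
    }


parts = split_simple_select(
    "SELECT e.id, e.name, d.name FROM emp e, dept d "
    "WHERE e.dept_id = d.id AND e.active = 1"
)
```

With the parts separated like this, the engine is free to combine the join condition and the filters of several view definitions into one SQL query.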

If the SQL query is complex, then the RDB2RDF engine has several options.

First, it could treat the SQL query as a black box and use subselects  
for execution. This will be slow (or not, depending on the optimizer  
-- Souri and Juan have repeatedly asserted that this works just fine  
on Oracle and SQL Server). But at least it gives correct results, and  
the user can always try to write a simpler query.
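
A minimal sketch of the black-box approach, using SQLite only because it ships with Python. The table, the data and the function name wrap_as_subselect are assumptions made up for this example.

```python
import sqlite3


def wrap_as_subselect(view_sql, column):
    """Treat view_sql as opaque and filter on one of its output columns.
    The column name is assumed to come from the mapping, not from user input."""
    return f"SELECT * FROM ({view_sql}) AS v WHERE v.{column} = ?"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO emp VALUES (1, 'Alice', 10), (2, 'Bob', 10), (3, 'Carol', 20);
    """
)

# A "complex" view definition that the engine does not try to take apart.
view_sql = "SELECT dept_id, COUNT(*) AS headcount FROM emp GROUP BY dept_id"

query = wrap_as_subselect(view_sql, "dept_id")
rows = conn.execute(query, (10,)).fetchall()
```

Whether the filter gets pushed down into the subselect is entirely up to the database's optimizer, which is exactly the performance question above.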

Second, it could reject the SQL query as too complex, and just tell  
the user: "Sorry, I don't support aggregates in the SQL view  
definition." This would be an incomplete implementation of the  
standard, but today every RDB2RDF implementation has certain  
limitations in the expressivity of its mappings, so this wouldn't make  
matters worse.
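
The rejection path could look roughly like this. The check is a deliberately crude keyword test (it would also trip on "COUNT(" inside a string literal), and all names are invented for this example.

```python
import re


class UnsupportedViewError(ValueError):
    """Raised when a view definition uses SQL the engine does not handle."""


# Crude detection of aggregate functions and GROUP BY.
_AGGREGATE = re.compile(
    r"\b(COUNT|SUM|AVG|MIN|MAX)\s*\(|\bGROUP\s+BY\b", re.IGNORECASE
)


def check_view_sql(sql):
    """Return sql unchanged if it looks simple enough, otherwise refuse it."""
    if _AGGREGATE.search(sql):
        raise UnsupportedViewError(
            "Sorry, I don't support aggregates in the SQL view definition."
        )
    return sql
```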

Finally, if the engine can modify the DB, then it can just define a  
physical view in the DB.
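
For the last option, a small sketch with SQLite (again chosen only because it comes with Python). The view name r2rml_dept_headcount and the sample table are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO emp VALUES (1, 'Alice', 10), (2, 'Bob', 10), (3, 'Carol', 20);
    """
)

# The SQL from the mapping is installed as a view under an engine-chosen name.
view_sql = "SELECT dept_id, COUNT(*) AS headcount FROM emp GROUP BY dept_id"
conn.execute(f"CREATE VIEW r2rml_dept_headcount AS {view_sql}")

# From here on the engine can treat the view like any other table.
rows = conn.execute(
    "SELECT headcount FROM r2rml_dept_headcount WHERE dept_id = ?", (20,)
).fetchall()
```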

So I don't think that the mapping language really needs to force the  
mapping author to decompose the SQL query into small parts. The engine  
can do it.

> Let us enrich IRITEMPLATEs by adding options absolutely needed for the
> optimizer (and by making them based on functions when needed). Let's
> specify the order of patterns and let some "exceptions" take priority
> on "common cases", to cut useless unions.

OK, this one is really interesting. Can you give some details here?  
Pointers to Virtuoso documentation are fine.

> Let's manipulate them (add/remove/reorder) and let's do that in parts,
> because many independent applications may share one RDF storage and
> they can be removed as well as installed.

Where do you see the obstacle in doing that with a text-based format?

The draft also supports giving IDs to view maps. So an engine could  
have features to enable/disable individual view maps by ID.
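
To sketch what I mean: an engine could keep the view maps in a registry keyed by ID, along these lines. The class, its method names and the sample IDs are hypothetical; nothing like this is defined in the draft.

```python
class ViewMapRegistry:
    """Keeps view map definitions by ID, in the order they were added."""

    def __init__(self):
        self._maps = {}         # view map ID -> definition text
        self._disabled = set()  # IDs that are installed but switched off

    def add(self, map_id, definition):
        self._maps[map_id] = definition

    def remove(self, map_id):
        del self._maps[map_id]
        self._disabled.discard(map_id)

    def disable(self, map_id):
        if map_id not in self._maps:
            raise KeyError(map_id)
        self._disabled.add(map_id)

    def enable(self, map_id):
        self._disabled.discard(map_id)

    def active_ids(self):
        return [i for i in self._maps if i not in self._disabled]


registry = ViewMapRegistry()
registry.add("hr:employees", "...view map text for application A...")
registry.add("crm:customers", "...view map text for application B...")
registry.disable("hr:employees")
active = registry.active_ids()
```

Each application ships its own text file with its own view maps, and installing or removing the application means adding or removing those IDs.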

> We can also start from other end --- given the current syntax of RDF
> Views, try to remove features to make it simple. Unfortunately, for
> every given feature there will be a sample data mapping and a query
> that will go slower at least order of magnitude without the excluded
> feature.

Two of the four features above have no performance impact but are only  
about management of complex mappings, and a third can be handled  
without performance impact, so I think this is an overstatement.

> P.S. I forgot to mention two more extensions, for free text

Yeah that's a hard one. Can you say something about the way free text  
is handled in RDF Views? (No free text in D2RQ, so I have no  
experience here.)

> and  for LITERALTEMPLATES :(

Well, literal transformations can be done in SQL code in the SELECT  
clause, so the expressivity is already there. The reason for URI  
templates is really that they can be re-used many times throughout a  
mapping file. I think literal templates are not re-used as often as  
URI templates, and they don't benefit as much from the syntactic sugar  
that URI templates can offer (URIs can only contain certain  
characters, so we can add template stuff without too much pain from  
character escaping).
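
A quick sketch of the kind of sugar I have in mind for URI templates. The {column} placeholder syntax and the function name are assumptions for this example, not the syntax of the draft.

```python
import re
from urllib.parse import quote


def expand_uri_template(template, row):
    """Replace each {column} placeholder with the percent-encoded column value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(row[m.group(1)]), safe=""),
        template,
    )


# Defined once, re-used wherever the mapping refers to an employee.
EMP_URI = "http://example.com/emp/{id}"

emp_uri = expand_uri_template(EMP_URI, {"id": 7})
dept_uri = expand_uri_template(
    "http://example.com/dept/{name}", {"name": "R&D / Lab"}
)
```

Because braces cannot appear in a valid URI, the placeholders need no escaping of their own. A literal could contain any character, so a literal template would need an escape mechanism on top.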

So I'm not convinced that it's worth having literal templates. But  
adding them would be easy enough without making the draft more  
complicated.

Best,
Richard



Received on Tuesday, 7 September 2010 06:35:18 UTC