Re: Defining a SQL fragment? from Richard Cyganiak on 2010-07-22 (public-rdb2rdf-wg@w3.org from July 2010)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 22 Jul 2010 15:31:14 +0100
To: "Harry Halpin" <hhalpin@w3.org>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <E33C7426-BA42-4DA0-9353-C242E3F7D6E8@cyganiak.de>
On 22 Jul 2010, at 13:57, Harry Halpin wrote:
>>> I think for ETL purposes the language
>>> could have 4 parts. Each except 3) is optional.
>>>
>>> 1) Full vendor specific SQL to create a view
>>>
>>> 2) A portable subset of SQL to create a view
>>>
>>> 3) Mapping of that view to a default graph
>>>
>>> 4) Possibly running RDF-to-RDF transforms here (RIF).
>>
>> Where in these 4 do I say that USER.NAME should be mapped to
>> foaf:name rather than mydb:USER.NAME?
>
> It seems there are some differences here in the group, but I think 3)
> would be the right place, i.e. the mapping from SQL to the graph, which
> often seems to come after creating some kind of view in SQL, as done
> with full SQL power (1) or Datalog in (2). EricP seems to want to use
> RIF to modify that (4).
>
> That's why I'm tempted to say, let's work on just 3) and assume
> there'll be a place for 1), and then optionally leave 2) and 4) behind
> for now.
>
> What do you think?

Doesn't make sense to me.

The "default mapping", be it of a base table or a view or a SQL query
result, *by definition* allows no further customization. So the
property names in a default mapping will be whatever your column names
were, and hence there will be no foaf:Person or foaf:name in the
resulting RDF.
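
To make the point concrete, here is a rough Python sketch of what a
default mapping amounts to. The mydb: prefix and USER.NAME come from the
example above; the function name, the row URI scheme and the sample row
are assumptions made up for illustration, not anything the group has
agreed on.

```python
def default_mapping(table, pk_column, row, prefix="mydb:"):
    """Turn one row (a dict of column -> value) into triples.

    Predicates are derived mechanically from the table and column
    names; there is no hook for choosing another vocabulary.
    The subject URI scheme is a hypothetical choice.
    """
    subject = f"{prefix}{table}/{row[pk_column]}"
    triples = []
    for column, value in row.items():
        if value is None:
            continue  # assumption: NULL values produce no triple
        predicate = f"{prefix}{table}.{column}"
        triples.append((subject, predicate, value))
    return triples


# Hypothetical sample row from a USER table with primary key ID.
triples = default_mapping("USER", "ID", {"ID": 7, "NAME": "Alice"})
for triple in triples:
    print(triple)
```

Every predicate comes out as mydb:USER.<column>. Getting foaf:name
instead needs some customization step that the default mapping does not
have.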

Note Souri's "SQL-based approach", which consists of 1) plus a  
language for customizing the "glue" that turns SQL result records into  
RDF triples. I read your list as rejecting that approach?

> [T]he idea was that there might be a simple subset of SQL we can
> guarantee to be portable. Ashok has brought up another well-known
> vendor-defined set of SQL.

Subsetting is not a useful way of achieving portability between SQL  
implementations.

There is no portable subset of SQL that includes concatenation, for  
example. If you want to generate URIs, you probably need concatenation.
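
As a small illustration of URI generation by concatenation, here is a
sketch using SQLite through Python's sqlite3 module (chosen only
because it ships with Python). The person table and the example.org URI
prefix are made up. SQLite accepts the standard || operator; MySQL in
its default mode reads || as logical OR and wants CONCAT() instead,
which is exactly the kind of difference that defeats subsetting.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (7, 'Alice')")

# Standard SQL spells string concatenation as '||'.
# The URI prefix below is an assumption, not part of any proposal.
query = "SELECT 'http://example.org/person/' || id FROM person"
uris = [row[0] for row in conn.execute(query)]
print(uris)

conn.close()
```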

We could state that R2ML expects SQL queries in standard (non-dialect)  
SQL, and provide a way for mapping authors to flag that they use a  
different dialect in their mapping file.

> However, that does not mean we should restrict people to use that
> subset. For some people, portability may not be a concern. I'm OK with
> using anything to transform relational data to the graph as long as
> implementers actually will implement it and users will use it (this
> does bring up concerns about any non SQL-based approach), as long as we
> can guarantee that at least a subset of it is portable and, if
> something may not be portable, allow it to be clearly defined as such.

Sounds reasonable to me.

How about adding a "flavour" attribute to the block of SQL, so that
mapping authors can announce what dialect they are using? A tool can
check whether it understands that dialect. Optionally, authors could
even put multiple flavours of the same query side-by-side to make
their mapping files truly portable; implementations could check the
available flavours and use the one they understand.

--- strawman syntax ---
<sql flavour="SQL92">SELECT 'a' || 'b'</sql>  (this is standard SQL)
<sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql> (this is MySQL)
--- end strawman syntax ---

This doesn't guarantee portability (we can't), but it allows mapping
authors to flag whether they use the standard flavour of SQL or a
vendor-specific dialect.

(In reality, we should use URIs to identify the flavours so that  
vendors can define their own. And there should be a default -- perhaps  
SQL Core 08 as per Ashok's proposal.)
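
A rough sketch of how an implementation might pick a flavour, in
Python. The <sql flavour="..."> elements are the strawman syntax from
this message; the wrapping <query> element, the function name and the
choice of "SQL92" as the default flavour are assumptions for the sake
of the example.

```python
import xml.etree.ElementTree as ET

# Strawman mapping fragment; the <query> wrapper is hypothetical.
MAPPING = """
<query>
  <sql flavour="SQL92">SELECT 'a' || 'b'</sql>
  <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql>
</query>
"""

DEFAULT_FLAVOUR = "SQL92"  # assumed default for blocks without a flavour


def pick_sql(query_xml, understood):
    """Return (flavour, sql) for the first flavour in `understood`
    that the mapping provides, or None if there is no match."""
    root = ET.fromstring(query_xml.strip())
    available = {}
    for element in root.findall("sql"):
        flavour = element.get("flavour", DEFAULT_FLAVOUR)
        available[flavour] = (element.text or "").strip()
    for flavour in understood:  # ordered by the tool's preference
        if flavour in available:
            return flavour, available[flavour]
    return None


# A MySQL-backed tool prefers its own dialect, then the standard one.
print(pick_sql(MAPPING, ["MySQL", "SQL92"]))
# A tool that understands none of the offered flavours gets nothing.
print(pick_sql(MAPPING, ["Oracle"]))
```

A tool that gets None back can refuse the mapping with a clear error
instead of sending SQL it does not understand to the database.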

Best,
Richard


>
>>
>>> What this does not bring up is what eric and soeren were really
>>> wanting to
>>> do earlier as well, which was SPARQL->SQL mappings.
>>
>> Are you saying that we need separate languages for ETL access and for
>> SPARQL access to the mapped database? I don't think so; it's the same
>> language. R2ML should specify how to derive an RDF graph from a
>> relational DB. How to access that RDF graph (linked data, SPARQL, ETL,
>> brainwave transmission) is up to implementations.
>
> I would hope we do not need a separate language for that, but there
> needs to be a clear statement about that in the spec.
>
>>
>> Best,
>> Richard
>>
>>
>>
>>
>>>
>>> However, before descending into the black hole of semantics and
>>> options, I'm happy to agree to get a rough draft out on 1) and 3) if
>>> people can't agree on 2) and 4).
>>>
>>>>
>>>> I think there is a clear desire to allow full SQL in a compliant
>>>> implementation of the SQL-based approach. This is at least what I
>>>> gather from Souri's and Orri's comments. I can not remember anyone
>>>> making an argument that only a restricted SQL fragment should be
>>>> allowed in the SQL-based approach.
>>>>
>>>> Can you please explain, or point me to the discussion that motivates
>>>> the need for restrictions in the allowable SQL in the SQL-based
>>>> approach?
>>>>
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>
>>
>>
>
>
Received on Thursday, 22 July 2010 14:31:50 UTC