Re: Defining a SQL fragment? from Marcelo Arenas on 2010-07-22 (public-rdb2rdf-wg@w3.org from July 2010)

From: Marcelo Arenas <marcelo.arenas1@gmail.com>
Date: Thu, 22 Jul 2010 11:30:49 -0400
To: Harry Halpin <hhalpin@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <AANLkTimQDZhHEkrLJ-Z1H7vqyKznF12ZEdVRtG-iAntl@mail.gmail.com>

On Thu, Jul 22, 2010 at 10:45 AM, Harry Halpin <hhalpin@w3.org> wrote:
>>
>> On 22 Jul 2010, at 13:57, Harry Halpin wrote:
>>>>> I think for ETL purposes the language
>>>>> could have 4 parts. Each except 3) is optional.
>>>>>
>>>>> 1) Full vendor-specific SQL to create a view
>>>>>
>>>>> 2) A portable subset of SQL to create a view
>>>>>
>>>>> 3) Mapping of that view to a default graph
>>>>>
>>>>> 4) Possibly running RDF-to-RDF transforms here (RIF).
>>>>
>>>> Where in these 4 do I say that USER.NAME should be mapped to foaf:name
>>>> rather than mydb:USER.NAME?
>>>
>>> It seems there are some differences here in the group, but I think 3)
>>> would be the right place, i.e. the mapping from SQL to the graph, which
>>> seems often to come after creating some kind of view in SQL - as done
>>> with full SQL power (1) or Datalog in (2). EricP seems to want to use
>>> RIF to modify that (4).
>>>
>>> That's why I'm tempted to say, let's work on just 3) and assume
>>> there'll be a place for 1) and then optionally leave 3) and 4) behind
>>> for now.
>>>
>>> What do you think?
>>
>> Doesn't make sense to me.
>>
>> The "default mapping", be it of a base table or a view or a SQL query
>> result, *by definition* allows no further customization. So the
>> property names in a default mapping will be whatever your column names
>> were, and hence there will be no foaf:Person or foaf:name in the
>> resulting RDF.
>>
>> Note Souri's "SQL-based approach", which consists of 1) plus a
>> language for customizing the "glue" that turns SQL result records into
>> RDF triples. I read your list as rejecting that approach?
>
> No, of course not. Sorry, that "glue" is what I meant by "mapping" to the
> default graph, i.e. not the more restricted definition you gave above
> where URIs could not be generated (although we should have that as a
> restricted version of (3)).
>
> In fact, for the next meeting I was hoping to get Souri to work through
> his approach in detail for the WG, since I think it's one of the most
> pragmatic approaches. Souri, would you be up for it?
>
> What I'm trying to gather via that list is whether there's an emerging
> consensus in the Working Group. I'm starting to feel it's about time for
> a poll...
>
>>
>>> [T]he idea was that there might be a simple subset of SQL we can
>>> guarantee to be portable. Ashok has brought up another well-known
>>> vendor-defined set of SQL.
>>
>> Subsetting is not a useful way of achieving portability between SQL
>> implementations.
>>
>> There is no portable subset of SQL that includes concatenation, for
>> example. If you want to generate URIs, you probably need concatenation.
>>
>
> I think that was EricP's argument for a subset plus principled
> extensibility à la RIF, as RIF does string concat. But I understand
> concerns that people may not want to use RIF, or a Datalog version of SQL
> that restricts them. Any response from Marcelo or Juan here?

Please note that the Datalog fragment in the document:

http://www.w3.org/2001/sw/rdb2rdf/wiki/Database-Instance-Only_and_Database-Instances-and-Schema_Mapping

has the same expressive power as relational algebra and SPARQL 1.0.
This language can also be extended with aggregates, so it could also
be used to give semantics to more expressive mapping languages.
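
As a rough illustration of the correspondence with relational algebra, here
is a small Python sketch. It is not taken from the wiki document: the
relations USER and EMAIL, their rows, and the function names are all
hypothetical, and sets of tuples stand in for database tables. It evaluates
one Datalog-style rule, evaluates the relational algebra expression
(a join followed by a projection) that says the same thing, and then shows
a count aggregate of the kind such an extension might add.

```python
from collections import Counter

# Hypothetical toy relations, for illustration only.
USER = {(1, "Alice"), (2, "Bob")}                # USER(id, name)
EMAIL = {(1, "alice@example.org"),
         (1, "alice@example.com")}               # EMAIL(user_id, address)


def datalog_rule(user, email):
    """Contact(n, a) <- USER(i, n), EMAIL(i, a).

    The shared variable i is what expresses the join condition.
    """
    return {(n, a) for (i, n) in user for (j, a) in email if i == j}


def relational_algebra(user, email):
    """project[name, address](USER join[id = user_id] EMAIL)."""
    # Join step: keep the combined rows whose ids match.
    joined = {(i, n, j, a) for (i, n) in user for (j, a) in email if i == j}
    # Projection step: keep only the name and address columns.
    return {(n, a) for (_, n, _, a) in joined}


def email_count(email):
    """EmailCount(i, count(a)) <- EMAIL(i, a), a rule with an aggregate."""
    return dict(Counter(i for (i, _) in email))


if __name__ == "__main__":
    # Both formulations produce the same set of (name, address) pairs.
    assert datalog_rule(USER, EMAIL) == relational_algebra(USER, EMAIL)
    print(sorted(datalog_rule(USER, EMAIL)))
    print(email_count(EMAIL))
```

The sketch covers only a single join/projection rule and one aggregate; it
is meant to show the shape of the correspondence, not to demonstrate the
expressiveness result itself.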

Received on Thursday, 22 July 2010 15:31:17 UTC