Re: Defining a SQL fragment? from Harry Halpin on 2010-07-22 (public-rdb2rdf-wg@w3.org from July 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Thu, 22 Jul 2010 13:57:11 +0100 (BST)
To: "Richard Cyganiak" <richard@cyganiak.de>
Cc: "Harry Halpin" <hhalpin@w3.org>, "Marcelo Arenas" <marcelo.arenas1@gmail.com>, public-rdb2rdf-wg@w3.org
Message-ID: <7ec4c6e825fba168670c09efcaf17a51.squirrel@webmail-mit.w3.org>

> Harry,
>
> Thanks for the clarification.
>
> On 21 Jul 2010, at 17:42, Harry Halpin wrote:
>> I think for ETL purposes the language
>> could have 4 parts. Each except 3) is optional.
>>
>> 1) Full vendor specific SQL to create a view
>>
>> 2) A portable subset of SQL to create a view
>>
>> 3) Mapping of that view to a default graph
>>
>> 4) Possibly running RDF-to-RDF transforms here (RIF).
>
> Where in these 4 do I say that USER.NAME should be mapped to foaf:name
> rather than mydb:USER.NAME?

It seems there are some differences here in the group, but I think 3)
would be the right place, i.e. the mapping from SQL to the graph, which
often seems to come after creating some kind of view in SQL, whether done
with full SQL power (1) or Datalog in (2). EricP seems to want to use RIF
to modify that (4).
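
To make 3) a bit more concrete, here is a rough sketch of what mapping a
view to a graph might look like. It is only an illustration, not anything
the group has agreed on: the `USER_VIEW` view, the `COLUMN_TO_PREDICATE`
table, the `view_to_triples` helper and the `http://example.org/mydb/` base
IRI are all hypothetical, and SQLite merely stands in for whatever database
produced the view. The point is that the column-to-predicate choice
(USER.NAME to foaf:name) lives in the mapping step, and unmapped columns
fall back to a database-specific IRI.

```python
import sqlite3

FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/mydb/"  # hypothetical base IRI for mydb: terms

# Step 3: explicit column -> predicate choices (hypothetical mapping table).
# Columns not listed here fall back to a mydb:-style IRI built from the name.
COLUMN_TO_PREDICATE = {"NAME": FOAF + "name"}


def view_to_triples(conn, view, key_column):
    """Turn each row of a SQL view into (subject, predicate, value) tuples."""
    cur = conn.execute(f'SELECT * FROM "{view}"')
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{view}/{record[key_column]}"
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key only names the subject; NULLs yield no triple
            predicate = COLUMN_TO_PREDICATE.get(column, f"{BASE}{view}.{column}")
            triples.append((subject, predicate, value))
    return triples


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "USER" (ID INTEGER PRIMARY KEY, NAME TEXT, EMAIL TEXT)')
    conn.execute("INSERT INTO \"USER\" VALUES (1, 'Alice', 'alice@example.org')")
    # Steps 1/2: the view is plain SQL; here it is trivially portable.
    conn.execute('CREATE VIEW USER_VIEW AS SELECT ID, NAME, EMAIL FROM "USER"')
    return view_to_triples(conn, "USER_VIEW", "ID")
```

In this sketch NAME comes out as foaf:name because it is listed in the
mapping table, while EMAIL, which is not listed, comes out under the
database-specific IRI.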

That's why I'm tempted to say, let's work on just 3) and assume there'll be
a place for 1), and then optionally leave 2) and 4) behind for now.

What do you think?

>
>> I think Marcelo and Juan were wondering if steps 2-4 had a common core
>> that could be thought of semantically as Datalog.
>>
>> But if people choose 1) then they just have to know that R2ML will not
>> guarantee portability.
>
> Ok, but the differences between SQL dialects are mostly about syntax
> and hardly about semantics; so I'm still unsure how Datalog helps with
> SQL portability.

That is an issue though - I mean, if the standard just has a bunch of
vendor-specific SQL between curly brackets, then we may not be portable.

I'll let Marcelo and Juan argue for Datalog, but the idea was that there
might be a simple subset of SQL we can guarantee to be portable. Ashok has
brought up another well-known vendor-defined set of SQL.

However, that does not mean we should restrict people to that subset.
For some people, portability may not be a concern. I'm OK with using
anything to transform relational data to the graph as long as implementers
will actually implement it and users will use it (this does raise
concerns about any non-SQL-based approach). We should be able to guarantee
the portability of at least a subset of it, and anything that may not be
portable should be clearly marked as such.

>
>> What this does not bring up is what eric and soeren were really
>> wanting to
>> do earlier as well, which was SPARQL->SQL mappings.
>
> Are you saying that we need separate languages for ETL access and for
> SPARQL access to the mapped database? I don't think so; it's the same
> language. R2ML should specify how to derive an RDF graph from a
> relational DB. How to access that RDF graph (linked data, SPARQL, ETL,
> brainwave transmission) is up to implementations.

I would hope we do not need a separate language for that, but there needs
to be a clear statement about that in the spec.

>
> Best,
> Richard
>
>
>
>
>>
>> However, before descending into the black hole of semantics and
>> options,
>> I'm happy to agree to get a rough-draft out on 1) and 3) if people
>> can't
>> agree on 2) and 4).
>>
>>>
>>> I think there is a clear desire to allow full SQL in a compliant
>>> implementation of the SQL-based approach. This is at least what I
>>> gather from Souri's and Orri's comments. I can not remember anyone
>>> making an argument that only a restricted SQL fragment should be
>>> allowed in the SQL-based approach.
>>>
>>> Can you please explain, or point me to the discussion that motivates
>>> the need for restrictions in the allowable SQL in the SQL-based
>>> approach?
>>>
>>> Best,
>>> Richard
>>>
>>>
>>
>
>

Received on Thursday, 22 July 2010 12:57:18 UTC