Re: Defining a SQL fragment? from Harry Halpin on 2010-07-22 (public-rdb2rdf-wg@w3.org from July 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Thu, 22 Jul 2010 15:45:41 +0100 (BST)
To: "Richard Cyganiak" <richard@cyganiak.de>
Cc: "Harry Halpin" <hhalpin@w3.org>, "RDB2RDF WG" <public-rdb2rdf-wg@w3.org>
Message-ID: <958766afb8d335aaa925bc85c2b44d34.squirrel@webmail-mit.w3.org>
>
> On 22 Jul 2010, at 13:57, Harry Halpin wrote:
>>>> I think for ETL purposes language
>>>> could have 4 parts. Each except 3) is optional.
>>>>
>>>> 1) Full vendor specific SQL to create a view
>>>>
>>>> 2) A portable subset of SQL to create a view
>>>>
>>>> 3) Mapping of that view to a default graph
>>>>
>>>> 4) Possibly running RDF-to-RDF transforms here (RIF).
>>>
>>> Where in these 4 do I say that USER.NAME should be mapped to
>>> foaf:name
>>> rather than mydb:USER.NAME?
>>
>> It seems there are some differences here in the group, but I think 3)
>> would be the right place, i.e. the mapping from SQL to the graph,
>> which
>> seems often to come after creating some kind of view in SQL  - as done
>> with full SQL power (1) or Datalog in (2). EricP seems to want to
>> use RIF
>> to modify that (4).
>>
>> That's why I'm tempted to say, let's work on just 3) and assume
>> there'll be
>> a place for 1) and then optionally leave 2) and 4) behind for now.
>>
>> What do you think?
>
> Doesn't make sense to me.
>
> The "default mapping", be it of a base table or a view or a SQL query
> result, *per definition* allows no further customization. So the
> property names in a default mapping will be whatever your column names
> were, and hence there will be no foaf:Person or foaf:name in the
> resulting RDF.
>
> Note Souri's "SQL-based approach", which consists of 1) plus a
> language for customizing the "glue" that turns SQL result records into
> RDF triples. I read your list as rejecting that approach?

No, of course not. Sorry, that "glue" is what I meant by "mapping" to the
default graph, i.e. not the more restricted definition you gave above
where URIs could not be generated (although we should have that as a
restricted version of (3)).
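
To make the two readings of "mapping" concrete, here is a minimal sketch in Python. It is only an illustration, not anything the WG has agreed on: the `http://example.com/mydb/` namespace, the sample row, and both function names are assumptions. The first function derives property names from column names (the restricted "default mapping"); the second lets the mapping author pick a property URI per column (the "glue").

```python
# Hypothetical namespaces and data, chosen for illustration only.
MYDB = "http://example.com/mydb/"
FOAF = "http://xmlns.com/foaf/0.1/"


def default_mapping(table, key, rows):
    """Restricted mapping: every property URI is derived from a column name."""
    triples = []
    for row in rows:
        subject = f"{MYDB}{table}/{row[key]}"
        for column, value in row.items():
            triples.append((subject, f"{MYDB}{table}.{column}", value))
    return triples


def glue_mapping(table, key, rows, properties):
    """Customized mapping: the author supplies a property URI per column."""
    triples = []
    for row in rows:
        subject = f"{MYDB}{table}/{row[key]}"
        for column, prop in properties.items():
            triples.append((subject, prop, row[column]))
    return triples


rows = [{"ID": 1, "NAME": "Alice"}]
default_triples = default_mapping("USER", "ID", rows)
custom_triples = glue_mapping("USER", "ID", rows, {"NAME": FOAF + "name"})
```

With the default mapping the name ends up under a `mydb:USER.NAME`-style property; only the second function can produce `foaf:name`.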

In fact, for the next meeting I was hoping to get Souri to work through
his approach in detail for the WG, since I think it's one of the most
pragmatic approaches. Souri, would you be up for it?

What I'm trying to gather via that list is whether there's an
emerging consensus in the Working Group. I'm starting to feel it's about
time for a poll...

>
>> [T]he idea was that there might be a simple subset of SQL we can
>> guarantee to be portable. Ashok has
>> brought up another well-known vendor defined set of SQL.
>
> Subsetting is not a useful way of achieving portability between SQL
> implementations.
>
> There is no portable subset of SQL that includes concatenation, for
> example. If you want to generate URIs, you probably need concatenation.
>

I think that was EricP's argument for a subset plus principled
extensibility à la RIF, as RIF does string concat. But I understand
concerns that people may not want to use RIF, or a Datalog-version of SQL
that restricts them. Any response from Marcelo or Juan here?
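
As a small illustration of why URI generation tends to need concatenation, the sketch below builds a subject URI inside the query itself. It assumes SQLite (via Python's standard `sqlite3` module), which accepts the standard `||` operator; the table contents and the `http://example.com/user/` prefix are made up for the example. On a system without `||` string concatenation, such as MySQL in its default mode, the same query would have to use `CONCAT(...)` instead.

```python
import sqlite3

# In-memory database with a hypothetical USER table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "USER" (ID INTEGER PRIMARY KEY, NAME TEXT)')
conn.execute('INSERT INTO "USER" (ID, NAME) VALUES (?, ?)', (1, "Alice"))

# The URI is assembled with the standard SQL concatenation operator.
rows = conn.execute(
    'SELECT \'http://example.com/user/\' || ID AS uri, NAME FROM "USER"'
).fetchall()
conn.close()
```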

> We could state that R2ML expects SQL queries in standard (non-dialect)
> SQL, and provide a way for mapping authors to flag that they use a
> different dialect in their mapping file.
>
>> However, that does not mean we should restrict people to use that
>> subset.
>> For some people, portability may not be a concern. I'm OK with using
>> anything to transform relational data to the graph as long as
>> implementers
>> actually will implement it and users will use it (this does bring up
>> concerns about any non-SQL-based approach), as long as we can
>> guarantee that at
>> least a subset of it is portable and, if something may not be
>> portable,
>> allow it to be clearly defined as such.
>
> Sounds reasonable to me.
>
> How about adding a "flavour" attribute to the block of SQL, so that
> mapping authors can announce what dialect they are using. A tool can
> check whether it understands that dialect. Optionally, authors could
> even put multiple flavours of the same query side-by-side to make
> their mapping files truly portable; implementations could check the
> available flavours and use the one they understand.

That also sounds very reasonable.

>
> --- strawman syntax ---
> <sql flavour="SQL92">SELECT 'a' || 'b'</sql>  (this is standard SQL)
> <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql> (this is MySQL)
> --- end strawman syntax ---

I was hoping there might be a way for us to make sure that there
is a "default" flavor that is portable and covers the 80
part of the 80/20 split of mappings. Perhaps this is not the case.
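
The flavour idea, including a default, could be sketched roughly as follows. This is a hedged illustration of the strawman syntax above, not a proposed design: the enclosing `<mapping>` element, the `pick_query` function, and the choice of `SQL92` as the default flavour are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The two <sql> elements come from the strawman; <mapping> is a
# hypothetical wrapper so the snippet parses as one XML document.
MAPPING = """
<mapping>
  <sql flavour="SQL92">SELECT 'a' || 'b'</sql>
  <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql>
</mapping>
"""


def pick_query(mapping_xml, supported, default_flavour="SQL92"):
    """Return (flavour, query) for the first supported flavour available.

    `supported` lists the dialects a tool understands, most preferred
    first. A <sql> element without a flavour attribute is treated as
    the default flavour.
    """
    root = ET.fromstring(mapping_xml)
    by_flavour = {}
    for element in root.iter("sql"):
        flavour = element.get("flavour", default_flavour)
        by_flavour[flavour] = (element.text or "").strip()
    for flavour in supported:
        if flavour in by_flavour:
            return flavour, by_flavour[flavour]
    raise ValueError("no SQL flavour in the mapping is supported by this tool")


mysql_choice = pick_query(MAPPING, ["MySQL"])
fallback_choice = pick_query(MAPPING, ["Oracle", "SQL92"])
```

A tool that only understands MySQL picks the `CONCAT` variant, while one that lists an unavailable dialect first falls back to the standard variant; if nothing matches, the mapping is rejected rather than run against the wrong dialect.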

>
> This doesn't guarantee portability (we can't), but it allows mapping
> authors to flag whether they use the standard flavour of SQL or a
> vendor-specific dialect.
>
> (In reality, we should use URIs to identify the flavours so that
> vendors can define their own. And there should be a default -- perhaps
> SQL Core 08 as per Ashok's proposal.)
>
> Best,
> Richard
>
>
>>
>>>
>>>> What this does not bring up is what eric and soeren were really
>>>> wanting to
>>>> do earlier as well, which was SPARQL->SQL mappings.
>>>
>>> Are you saying that we need separate languages for ETL access and for
>>> SPARQL access to the mapped database? I don't think so; it's the same
>>> language. R2ML should specify how to derive an RDF graph from a
>>> relational DB. How to access that RDF graph (linked data, SPARQL,
>>> ETL,
>>> brainwave transmission) is up to implementations.
>>
>> I would hope we do not need a separate language for that, but there
>> needs
>> to be a clear statement about that in the spec.
>>
>>>
>>> Best,
>>> Richard
>>>
>>>
>>>
>>>
>>>>
>>>> However, before descending into the black hole of semantics and
>>>> options,
>>>> I'm happy to agree to get a rough draft out on 1) and 3) if people
>>>> can't
>>>> agree on 2) and 4).
>>>>
>>>>>
>>>>> I think there is a clear desire to allow full SQL in a compliant
>>>>> implementation of the SQL-based approach. This is at least what I
>>>>> gather from Souri's and Orri's comments. I can not remember anyone
>>>>> making an argument that only a restricted SQL fragment should be
>>>>> allowed in the SQL-based approach.
>>>>>
>>>>> Can you please explain, or point me to the discussion that
>>>>> motivates
>>>>> the need for restrictions in the allowable SQL in the SQL-based
>>>>> approach?
>>>>>
>>>>> Best,
>>>>> Richard
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
Received on Thursday, 22 July 2010 14:45:43 UTC