Re: Defining a SQL fragment? from ashok malhotra on 2010-07-22 (public-rdb2rdf-wg@w3.org from July 2010)

From: ashok malhotra <ashok.malhotra@oracle.com>
Date: Thu, 22 Jul 2010 07:49:16 -0700
To: Harry Halpin <hhalpin@w3.org>
CC: Richard Cyganiak <richard@cyganiak.de>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <4C485A6C.8060901@oracle.com>
Highlighting this para as it may be missed

> In fact, for the next meeting I was hoping to get Souri to work through
> his approach in detail for the WG, since I think it's one of the most
> pragmatic approaches. Souri, would you be up for it?

Next week we have a presentation from Eric Miller so I'm not sure we will have
the time.  But we should continue the discussion.

All the best, Ashok


Harry Halpin wrote:
>> On 22 Jul 2010, at 13:57, Harry Halpin wrote:
>>     
>>>>> I think for ETL purposes the language
>>>>> could have 4 parts. Each except 3) is optional.
>>>>>
>>>>> 1) Full vendor specific SQL to create a view
>>>>>
>>>>> 2) A portable subset of SQL to create a view
>>>>>
>>>>> 3) Mapping of that view to a default graph
>>>>>
>>>>> 4) Possibly running RDF-to-RDF transforms here (RIF).
>>>>>           
>>>> Where in these 4 do I say that USER.NAME should be mapped to
>>>> foaf:name
>>>> rather than mydb:USER.NAME?
>>>>         
>>> It seems there are some differences here in the group, but I think 3)
>>> would be the right place, i.e. the mapping from SQL to the graph, which
>>> seems often to come after creating some kind of view in SQL - as done
>>> with full SQL power (1) or Datalog in (2). EricP seems to want to use
>>> RIF to modify that (4).
>>>
>>> That's why I'm tempted to say, let's work on just 3) and assume
>>> there'll be a place for 1), and then optionally leave 2) and 4)
>>> behind for now.
>>>
>>> What do you think?
>>>       
>> Doesn't make sense to me.
>>
>> The "default mapping", be it of a base table or a view or a SQL query
>> result, *per definition* allows no further customization. So the
>> property names in a default mapping will be whatever your column names
>> were, and hence there will be no foaf:Person or foaf:name in the
>> resulting RDF.
>>
>> Note Souri's "SQL-based approach", which consists of 1) plus a
>> language for customizing the "glue" that turns SQL result records into
>> RDF triples. I read your list as rejecting that approach?
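The difference between a default mapping and customizable "glue" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `row_to_triples`, the base URI `http://example.com/mydb/`, and the URI shapes are hypothetical and are not anything the WG has defined. Without a property map, predicates are derived from the table and column names; with one, NAME can be mapped to foaf:name.

```python
def row_to_triples(table, key_column, row, property_map=None,
                   base="http://example.com/mydb/"):
    """Turn one SQL result record (a dict) into a list of RDF triples.

    With no property_map this behaves like a default mapping: every
    predicate is built from the table and column name. A property_map
    (column name -> predicate URI) is the customizable glue.
    The base URI and URI patterns are assumptions for illustration.
    Literal escaping is deliberately not handled in this sketch.
    """
    property_map = property_map or {}
    subject = f"<{base}{table}/{row[key_column]}>"
    triples = []
    for column, value in row.items():
        if column == key_column:
            continue  # the key is only used to build the subject URI
        default_predicate = f"<{base}{table}.{column}>"
        predicate = property_map.get(column, default_predicate)
        triples.append((subject, predicate, f'"{value}"'))
    return triples


row = {"ID": 7, "NAME": "Alice"}

# Default mapping: the predicate is derived from USER.NAME.
default_triples = row_to_triples("USER", "ID", row)

# Customized glue: NAME is mapped to foaf:name instead.
custom_triples = row_to_triples(
    "USER", "ID", row,
    property_map={"NAME": "<http://xmlns.com/foaf/0.1/name>"},
)
```

The record itself could come from a base table, a view, or an arbitrary SQL query; only the glue step differs between the two calls.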
>>     
>
> No, of course not. Sorry, that "glue" is what I meant by "mapping" to the
> default graph, i.e. not the more restricted definition you gave above
> where URIs could not be generated (although we should have that as a
> restricted version of (3)).
>
> In fact, for the next meeting I was hoping to get Souri to work through
> his approach in detail for the WG, since I think it's one of the most
> pragmatic approaches. Souri, would you be up for it?
>
> What I'm trying to gather via that list is whether there's an emerging
> consensus from the Working Group. I'm starting to feel it's about
> time for a poll...
>
>   
>>> [T]he idea was that there might be a simple subset of SQL we can
>>> guarantee to be portable. Ashok has
>>> brought up another well-known vendor defined set of SQL.
>>>       
>> Subsetting is not a useful way of achieving portability between SQL
>> implementations.
>>
>> There is no portable subset of SQL that includes concatenation, for
>> example. If you want to generate URIs, you probably need concatenation.
>>
>>     
>
> I think that was EricP's argument for a subset plus principled
> extensibility à la RIF, as RIF does string concat. But I understand
> concerns that people may not want to use RIF, or a Datalog-version of SQL
> that restricts them. Any response from Marcelo or Juan here?
>
>   
>> We could state that R2ML expects SQL queries in standard (non-dialect)
>> SQL, and provide a way for mapping authors to flag that they use a
>> different dialect in their mapping file.
>>
>>     
>>> However, that does not mean we should restrict people to using that
>>> subset. For some people, portability may not be a concern. I'm OK with
>>> using anything to transform relational data to the graph as long as
>>> implementers actually will implement it and users will use it (this
>>> does bring up concerns about any non-SQL-based approach), as long as we
>>> can guarantee the portability of at least a subset of it and then, if
>>> something may not be portable, allow it to be clearly defined as such.
>>>       
>> Sounds reasonable to me.
>>
>> How about adding a "flavour" attribute to the block of SQL, so that
>> mapping authors can announce what dialect they are using? A tool can
>> check whether it understands that dialect. Optionally, authors could
>> even put multiple flavours of the same query side-by-side to make
>> their mapping files truly portable; implementations could check the
>> available flavours and use the one they understand.
>>     
>
> That also sounds very reasonable.
>
>   
>> --- strawman syntax ---
>> <sql flavour="SQL92">SELECT 'a' || 'b'</sql>  (this is standard SQL)
>> <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql> (this is MySQL)
>> --- end strawman syntax ---
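How an implementation might choose among side-by-side flavours can be sketched as follows. This is a minimal sketch under assumptions: the `<mapping>` wrapper element and the function name `pick_query` are hypothetical, and only the `<sql flavour="...">` elements come from the strawman syntax above. The tool walks the available flavours in document order and uses the first one it understands.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping fragment. The <mapping> wrapper is an assumption;
# the <sql flavour="..."> elements follow the strawman syntax.
MAPPING = """
<mapping>
  <sql flavour="SQL92">SELECT 'a' || 'b'</sql>
  <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql>
</mapping>
"""


def pick_query(mapping_xml, understood):
    """Return (flavour, query) for the first flavour in `understood`.

    Returns None when the tool understands none of the offered flavours,
    so the caller can report that the mapping is not usable as written.
    """
    root = ET.fromstring(mapping_xml)
    for element in root.iter("sql"):
        flavour = element.get("flavour")
        if flavour in understood:
            return flavour, element.text.strip()
    return None


# A tool that only speaks the MySQL dialect picks the CONCAT variant.
mysql_choice = pick_query(MAPPING, {"MySQL"})

# A tool that understands neither flavour gets no query back.
no_choice = pick_query(MAPPING, {"Oracle"})
```

If flavours were identified by URIs rather than short names, as suggested further down, the same lookup would apply with URIs as the set members.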
>>     
>
> I was hoping there might be a way for us to check to make sure that you
> could have a "default" flavor that was portable and that covered the 80%
> side of the 80/20 split of mappings. Perhaps this is not the case.
>
>   
>> This doesn't guarantee portability (we can't), but it allows mapping
>> authors to flag whether they use the standard flavour of SQL or a
>> vendor-specific dialect.
>>
>> (In reality, we should use URIs to identify the flavours so that
>> vendors can define their own. And there should be a default -- perhaps
>> SQL Core 08 as per Ashok's proposal.)
>>
>> Best,
>> Richard
>>
>>
>>     
>>>>> What this does not bring up is what Eric and Soeren were really
>>>>> wanting to do earlier as well, which was SPARQL->SQL mappings.
>>>>>           
>>>> Are you saying that we need separate languages for ETL access and for
>>>> SPARQL access to the mapped database? I don't think so; it's the same
>>>> language. R2ML should specify how to derive an RDF graph from a
>>>> relational DB. How to access that RDF graph (linked data, SPARQL,
>>>> ETL,
>>>> brainwave transmission) is up to implementations.
>>>>         
>>> I would hope we do not need a separate language for that, but there
>>> needs
>>> to be a clear statement about that in the spec.
>>>
>>>       
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>>
>>>>
>>>>         
>>>>> However, before descending into the black hole of semantics and
>>>>> options, I'm happy to agree to get a rough draft out on 1) and 3) if
>>>>> people can't agree on 2) and 4).
>>>>>
>>>>>           
>>>>>> I think there is a clear desire to allow full SQL in a compliant
>>>>>> implementation of the SQL-based approach. This is at least what I
>>>>>> gather from Souri's and Orri's comments. I can not remember anyone
>>>>>> making an argument that only a restricted SQL fragment should be
>>>>>> allowed in the SQL-based approach.
>>>>>>
>>>>>> Can you please explain, or point me to the discussion that
>>>>>> motivates
>>>>>> the need for restrictions in the allowable SQL in the SQL-based
>>>>>> approach?
>>>>>>
>>>>>> Best,
>>>>>> Richard
>>>>>>
>>>>>>
>>>>>>             
>>>>         
>>>       
>>
>>     
>
>
>
Received on Thursday, 22 July 2010 14:51:50 UTC