- From: ashok malhotra <ashok.malhotra@oracle.com>
- Date: Thu, 22 Jul 2010 07:49:16 -0700
- To: Harry Halpin <hhalpin@w3.org>
- CC: Richard Cyganiak <richard@cyganiak.de>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Highlighting this para as it may be missed:

> In fact, for the next meeting I was hoping to get Souri to work through
> his approach in detail for the WG, since I think it's one of the most
> pragmatic approaches. Souri, would you be up for it?

Next week we have a presentation from Eric Miller, so I'm not sure we will have the time. But we should continue the discussion.

All the best,
Ashok

Harry Halpin wrote:
>> On 22 Jul 2010, at 13:57, Harry Halpin wrote:
>>
>>>>> I think for ETL purposes the language
>>>>> could have 4 parts. Each except 3) is optional.
>>>>>
>>>>> 1) Full vendor specific SQL to create a view
>>>>>
>>>>> 2) A portable subset of SQL to create a view
>>>>>
>>>>> 3) Mapping of that view to a default graph
>>>>>
>>>>> 4) Possibly running RDF-to-RDF transforms here (RIF).
>>>>>
>>>> Where in these 4 do I say that USER.NAME should be mapped to foaf:name
>>>> rather than mydb:USER.NAME?
>>>>
>>> It seems there are some differences here in the group, but I think 3)
>>> would be the right place, i.e. the mapping from SQL to the graph, which
>>> seems often to come after creating some kind of view in SQL - as done
>>> with full SQL power (1) or Datalog in (2). EricP seems to want to use RIF
>>> to modify that (4).
>>>
>>> That's why I'm tempted to say, let's work on just 3) and assume there'll be
>>> a place for 1) and then optionally leave 3) and 4) behind for now.
>>>
>>> What do you think?
>>>
>> Doesn't make sense to me.
>>
>> The "default mapping", be it of a base table or a view or a SQL query
>> result, *per definition* allows no further customization. So the
>> property names in a default mapping will be whatever your column names
>> were, and hence there will be no foaf:Person or foaf:name in the
>> resulting RDF.
>>
>> Note Souri's "SQL-based approach", which consists of 1) plus a
>> language for customizing the "glue" that turns SQL result records into
>> RDF triples. I read your list as rejecting that approach?
>>
>
> No, of course not. Sorry, that "glue" is what I meant by "mapping" to the
> default graph, i.e. not the more restricted definition you gave above
> where URIs could not be generated (although we should have that as a
> restricted version of (3)).
>
> In fact, for the next meeting I was hoping to get Souri to work through
> his approach in detail for the WG, since I think it's one of the most
> pragmatic approaches. Souri, would you be up for it?
>
> What I'm trying to gather via that list is whether there's an
> emerging consensus from the Working Group. I'm starting to feel it's about
> time for a poll...
>
>
>>> [T]he idea was that there might be a simple subset of SQL we can
>>> guarantee to be portable. Ashok has
>>> brought up another well-known vendor defined set of SQL.
>>>
>> Subsetting is not a useful way of achieving portability between SQL
>> implementations.
>>
>> There is no portable subset of SQL that includes concatenation, for
>> example. If you want to generate URIs, you probably need concatenation.
>>
>
> I think that was EricP's argument for a subset plus principled
> extensibility ala RIF, as RIF does string concat. But I understand
> concerns that people may not want to use RIF, or a Datalog-version of SQL
> that restricts them. Any response from Marcelo or Juan here?
>
>
>> We could state that R2ML expects SQL queries in standard (non-dialect)
>> SQL, and provide a way for mapping authors to flag that they use a
>> different dialect in their mapping file.
>>
>>
>>> However, that does not mean we should restrict people to use that subset.
>>> For some people, portability may not be a concern.
>>> I'm OK with using
>>> anything to transform relational data to the graph as long as implementers
>>> actually will implement it and users will use it (this does bring up
>>> concerns about any non SQL-based approach), as long as we can guarantee
>>> that at least a subset of it is portable and then, if something may not be
>>> portable, allow it to be clearly defined as such.
>>>
>> Sounds reasonable to me.
>>
>> How about adding a "flavour" attribute to the block of SQL, so that
>> mapping authors can announce what dialect they are using. A tool can
>> check whether it understands that dialect. Optionally, authors could
>> even put multiple flavours of the same query side-by-side to make
>> their mapping files truly portable; implementations could check the
>> available flavours and use the one they understand.
>>
>
> That also sounds very reasonable.
>
>
>> --- strawman syntax ---
>> <sql flavour="SQL92">SELECT 'a' || 'b'</sql> (this is standard SQL)
>> <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql> (this is MySQL)
>> --- end strawman syntax ---
>>
>
> I was hoping there might be a way for us to check to make sure that you
> could have a "default" flavor that was portable and that covered the 80
> part of the 80/20 split of mappings. Perhaps this is not the case.
>
>
>> This doesn't guarantee portability (we can't), but it allows mapping
>> authors to flag whether they use the standard flavour of SQL or a
>> vendor-specific dialect.
>>
>> (In reality, we should use URIs to identify the flavours so that
>> vendors can define their own. And there should be a default -- perhaps
>> SQL Core 08 as per Ashok's proposal.)
>>
>> Best,
>> Richard
>>
>>
>>>>> What this does not bring up is what Eric and Soeren were really wanting to
>>>>> do earlier as well, which was SPARQL->SQL mappings.
>>>>>
>>>> Are you saying that we need separate languages for ETL access and for
>>>> SPARQL access to the mapped database?
>>>> I don't think so; it's the same
>>>> language. R2ML should specify how to derive an RDF graph from a
>>>> relational DB. How to access that RDF graph (linked data, SPARQL, ETL,
>>>> brainwave transmission) is up to implementations.
>>>>
>>> I would hope we do not need a separate language for that, but there needs
>>> to be a clear statement about that in the spec.
>>>
>>>
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>>> However, before descending into the black hole of semantics and options,
>>>>> I'm happy to agree to get a rough draft out on 1) and 3) if people can't
>>>>> agree on 2) and 4).
>>>>>
>>>>>
>>>>>> I think there is a clear desire to allow full SQL in a compliant
>>>>>> implementation of the SQL-based approach. This is at least what I
>>>>>> gather from Souri's and Orri's comments. I cannot remember anyone
>>>>>> making an argument that only a restricted SQL fragment should be
>>>>>> allowed in the SQL-based approach.
>>>>>>
>>>>>> Can you please explain, or point me to the discussion that motivates
>>>>>> the need for restrictions in the allowable SQL in the SQL-based
>>>>>> approach?
>>>>>>
>>>>>> Best,
>>>>>> Richard
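
To make the "flavour" attribute idea from the quoted thread concrete, here is a minimal Python sketch of how a tool might choose, among several flavours of the same query, the first one it understands. The <mapping> wrapper element, the pick_query function, and the use of SQL92 as the fallback flavour are assumptions made for illustration only; the thread proposes just the strawman <sql flavour="..."> elements and leaves the default open (perhaps SQL Core 08).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping fragment. Only the <sql flavour="..."> elements come
# from the strawman syntax; the <mapping> wrapper is an assumption.
MAPPING = """
<mapping>
  <sql flavour="SQL92">SELECT 'a' || 'b'</sql>
  <sql flavour="MySQL">SELECT CONCAT('a', 'b')</sql>
</mapping>
"""

# Assumed default for <sql> elements that carry no flavour attribute.
DEFAULT_FLAVOUR = "SQL92"


def pick_query(mapping_xml, supported):
    """Return (flavour, query) for the first supported flavour, else None.

    `supported` lists the dialects a tool understands, in order of preference.
    """
    root = ET.fromstring(mapping_xml)
    by_flavour = {}
    for element in root.iter("sql"):
        flavour = element.get("flavour", DEFAULT_FLAVOUR)
        # Keep the first query seen for each flavour.
        by_flavour.setdefault(flavour, (element.text or "").strip())
    for flavour in supported:
        if flavour in by_flavour:
            return flavour, by_flavour[flavour]
    return None  # the tool understands none of the offered flavours


if __name__ == "__main__":
    print(pick_query(MAPPING, ["MySQL", "SQL92"]))
    print(pick_query(MAPPING, ["Oracle"]))
```

A tool that returns None here would have to reject the mapping or report which dialects it offers. As noted in the thread, this does not guarantee portability; it only lets a mapping author declare which dialect each query uses.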
Received on Thursday, 22 July 2010 14:51:50 UTC