Re: Re-opening ISSUE-22 on vendor-specific SQL

* Richard Cyganiak <richard@cyganiak.de> [2011-05-31 21:57+0100]
> On 31 May 2011, at 20:46, David McNeil wrote:
> > Although my fear is that most real-world mapping will have some vendor specific SQL in them
> 
> I absolutely agree.
> 
> > and therefore most real-world mappings will have "no particular behavior" defined.
> 
> Well, I see what you mean, but I don't see a good alternative.
> 
> I wouldn't want to specify the behaviour of an R2RML engine where all SQL-carrying properties (rr:SQLQuery, rr:table, rr:tableOwner, rr:column, rr:template and so on) can contain opaque blobs. If anyone asked us to nail down the spec, we'd quickly have to give up.
> 
> Implementer: “Ok, I'm trying to be conformant. So, what is the semantics of rr:SQLQuery if it's flagged as vendor-specific?”
> 
> Me: “You treat it as opaque and pass it to the RDBMS you're connected to. We assume that you get back a tabular result set.”

I have to disagree at this early stage. I can see a bunch of ways that
knowing which language appeared in the ""s will improve performance
and user experience:

R2RML:
  _:someTriplesMap
    rr:languageId <http://oracle.example/OracleSQL10> ;
    rr:SQLQuery """SELECT…('http://example.com/emp/job/'|| "job") AS "jobTypeURI"…""" .

SPARQL Query:
  SELECT ?job WHERE { ?emp <EMP#jobtype> ?jt } LIMIT 5

A naive implementation will locally materialize the entire table and
apply the limit to the results. The more intimate implementation will
push the limit down:

Oracle SQL Query:
  SELECT ('http://example.com/emp/job/'|| "job") AS "jt"
   WHERE row_num <= 5

The same program may be designed to work with multiple databases,
necessarily generating a different queries to handle vender differences:

R2RML:
  _:someTriplesMap
    rr:languageId <http://oracle.example/MySQL6> ;
    rr:SQLQuery """SELECT…('http://example.com/emp/job/'|| "job") AS "jobTypeURI"…""" .

MySQL SQL Query:
  SELECT CONCAT('http://example.com/emp/job/', "job") AS "jt"
   LIMIT 5


The above example avoids shipping a full column of the table. We can
also avoid table scans in the case of reversable functions over
fields.

SPARQL Query:
  SELECT ?emp WHERE { ?emp <EMP#jobtype> <http://example.com/emp/job/CLERK> }

If we correlate the functions in the SQL with their inverses, we can
trivially make use of indexes. In this case, the inverse maps the
substring from 27 to the end of the string, i.e. "CLERK", to job:

SQL Query:
  SELECT CONCAT('http://example.com/emp/', "empno") AS "jt"
   WHERE job = "CLERK"

Having a language identifier also allows the R2RML tool to parse and
locally execute the rr:inverseExpression.


> Implementer: “What about rr:column?”
> 
> Me: “It still refers to a column name in the result, like before.”
> 
> Implementer: “What characters can rr:column contain?”
> 
> Me: “That depends on the SQL dialect. It must be a column that occurs in the query result.”
> 
> Implementer: “Is it case-sensitive?”
> 
> Me: “Again, that may depend on the SQL dialect.”
> 
> Implementer: “So, if the query result has a column with different case, is that an error, or do I use that column's value to generate a triple?”
> 
> Me: “It depends on the dialect.”
> 
> Implementer: “But I have to implement that one way or the other.”
> 
> Me: “You'll have to check the documentation of the RDBMS engine you're connected to, and implement it accordingly. That's the price you pay for supporting other SQL dialects.”
> 
> Implementer: “So you mean your so-called specification does not specify it?”
> 
> Me: “... sigh ...”
> 
> 
> > That seems a bit startling for a spec, but maybe I could get used to it :)
> 
> The thing is, implementers will know what to do anyways, it's just impossible for us to write it down normatively in the spec. We could add some handwaving text that explains the intuition of handling vendor-specific SQL in the appropriate way, but I don't see how that's any better than just saying, “We define R2RML for SQL 2008 Core, full stop. You use another dialect, it's up to you to figure out the details.”
> 
> Can you think of some words that work better for you than the proposed ones?
> 
> Best,
> Richard

-- 
-ericP

Received on Wednesday, 1 June 2011 16:12:19 UTC