Re: Q: ISSUE-42 bNode semantics from Richard Cyganiak on 2011-05-22 (public-rdb2rdf-wg@w3.org from May 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 22 May 2011 16:48:26 +0100
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <BF5BBF60-65EA-492F-8543-B662390EECD1@cyganiak.de>
On 21 May 2011, at 15:40, Enrico Franconi wrote:
> Personally I believe that it is already an achievement to cover conjunctive queries, namely select-project-join / positive select-from-where (SQL) and BGPs (SPARQL).
> 
> A slightly more expressive but very interesting case would be relational combinations of CQs (namely union, intersection, difference of CQs).

Both of these omit the really interesting case of equality tests (WHERE col1=col2 in SQL; FILTER(?col1=?col2) in SPARQL). The thing is that NULL=NULL is not true in SQL, but "NULL"^^rdb2rdf:null = "NULL"^^rdb2rdf:null is true in SPARQL. (The translation in the wiki is incorrect for this case.)
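
To make the mismatch concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite table for the SQL side, and it models the SPARQL side only as equality of identical RDF terms, with NULL mapped to a single typed literal. The table layout, the helper names and the tuple encoding of literals are illustrative assumptions, not part of any proposal.

```python
import sqlite3

# Hypothetical stand-in for the typed literal "NULL"^^rdb2rdf:null.
NULL_LITERAL = ("NULL", "rdb2rdf:null")

def sql_equal_rows(rows):
    """Ids of rows selected by WHERE col1 = col2 under SQL semantics."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    ids = [r[0] for r in conn.execute("SELECT id FROM t WHERE col1 = col2 ORDER BY id")]
    conn.close()
    return ids

def to_term(value):
    """Null-preserving translation of a cell value to an RDF term (assumed encoding)."""
    return NULL_LITERAL if value is None else (value, "xsd:string")

def literal_equal_rows(rows):
    """Ids of rows whose translated terms are identical, as FILTER(?col1 = ?col2) would accept."""
    return [rid for rid, c1, c2 in rows if to_term(c1) == to_term(c2)]

ROWS = [(1, "a", "a"), (2, "a", "b"), (3, None, None), (4, "a", None)]

if __name__ == "__main__":
    print(sql_equal_rows(ROWS))      # SQL side
    print(literal_equal_rows(ROWS))  # null-as-literal side
```

Row 3 (both columns NULL) is the one that separates the two results: SQL drops it, while the null-as-literal translation keeps it.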

The non-null-preserving RDB-to-RDF translation (with a naive query translation) *is* complete and correct for query answering, if one considers only QA over BGPs using SPARQL semantics. (Because where SQL selection semantics allows null values in query results, BGP matching semantics simply rejects such tuples.)
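
The parenthetical above can be illustrated with a small sketch, assuming a simplified direct mapping with one subject per row and one predicate per column. The function names, the IRI shapes and the restriction to star-shaped BGPs are assumptions made for illustration only.

```python
def direct_map(table, rows, key):
    """Non-null-preserving translation: NULL cells (None) yield no triple."""
    triples = set()
    for row in rows:
        subject = f"{table}/{row[key]}"  # hypothetical subject IRI shape
        for column, value in row.items():
            if value is not None:
                triples.add((subject, column, value))
    return triples

def match_star(graph, columns):
    """Match the BGP { ?s c1 ?v1 . ?s c2 ?v2 . ... } and return the value tuples.

    A subject only matches if it has a triple for every listed column.
    """
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, {})[p] = o
    return {
        tuple(props[c] for c in columns)
        for props in by_subject.values()
        if all(c in props for c in columns)
    }

PEOPLE = [
    {"id": 1, "name": "Ann", "email": "ann@example.org"},
    {"id": 2, "name": "Bob", "email": None},
]

if __name__ == "__main__":
    graph = direct_map("people", PEOPLE, "id")
    print(match_star(graph, ["name", "email"]))
```

A SQL query selecting name and email would return Bob with a NULL email; the BGP has no triple to bind ?email for Bob, so that solution never arises.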

I believe that the non-null-preserving translation, in the presence of schema information, is actually correct and complete for all of SPARQL 1.0. I think this can be shown using the literature on SPARQL-to-SQL translation, e.g., [1][2][3].

If we had to choose between basing the criteria for correctness on SPARQL (BGP matching) semantics or SQL semantics (like in the proposal on the wiki), then I would argue that SPARQL is more appropriate. The working group's use cases for the direct mapping revolve around evaluating SPARQL queries over relational databases, and not SQL queries over RDF. So the expressivity that is ultimately needed is complete SPARQL, and numerous implementations of this have been around for years.

That being said, I would disagree that query answering is sufficient for establishing the correctness of a mapping. Using only this criterion would potentially allow for multiple different correct and complete mappings. I think that a further criterion has to be compatibility with the formal RDF semantics. Furthermore, when talking about completeness, I submit that given a correct mapping graph G, any graph G' is also correct (but possibly incomplete) if G RDF-entails G'. Under this -- in my eyes very appropriate -- notion of correctness, it is easy to show that the non-null-translating mapping is correct (but incomplete) since it is RDF-entailed by any null-translating mapping.
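
The entailment argument can be sketched as follows, assuming both mappings produce ground graphs (no blank nodes) and differ only in whether NULL cells become triples. For ground graphs, being a subgraph is a sufficient condition for being entailed, so a subset test is enough here. The translation function and the literal encoding are hypothetical.

```python
# Hypothetical stand-in for the typed literal "NULL"^^rdb2rdf:null.
NULL_LITERAL = ("NULL", "rdb2rdf:null")

def translate(rows, key, preserve_nulls):
    """Map rows to ground triples; NULL cells become NULL_LITERAL or are dropped."""
    triples = set()
    for row in rows:
        subject = f"row/{row[key]}"  # assumed subject IRI shape
        for column, value in row.items():
            if value is None:
                if preserve_nulls:
                    triples.add((subject, column, NULL_LITERAL))
            else:
                triples.add((subject, column, value))
    return triples

def entailed_as_subgraph(g, g_prime):
    """Sufficient check for ground graphs: G entails every subgraph of G."""
    return g_prime <= g

ROWS = [{"id": 1, "name": "Ann", "email": None}]

if __name__ == "__main__":
    g_null = translate(ROWS, "id", preserve_nulls=True)
    g_plain = translate(ROWS, "id", preserve_nulls=False)
    print(entailed_as_subgraph(g_null, g_plain))
    print(entailed_as_subgraph(g_plain, g_null))
```

The check only goes one way: the null-translating graph contains a triple that the non-null-translating graph lacks, which matches the "correct but incomplete" reading.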

Best,
Richard

[1] Chebotko et al, Semantics preserving SPARQL-to-SQL translation
    http://www.sciencedirect.com/science/article/pii/S0169023X09000469
[2] Pérez et al, Semantics and complexity of SPARQL
    http://portal.acm.org/citation.cfm?id=1567278
[3] Cyganiak, A relational algebra for SPARQL
    http://www.hpl.hp.com/techreports/2005/HPL-2005-170.pdf


> On 21 May 2011, at 16:23, Richard Cyganiak <richard@cyganiak.de> wrote:
> 
>> On 20 May 2011, at 23:04, Enrico Franconi wrote:
>>>> Your argument hinges on a claim that one proposal is correct, and another is incorrect. Can you state the criteria that an RDB2RDF mapping has to fulfill so that you consider it correct and complete?
>>> 
>>> As I say in the wiki, I consider the query answering (QA) problem. A translation from data and queries in RDB/SQL to RDF/SPARQL is sound whenever any tuple returned in the translated QA problem is also returned in the original QA problem; it is complete if any tuple returned in the original QA problem is also returned in the translated QA problem.
>> 
>> What expressivity of queries does the translation of queries have to cover? SPARQL, SQL or something else?
>> 
>> Thanks,
>> Richard
>
Received on Sunday, 22 May 2011 15:48:56 UTC