Re: Q: ISSUE-42 bNode semantics from Richard Cyganiak on 2011-05-22 (public-rdb2rdf-wg@w3.org from May 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 22 May 2011 20:52:21 +0100
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <BA6D597E-5E2A-4E3D-B6E4-72C87B951546@cyganiak.de>

On 22 May 2011, at 18:44, Enrico Franconi wrote:
>> Both of these omit the really interesting case of equality tests (WHERE col1=col2 in SQL; FILTER(?col1=?col2) in SPARQL).
> 
> No. CQs (or SPJ queries) *do* include joins - namely equality.

I meant that col1 and col2 are in the same table. That's not a join, it's a selection in relational algebra terms.

>> The thing is that NULL=NULL is not true in SQL, but "NULL"^^rdb2rdf:null="NULL"^^rdb2rdf:null is true in SPARQL. (The translation in the wiki is incorrect for this case.)
> 
> That's exactly my argument. And that's why I introduce in the wiki the (?X ≠ NULL) conjunct to let things behave well again. It can be proved that this is enough to recover correctness and completeness in the case of SPJ.

Adding a (?x != "NULL"^^rdb2rdf:null) condition on join variables makes the joins work, but not equality constraints in the general case. You have to re-implement SQL's 3VL to make the general case work. That might well be possible, but it's certainly a bit more complicated than what you have shown so far.
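
To make the difference concrete, here is a minimal sketch using Python's sqlite3 module. The table t, its rows, and the helper names are made up for illustration, and the "literal" side is only a stand-in for a translation in which every NULL becomes the same "NULL"^^rdb2rdf:null literal and is compared with ordinary two-valued equality; it is not the translation from the wiki.

```python
import sqlite3

# Hypothetical table: col1 and col2 live in the same table, so comparing
# them is a selection, not a join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t (id, col1, col2) VALUES (?, ?, ?)",
    [(1, "a", "a"), (2, "a", "b"), (3, None, None), (4, None, "b")],
)

def ids(sql):
    """Run a query and return the ids it selects."""
    return [row[0] for row in conn.execute(sql)]

# SQL three-valued logic: a comparison involving NULL is unknown, and
# NOT unknown is still unknown, so rows 3 and 4 never pass the WHERE.
sql_eq = ids("SELECT id FROM t WHERE col1 = col2 ORDER BY id")
sql_neq = ids("SELECT id FROM t WHERE NOT (col1 = col2) ORDER BY id")

# Assumption: every NULL cell is mapped to one and the same literal,
# which then compares like any other value (two-valued logic).
NULL_LITERAL = '"NULL"^^rdb2rdf:null'

def as_literal(value):
    return NULL_LITERAL if value is None else value

rows = conn.execute("SELECT id, col1, col2 FROM t ORDER BY id").fetchall()
lit_eq = [i for i, c1, c2 in rows if as_literal(c1) == as_literal(c2)]
lit_neq = [i for i, c1, c2 in rows if not as_literal(c1) == as_literal(c2)]

print("SQL      col1 = col2:", sql_eq, " NOT(col1 = col2):", sql_neq)
print("literal  col1 = col2:", lit_eq, " NOT(col1 = col2):", lit_neq)
```

The two sides disagree on exactly the rows that contain a NULL. A blanket non-NULL guard is not a general fix either: WHERE col1 = col2 OR id = 4 accepts row 4 in SQL (unknown OR true is true), which a guard on col1 would reject.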

>> The non-null-preserving RDB-to-RDF translation (with a naive query translation) *is* complete and correct for query answering, if one considers only QA over BGPs using SPARQL semantics. (Because where SQL selection semantics allows null values in query results, BGP matching semantics simply rejects such tuples.)
> 
> I also believe this is true (on the basis that a null being an absent value corresponds to a failure of the join), but the whole point is that it doesn't scale up when you make the query language slightly more expressive than CQs (or BGPs) in absence of schema information *and* peculiar translation of the queries. See in the wiki my example with MINUS.

I don't know what you mean here. BGP matching doesn't include MINUS. Neither does SPARQL (at least in 1.0). SPARQL 1.0 has !BOUND, which is easily translated to SQL.
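
As a rough illustration of the !BOUND case, the sketch below assumes a made-up person table and a null-ignoring mapping that emits one triple per non-NULL cell. The SPARQL side is evaluated by hand over a Python set of triples rather than by a real SPARQL engine, and the predicate names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.org"), (2, "Bob", None)],
)

# Assumed null-ignoring mapping: one triple per non-NULL cell.
triples = set()
for pid, name, email in conn.execute("SELECT id, name, email FROM person"):
    for predicate, value in (("name", name), ("email", email)):
        if value is not None:
            triples.add((pid, predicate, value))

# Pattern being mimicked:
#   { ?p :name ?n . OPTIONAL { ?p :email ?e } FILTER(!BOUND(?e)) }
with_name = {s for (s, p, o) in triples if p == "name"}
with_email = {s for (s, p, o) in triples if p == "email"}
sparql_answer = sorted(with_name - with_email)

# The same question asked in SQL: !BOUND(?e) becomes email IS NULL.
sql_answer = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM person WHERE name IS NOT NULL AND email IS NULL ORDER BY id"
    )
]

print(sparql_answer, sql_answer)
```

Both sides select only the person without an email address.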

>> I believe that the non-null-preserving translation, in the presence of schema information, is actually correct and complete for all of SPARQL 1.0.
> 
> That's the tricky bit: "in the presence of schema information"

I don't see why that's tricky -- to map the data, you need schema information anyway; and I don't see how you can translate “SELECT * FROM table1” to SPARQL without schema information either.

> (and I add *and* peculiar translation of the queries).

Sure.

> Maybe yes, maybe not. Personally, I'd try not to reinvent the hot water, and I'd go through the same route as SQL, since this is exactly what we are mapping from.

We are mapping the *data* from relational to RDF. But mapping the *queries* from SPARQL to SQL makes more sense because that is what implementers actually need.

>> I think this can be shown using the literature on SPARQL-to-SQL translation, e.g., [1][2][3].
> 
> Please tell me exactly where to look.

[1] is a good place to start. It shows how to map all of SPARQL to equivalent SQL queries. It assumes a single-table three-column s,p,o database though, rather than the traditional domain database schemas we are considering here, so it would need some examination on how to adapt it to this 

>> If we had to choose between basing the criteria for correctness on SPARQL (BGP matching) semantics or SQL semantics (like in the proposal on the wiki), then I would argue that SPARQL is more appropriate.
> 
> I don't understand what you are saying here. The correctness is about the QA problem, which involves both SQL and SPARQL.

SQL semantics includes NULL. SPARQL semantics doesn't -- it just has unbound which works differently. Therefore, when translating SPARQL to SQL, dealing with the issue of NULL is easier. Add an IS NOT NULL whenever a variable is bound, and COALESCE on join variables that could potentially be unbound, and that should cover everything.
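
A small sketch of that recipe, under stated assumptions: the contact table, its rows, and the helper functions are invented for illustration, and the SPARQL side is a hand-written evaluation of OPTIONAL (LeftJoin without a filter) over a null-ignoring mapping, not a real engine. The variable ?c is shared by two OPTIONAL parts and may be unbound after the first one, which is where COALESCE comes in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "alice@example.org", "555-0100"),
        (2, "Bob", None, "555-0101"),
        (3, "Carol", None, None),
        (4, None, "dave@example.org", None),
    ],
)

def solutions(column, var):
    """Solutions of { ?p :column ?var } over the null-ignoring mapping:
    one solution per non-NULL cell (column is a fixed name, not user input)."""
    query = f"SELECT id, {column} FROM contact WHERE {column} IS NOT NULL"
    return [{"p": pid, var: value} for pid, value in conn.execute(query)]

def compatible(mu, nu):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(mu[k] == nu[k] for k in mu.keys() & nu.keys())

def left_join(left, right):
    """SPARQL OPTIONAL: extend each left solution where possible, else keep it."""
    result = []
    for mu in left:
        extended = [{**mu, **nu} for nu in right if compatible(mu, nu)]
        result.extend(extended or [mu])
    return result

# { ?p :name ?n . OPTIONAL { ?p :email ?c } OPTIONAL { ?p :phone ?c } }
mus = left_join(
    left_join(solutions("name", "n"), solutions("email", "c")),
    solutions("phone", "c"),
)
sparql_answer = sorted((mu["p"], mu.get("c")) for mu in mus)

# Translation: IS NOT NULL for the variable that must be bound (?n),
# COALESCE for the variable that may still be unbound (?c).
sql_answer = conn.execute(
    "SELECT id, COALESCE(email, phone) FROM contact "
    "WHERE name IS NOT NULL ORDER BY id"
).fetchall()

print(sparql_answer == sql_answer, sql_answer)
```

Row 4 is excluded by IS NOT NULL on name, and an unbound ?c shows up as None on both sides.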

>> So the expressivity that is ultimately needed is complete SPARQL, and numerous implementations of this have been around for years.
> 
> Of course again. But the issue is how to prove the correctness of the whole endeavour. You don't really want to have a standard without being certain that what you are proposing is correct. It seems to me that NULL values introduce enough complexity for a WG: usually things like these should be already in the practice and the theory before they become standard, and AFAIK NULLs are not in this position.

NULLs may not be in the theory of RDB-to-RDF translation, but they certainly are in practice already. The implementations I'm familiar with do not generate triples for NULL values, and at least none of the 15,000 users who have downloaded D2RQ have complained about that.

>> That being said, I would disagree that query answering is sufficient for establishing the correctness of a mapping. Using only this criterion would potentially allow for multiple different correct and complete mappings.
> 
> Of course, as long as they all give you the very same indistinguishable answer, I don't see why people cannot have their own optimised version of the translation.

Just to be clear: The QA-only criterion would allow for multiple correct and complete mappings that produce *different* results. For example, creating a distinct blank node for each NULL in an input table, instead of "NULL"^^rdb2rdf:null -- I think the query translation in this case would be even easier than in the proposal on the wiki.

And the whole point of the direct mapping effort is to have a single, standard mapping.

>> I think that a further criterion has to be compatibility with the formal RDF semantics.
> 
> But this is impossible with NULL values, it seems to me.

I thought you had agreed above that the null-ignoring mapping is correct with regard to SPARQL BGP matching, and perhaps all of SPARQL 1.0. It satisfies the QA criterion (same responses over the RDF and the RDB), and is compatible with RDF semantics.

> Remember that you are modelling NULL values, which really mean the disjunction between absence of information and the existence of information, by just saying that they mean absence of information.

RDF can't express “there is no value”. So in my eyes, *any* mapping is necessarily incomplete because it cannot capture that notion. But that's not a problem. RDF is monotonic, so simply not translating these cells yields incomplete but correct information. (This is a different notion of completeness and correctness, obviously.)

> While for some cases this may work, in general it is not guaranteed to work, and it wouldn't work when you introduce proper disjunctive information such as in OWL (DL or Lite fragments, however they are called now...).

What do you mean by “would not work”? What problem would there be with OWL?

> I'd be happy to realise that what you are aiming at could be true, but it sounds like a dream...
> I try to be more realistic, and I propose that RDF is just a means to obtain something sound and complete.
> 
>> Furthermore, when talking about completeness, I submit that given a correct mapping graph G, any graph G' is also correct (but possibly incomplete) if G RDF-entails G'. Under this -- in my eyes very appropriate -- notion of correctness, it is easy to show that the non-null-translating mapping is correct (but incomplete) since it is RDF-entailed by any null-translating mapping.
> 
> Interesting property, but since you push for your own mapping :-) I guess you have to show some (in)formal evidence that this can be achieved.

Well, let's assume your mapping which translates NULL to a special literal is correct.

An RDF graph entails any of its subgraphs. The mapping which ignores NULLs is a subgraph of your mapping for any given database. So your mapping RDF-entails the null-ignoring mapping, which would therefore also be correct.
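
The subgraph relationship can be checked mechanically. The sketch below uses invented rows, subject names, and two toy mapping functions (neither is the actual direct mapping); graphs are modelled as Python sets of triples, so "is a subgraph of" becomes a subset test. The step from subgraph to entailment is the subgraph lemma of the RDF semantics, not something the code proves.

```python
NULL_LITERAL = '"NULL"^^rdb2rdf:null'

# Hypothetical table contents; None stands for SQL NULL.
rows = [
    {"id": 1, "name": "Alice", "email": "alice@example.org"},
    {"id": 2, "name": "Bob", "email": None},
]

def null_literal_mapping(rows):
    """Toy mapping: one triple per cell, NULL becomes a special literal."""
    return {
        (f"person/{r['id']}", col, NULL_LITERAL if val is None else val)
        for r in rows
        for col, val in r.items()
        if col != "id"
    }

def null_ignoring_mapping(rows):
    """Toy mapping: one triple per non-NULL cell, NULL cells are skipped."""
    return {
        (f"person/{r['id']}", col, val)
        for r in rows
        for col, val in r.items()
        if col != "id" and val is not None
    }

g_literal = null_literal_mapping(rows)
g_ignoring = null_ignoring_mapping(rows)

# Subgraph check: every triple of the null-ignoring graph is also in the
# null-translating graph.
print(g_ignoring <= g_literal)

# The only triples missing from the subgraph are the NULL-literal ones.
print(g_literal - g_ignoring)
```

For any table, the two toy mappings differ only in the triples that carry the special literal, so the subset test holds regardless of the data.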

Best,
Richard


> 
> cheers
> --e.
> 
>> 
>> Best,
>> Richard
>> 
>> [1] Chebotko et al, Semantics preserving SPARQL-to-SQL translation
>>   http://www.sciencedirect.com/science/article/pii/S0169023X09000469
>> [2] Pérez et al, Semantics and complexity of SPARQL
>>   http://portal.acm.org/citation.cfm?id=1567278
>> [3] Cyganiak, A relational algebra for SPARQL
>>   http://www.hpl.hp.com/techreports/2005/HPL-2005-170.pdf
>> 
>> 
>>> On 21 May 2011, at 16:23, Richard Cyganiak <richard@cyganiak.de> wrote:
>>> 
>>>> On 20 May 2011, at 23:04, Enrico Franconi wrote:
>>>>>> Your argument hinges on a claim that one proposal is correct, and another is incorrect. Can you state the criteria that an RDB2RDF mapping has to fulfill so that you consider it correct and complete?
>>>>> 
>>>>> As I say in the wiki, I consider the query answering (QA) problem. A translation from data and queries in RDB/SQL to RDF/SPARQL is sound whenever any tuple returned in the translated QA problem is also returned in the original QA problem; it is complete if any tuple returned in the original QA problem is also returned in the translated QA problem.
>>>> 
>>>> What expressivity of queries does the translation of queries have to cover? SPARQL, SQL or something else?
>>>> 
>>>> Thanks,
>>>> Richard
>>> 
>> 
>
Received on Sunday, 22 May 2011 19:52:53 UTC