Re: Q: ISSUE-42 bNode semantics from Enrico Franconi on 2011-05-22 (public-rdb2rdf-wg@w3.org from May 2011)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Sun, 22 May 2011 19:44:28 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <8C2A872A-3D8B-4063-99D4-357A57B112BB@inf.unibz.it>
On 22 May 2011, at 17:48, Richard Cyganiak wrote:

> On 21 May 2011, at 15:40, Enrico Franconi wrote:
>> Personally I believe that it is already an achievement to cover conjunctive queries, namely select-project-join / positive select-from-where (SQL) and BGPs (SPARQL).
>> 
>> A slightly more expressive but very interesting case would be relational combinations of CQs (namely union, intersection, difference of CQs).
> 
> Both of these omit the really interesting case of equality tests (WHERE col1=col2 in SQL; FILTER(?col1=?col2) in SPARQL).

No. CQs (or SPJ queries) *do* include joins - namely equality.

> The thing is that NULL=NULL is not true in SQL, but "NULL"^^rdb2rdf:null = "NULL"^^rdb2rdf:null is true in SPARQL. (The translation in the wiki is incorrect for this case.)

That's exactly my argument. And that's why I introduce in the wiki the (?X ≠ NULL) conjunct to let things behave well again. It can be proved that this is enough to recover correctness and completeness in the case of SPJ.
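
A minimal sketch of the point above, using SQLite as a stand-in for SQL semantics. The table `t`, its rows, and the Python string standing in for the null literal are illustrative assumptions, not taken from the wiki; the set comprehensions only mimic a SPARQL FILTER and are not a real SPARQL engine.

```python
import sqlite3

# Hypothetical table with one row whose two columns are both NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "a", "a"), (2, "a", "b"), (3, None, None)],
)

# SQL: NULL = NULL is not true, so row 3 is not selected.
sql_answer = {row[0] for row in conn.execute("SELECT id FROM t WHERE col1 = col2")}

# Null-preserving translation (assumption): every NULL becomes the same term.
NULL_TERM = "NULL^^rdb2rdf:null"  # plain string standing in for the literal
rows = [(1, "a", "a"), (2, "a", "b"), (3, NULL_TERM, NULL_TERM)]

# Naive equality test: the two null terms compare equal, so row 3 is returned.
naive_answer = {i for i, c1, c2 in rows if c1 == c2}

# Same test with the extra (?X != NULL) conjunct: row 3 is rejected again.
guarded_answer = {i for i, c1, c2 in rows if c1 == c2 and c1 != NULL_TERM}

print(sql_answer, naive_answer, guarded_answer)
```

On this toy data the guarded test agrees with the SQL answer, while the naive one returns an extra tuple.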

> The non-null-preserving RDB-to-RDF translation (with a naive query translation) *is* complete and correct for query answering, if one considers only QA over BGPs using SPARQL semantics. (Because where SQL selection semantics allows null values in query results, BGP matching semantics simply rejects such tuples.)

I also believe this is true (on the basis that a null, being an absent value, corresponds to a failure of the join), but the whole point is that it doesn't scale up when you make the query language slightly more expressive than CQs (or BGPs) in the absence of schema information *and* of a peculiar translation of the queries. See my example with MINUS in the wiki.
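
A minimal sketch of the conjunctive case only, assuming a non-null-preserving translation that simply emits no triple for a NULL cell. The tables `r` and `s`, the predicate names `r#k` and `s#k`, and the hand-written pattern matching are hypothetical; the sketch does not cover the MINUS example from the wiki.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (id TEXT, k TEXT)")
conn.execute("CREATE TABLE s (id TEXT, k TEXT)")
conn.executemany("INSERT INTO r VALUES (?, ?)", [("r1", "x"), ("r2", None)])
conn.executemany("INSERT INTO s VALUES (?, ?)", [("s1", "x"), ("s2", None)])

# SQL join on equality: the NULL rows never join with anything.
sql_answer = set(conn.execute("SELECT r.id, s.id FROM r JOIN s ON r.k = s.k"))

def to_triples(table):
    """Non-null-preserving translation (assumption): skip NULL cells."""
    triples = set()
    for row_id, k in conn.execute(f"SELECT id, k FROM {table}"):
        if k is not None:
            triples.add((row_id, f"{table}#k", k))
    return triples

graph = to_triples("r") | to_triples("s")

# Hand-rolled match of the BGP { ?r <r#k> ?k . ?s <s#k> ?k }.
bgp_answer = {
    (r, s)
    for (r, p1, k1) in graph if p1 == "r#k"
    for (s, p2, k2) in graph if p2 == "s#k" and k1 == k2
}

print(sql_answer, bgp_answer)
```

Here the absent triples play the role of the failed join, so both answers coincide on this data.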

> I believe that the non-null-preserving translation, in the presence of schema information, is actually correct and complete for all of SPARQL 1.0.

That's the tricky bit: "in the presence of schema information" (and I would add: *and* of a peculiar translation of the queries). Maybe yes, maybe not. Personally, I'd try not to reinvent the wheel, and I'd go down the same route as SQL, since this is exactly what we are mapping from.

> I think this can be shown using the literature on SPARQL-to-SQL translation, e.g., [1][2][3].

Please tell me exactly where to look.

> If we had to choose between basing the criteria for correctness on SPARQL (BGP matching) semantics or SQL semantics (like in the proposal on the wiki), then I would argue that SPARQL is more appropriate.

I don't understand what you are saying here. The correctness is about the QA problem, which involves both SQL and SPARQL.

> The working group's use cases for the direct mapping revolve around evaluating SPARQL queries over relational databases, and not SQL queries over RDF.

Of course. Or even better: SPARQL over RDF coming from a translation of a SQL RDB.

> So the expressivity that is ultimately needed is complete SPARQL, and numerous implementations of this have been around for years.

Of course again. But the issue is how to prove the correctness of the whole endeavour. You don't really want to have a standard without being certain that what you are proposing is correct. It seems to me that NULL values introduce enough complexity for a WG: usually things like these should already be established in practice and in theory before they become a standard, and AFAIK NULLs are not in this position.

> That being said, I would disagree that query answering is sufficient for establishing the correctness of a mapping. Using only this criterion would potentially allow for multiple different correct and complete mappings.

Of course, as long as they all give you the very same indistinguishable answer, I don't see why people cannot have their own optimised version of the translation.

> I think that a further criterion has to be compatibility with the formal RDF semantics.

But this is impossible with NULL values, it seems to me. Remember that you are modelling NULL values, which really mean the disjunction between absence of information and the existence of information, by just saying that they mean absence of information. While for some cases this may work, in general it is not guaranteed to work, and it wouldn't work when you introduce proper disjunctive information such as in OWL (DL or Lite fragments, however they are called now...).
I'd be happy to realise that what you are aiming at could be true, but it sounds like a dream...
I try to be more realistic, and I propose that RDF is just a means to obtain something sound and complete.

> Furthermore, when talking about completeness, I submit that given a correct mapping graph G, any graph G' is also correct (but possibly incomplete) if G RDF-entails G'. Under this -- in my eyes very appropriate -- notion of correctness, it is easy to show that the non-null-translating mapping is correct (but incomplete) since it is RDF-entailed by any null-translating mapping.

Interesting property, but since you push for your own mapping :-) I guess you have to show some (in)formal evidence that this can be achieved.

cheers
--e.

> 
> Best,
> Richard
> 
> [1] Chebotko et al, Semantics preserving SPARQL-to-SQL translation
>    http://www.sciencedirect.com/science/article/pii/S0169023X09000469
> [2] Pérez et al, Semantics and complexity of SPARQL
>    http://portal.acm.org/citation.cfm?id=1567278
> [3] Cyganiak, A relational algebra for SPARQL
>    http://www.hpl.hp.com/techreports/2005/HPL-2005-170.pdf
> 
> 
>> On 21 May 2011, at 16:23, Richard Cyganiak <richard@cyganiak.de> wrote:
>> 
>>> On 20 May 2011, at 23:04, Enrico Franconi wrote:
>>>>> Your argument hinges on a claim that one proposal is correct, and another is incorrect. Can you state the criteria that an RDB2RDF mapping has to fulfill so that you consider it correct and complete?
>>>> 
>>>> As I say in the wiki, I consider the query answering (QA) problem. A translation from data and queries in RDB/SQL to RDF/SPARQL is sound whenever any tuple returned in the translated QA problem is also returned in the original QA problem; it is complete if any tuple returned in the original QA problem is also returned in the translated QA problem.
>>> 
>>> What expressivity of queries does the translation of queries have to cover? SPARQL, SQL or something else?
>>> 
>>> Thanks,
>>> Richard
>> 
>
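
The soundness and completeness criteria quoted in the thread above can be read as two set inclusions between answer sets. This is a minimal sketch under that reading; the function names and the sample answer sets are hypothetical.

```python
def is_sound(original_answer, translated_answer):
    """Sound: every tuple returned by the translated QA problem
    is also returned by the original QA problem."""
    return set(translated_answer) <= set(original_answer)

def is_complete(original_answer, translated_answer):
    """Complete: every tuple returned by the original QA problem
    is also returned by the translated QA problem."""
    return set(original_answer) <= set(translated_answer)

# Hypothetical answers: the translation returns one extra tuple.
original_answer = {(1,)}
translated_answer = {(1,), (3,)}

print(is_sound(original_answer, translated_answer))     # extra tuple breaks soundness
print(is_complete(original_answer, translated_answer))  # nothing is missing
```

A translation that is both sound and complete yields exactly the same answer set as the original problem.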
Received on Sunday, 22 May 2011 17:44:59 UTC