Re: Q: ISSUE-42 bNode semantics

On 23 May 2011, at 13:14, Richard Cyganiak wrote:

> You did not quite answer my question. You say here that only translating any SQL queries to SPARQL can show correctness of the direct mapping. I ask why translating any SPARQL query to SQL (which is a well-studied problem) is not sufficient.

You are oversimplifying. I did not say that.
Let me explain again by sketching how this is applied for the first two steps. If this is still not clear, please tell me exactly what you do not understand.
(1) Since I know that the normative behaviour of the CQ fragment of SQL (where the NULLs are treated as constants) is to let the joins fail over any occurrence of the constant NULL, I can immediately say that I can write CQs in SPARQL as BGPs (well known), but I have to add the said inequalities.
(2) If I extend the above fragment to union/intersection/difference between CQs, its normative behaviour is defined in the specs, which specify how union/intersection/difference work when they encounter a NULL constant (namely, they treat it as if it were a normal constant). So I can easily reproduce this in the fragment of SPARQL that includes union/intersection/difference between BGPs, where again the queries are just the original ones with the inequalities on the joined variables. This fragment still does not include full RA (and therefore full SPARQL), since there are some nasty cases of nested SQL queries which cannot be covered by this fragment.
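
The two SQL behaviours that steps (1) and (2) rely on can be checked with a small sketch. It uses SQLite from Python purely for convenience; the tables r and s, their columns, and the helper rows() are made up for illustration and are not taken from this thread. Note that SQLite spells the difference operator EXCEPT.

```python
import sqlite3

# Toy database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (a TEXT, b TEXT);
    CREATE TABLE s (b TEXT, c TEXT);
    INSERT INTO r VALUES ('a1', 'b1'), ('a2', NULL), ('a3', 'b3');
    INSERT INTO s VALUES ('b1', 'c1'), (NULL, 'c2');
""")

def rows(sql):
    """Run a query and return its rows in a deterministic order."""
    return sorted(conn.execute(sql).fetchall(), key=repr)

# Step (1): in a conjunctive query the join fails on NULL,
# so ('a2', 'c2') is not part of the answer.
join_rows = rows("SELECT r.a, s.c FROM r JOIN s ON r.b = s.b")

# Step (2): set operators treat NULL like an ordinary constant.
# The NULL in r.b is cancelled by the NULL in s.b ...
difference_rows = rows("SELECT b FROM r EXCEPT SELECT b FROM s")
# ... and the NULL shared by both columns survives the intersection.
intersection_rows = rows("SELECT b FROM r INTERSECT SELECT b FROM s")

print(join_rows, difference_rows, intersection_rows)
```

The join drops the pair that would match only on NULL, while the set operators treat the two NULLs as the same value. This asymmetry is why the BGP version needs the extra inequalities on joined variables and the set-operator version does not.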

Since NULL values are only defined in SQL, and only by describing how the various operators are affected by this special constant, I don't see how we can proceed otherwise, but I am open to any proposal.

>>> Furthermore, the peer-reviewed literature contains translations of all SPARQL 1.0 algebra operators to SQL.
>> I said in my previous email that this does not really help in this specific thread, since we are discussing about NULLs with SQL semantics, and SPARQL does not provide any help about that.
> See above. I claim that for practical purposes, showing a general translation from SPARQL to SQL is sufficient to show correctness of a direct mapping,

You never told me explicitly how this claim is supported. Can you please sketch how the entire process would work and how you would prove correctness?

>>> Why do you doubt the existence of a translation from SPARQL 1.0 to SQL that preserves results over the null-ignoring direct mapping?
>> You simply fail to tell me how to find such a translation, so we can not presume its existence.
> You can find such a translation by following the steps in Chebotko et al.

Again and again and again: there are no null values there, so how can this be useful?

>> I showed an example (the one with MINUS) where a naive translation does not work, and I still have to see from your side a general way to fix it.
> MINUS does not exist in SPARQL 1.0. You say that it can be expressed in SPARQL 1.0. Can you tell me how? Then I'll show how this would be translated to SQL according to Chebotko et al.

We all know well that SPARQL has the same expressivity as RA:
The Expressive Power of SPARQL, by R. Angles and C. Gutierrez; ISWC 2008.
Well, in SQL this would hopefully be mapped back as just... MINUS! So, as you see, this doesn't help.
Please note that MINUS is native in SPARQL 1.1.

>> Maybe it exists, but I don't see technically how can we proceed in finding it in its generality and prove its correctness, while I did provide a way in the case of materialised NULLs. I have nothing against it, but I need that the proposers of this alternative show us how to proceed.
> I gave a reference to a peer-reviewed journal paper that contains general translations of SPARQL relational algebra operators to SQL. I don't know what more you expect me to do here!

Again and again and again and again: there are no null values there, so how can this be useful?

> So let me ask a question with a weaker claim:
> Let's assume G entails G'. Hence, G captures all the meaning of G'. Nevertheless, queries can produce completely different results when SPARQL queries are evaluated against the two graphs. In light of this, how can you suggest the comparison of query answering results as appropriate to establish the soundness of operations over RDF?

I don't understand. If G is the starting (standard, null-free) database, then G' is a subset of it, and so we lose arbitrary information (we delete tuples); if G' is the starting database, then G is a superset of it, and we add arbitrary information (we add tuples). Why are you interested in the behaviour of QA under an arbitrary change of the database? As we know, adding a tuple to a database does not imply that I get a bigger (or a smaller) answer; nor does deleting a tuple. How can you expect that something nice happens at the level of the RDF translation?

I'm saying that we have to live with the fact that, in your case of the null-ignoring mapping too, the RDF you obtain is basically just a passive binary relational structure which can be queried only by specific kinds of SPARQL queries if the answers are to be the ones we would get from the original data by an "analogous" SQL query. If we restrict our attention to BGP queries, the analogous SQL query is exactly the same query (modulo back-reification). In the case of the NULLs-as-constants mapping, and if we restrict our attention to BGP queries, the analogous SQL query is the same query conjoined with a not-null check on the joined variables (modulo back-reification).
If we extend the expressivity to booleans over BGPs, in the case of the NULLs-as-constants mapping I know what to do (see above).
I still don't know what to do in the null-ignoring mapping case; can you tell me how you would handle extensions of BGPs?
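
For BGP queries, the contrast between the two mappings can be sketched as follows. This is a toy model, not the mapping as specified anywhere. It assumes each row becomes a subject and each column a predicate, and it stands in for the materialised NULL with a made-up token. The names direct_map, join_on and NULL_TOKEN, and the emp/proj tables, are hypothetical.

```python
NULL_TOKEN = "NULL"  # hypothetical constant standing in for a materialised NULL

def direct_map(table, rows, keep_nulls):
    """Map rows (dicts with an 'id' key) to (subject, predicate, object) triples.

    keep_nulls=False models the null-ignoring mapping (no triple for a NULL);
    keep_nulls=True models the NULLs-as-constants mapping.
    """
    triples = set()
    for row in rows:
        subject = f"{table}/{row['id']}"
        for column, value in row.items():
            if column == "id":
                continue
            if value is None:
                if keep_nulls:
                    triples.add((subject, column, NULL_TOKEN))
                continue
            triples.add((subject, column, value))
    return triples

def join_on(triples, p1, p2, guard_nulls=False):
    """Evaluate the BGP { ?x p1 ?v . ?y p2 ?v }, optionally requiring ?v != NULL."""
    return sorted(
        (x, y)
        for (x, p, v) in triples if p == p1
        for (y, q, w) in triples if q == p2 and v == w
        and not (guard_nulls and v == NULL_TOKEN)
    )

emps = [{"id": 1, "dept": "d1"}, {"id": 2, "dept": None}]
projs = [{"id": 1, "owner_dept": "d1"}, {"id": 2, "owner_dept": None}]

ignoring = direct_map("emp", emps, False) | direct_map("proj", projs, False)
constants = direct_map("emp", emps, True) | direct_map("proj", projs, True)

plain_ignoring = join_on(ignoring, "dept", "owner_dept")
plain_constants = join_on(constants, "dept", "owner_dept")
guarded_constants = join_on(constants, "dept", "owner_dept", guard_nulls=True)
print(plain_ignoring, plain_constants, guarded_constants)
```

In this toy setting the unguarded join over the NULLs-as-constants graph also pairs emp/2 with proj/2, which an SQL join on the original tables would not return. Adding the not-null guard brings the answer back in line with the null-ignoring graph. The sketch covers plain BGP joins only and says nothing about the boolean extensions asked about above.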


Received on Monday, 23 May 2011 13:30:03 UTC