Re: Q: ISSUE-41 bNode semantics from Enrico Franconi on 2011-05-19 (public-rdb2rdf-wg@w3.org from May 2011)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Thu, 19 May 2011 15:50:28 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Ivan Herman <ivan@w3.org>, Pat Hayes <phayes@ihmc.us>, Michael Hausenblas <michael.hausenblas@deri.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <4D54D92E-580F-4AF6-B51D-5AAD02A2D0E4@inf.unibz.it>

On 19 May 2011, at 14:03, Richard Cyganiak wrote:

>> If we have this:
>> 
>> <Alice> <name> "NULL"^^rdb2rdf:NULL .
>> <Bob>   <name> "NULL"^^rdb2rdf:NULL .
>> 
>> then from RDF Semantics it follows that <Alice> and <Bob> have the same name. A constant is a constant, that's how datatypes work in RDF and you can't do anything about that unless you are prepared to change the formal semantics.
> 
> Your proposal does not work because of the technical problem above.

You appear to fall into the following category of people:

On 18 May 2011, at 22:22, Enrico Franconi wrote:
> It may well be the case that the WG and the SW community at large do not like this approach (since it requires adhering to a recipe in order to get the right answers, à la 'best practices').

so you should be happy with my second proposal:

On 18 May 2011, at 22:22, Enrico Franconi wrote:
> "Note: if a relational database contains NULL values, then the direct mapping is not applicable. This case is postponed for consideration to a future WG."

You don't believe in my fix:

> You try to fix it up by layering a custom query answering apparatus on top of the RDF. That does not work because RDF triples themselves already have a formal semantics, and your proposal is incompatible with this formal semantics. It does not allow the use of a typed literal for any other purpose than denoting an entity.

I believe that the thing we can do to deal with NULLs is to layer a best practice on top of the translation. It is ugly, I agree, but any other solution which does not do this has been proved wrong.

>>>> (2) but given this, you then decide to deal anyway with NULL values, and you give a (already shown incorrect) mapping for them?
>>> 
>>> Perhaps incomplete, but not incorrect.
>> 
>> According to a standard notion of incorrectness your proposal is incorrect (and incomplete, of course).
>> Indeed, in my example <http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0111.html> if you query DB1 for all the tuples in R which do NOT have any value for the attribute A (a query you can easily write as the difference between all the tuples in R and the tuples in R which do have some value for A) you will get both tuples by using your mapping which neglects the NULL values. So you get spurious wrong tuples in the answer. And this is plain unsound.
> 
> You defined your own non-standard query mechanism, declared that it should return X, showed that it does not return X if applied to the graph produced by the Direct Mapping, and hence declared the Direct Mapping unsound. This doesn't show anything. The standard query language for RDF is SPARQL.
> 
> And a mapping from some other data model instance to an RDF graph is sound even if it is incomplete -- that is, it does not capture all the knowledge encoded in the data model instance. That's the open world assumption. Of course if you apply a closed-world query language that supports some form of negation or set difference to an open-world logic, you can produce spurious triples -- but that doesn't mean that the mapping is unsound.

Apparently we are applying the notion of 'soundness' to different contexts. 
Suppose I have an RDB which gives me some answers when queried in SQL. I translate the RDB into RDF, and the SQL query into SPARQL (with a direct translation). In the absence of NULL values, I always get the same answers from both the SQL and the SPARQL query. This shows that the translation is meaningful: we say that the translation of the original query answering problem from RDB/SQL to RDF/SPARQL is sound and complete.
However, when NULL values are involved, there are cases where the SPARQL query gives me fewer tuples or spurious tuples. In this case we say that the translation of the original query answering problem from RDB/SQL to RDF/SPARQL is incomplete and/or unsound.
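
To make the two notions concrete, here is a minimal sketch of the comparison, not taken from the Wiki page. The table `person`, its rows, the mapping that drops NULL cells, and the translation of the query into a single triple pattern are all assumptions chosen for illustration. The SQL answer is the reference: tuples missing from the RDF-side answer indicate incompleteness, and extra tuples indicate unsoundness.

```python
import sqlite3

# Hypothetical table person(id, name); the name column is nullable.
rows = [("Alice", None), ("Carol", "Smith")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", rows)

# Expected answer: the SQL query evaluated on the database itself.
expected = set(conn.execute("SELECT id, name FROM person"))

# Assumed mapping: one triple per non-NULL cell; NULL cells are dropped.
triples = {(i, "name", n) for i, n in rows if n is not None}

# Assumed query translation: the plain pattern { ?s <name> ?n }.
actual = {(s, o) for (s, p, o) in triples if p == "name"}

missing = expected - actual   # non-empty means the translation is incomplete
spurious = actual - expected  # non-empty means the translation is unsound

print("missing:", missing)
print("spurious:", spurious)
```

In this particular sketch only `missing` is non-empty. A different query translation (for example one using OPTIONAL) would change the outcome, so the result says something about the pair (mapping, query translation), not about the mapping alone.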

Please check my story on the Wiki, where I hope it has been summarised in a more coherent and clearer way.
Basically, my argument is the following:

1) I start with some RDB with NULL values;

2) I show some queries and their answer according to the standard semantics of NULL values in SQL;

3) I introduce two different proposals for a mapping from the RDB to RDF triples;

4) I show how the answers of the queries considered in (2) are different from the expected ones; the first one differs from the expected one in an incomplete and unsound way - namely it returns sometimes fewer tuples and sometimes more tuples;

5) I conclude that the proposals are wrong;

6) I introduce a third proposal, which in addition requires that the original queries also be mapped in a mechanical (and very easy) way;

7) I show how this latter proposal does the right job;

8) If you do not like the third proposal, then I conclude that we should not deal with NULL values at all, unless we find a working proposal.
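
As a rough illustration of steps (2), (4), (6) and (7), the sketch below assumes that the third proposal is the one quoted further down in this thread: NULL becomes a special constant, and each joined variable gets an extra not-equal-to-NULL condition. The table `person`, its rows, the string standing in for the special constant, and the helper `same_name_pairs` are hypothetical. The join is simulated over Python tuples instead of being run through a SPARQL engine.

```python
import sqlite3

# Hypothetical table person(id, name); the name column is nullable.
rows = [("Alice", None), ("Bob", None), ("Carol", "Smith"), ("Dave", "Smith")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", rows)

# Step (2): pairs of people with the same name, under SQL NULL semantics.
sql_pairs = set(conn.execute(
    "SELECT p.id, q.id FROM person p JOIN person q "
    "ON p.name = q.name WHERE p.id < q.id"
))

# Plain string standing in for the special constant "NULL"^^rdb2rdf:NULL.
NULL = '"NULL"^^rdb2rdf:NULL'

# Assumed mapping: every cell becomes a triple; NULL becomes the constant.
triples = [(i, "name", NULL if n is None else n) for i, n in rows]

def same_name_pairs(triples, excluded=None):
    """Join the name triples on their object, skipping the excluded value."""
    return {
        (s1, s2)
        for (s1, p1, o1) in triples
        for (s2, p2, o2) in triples
        if p1 == p2 == "name" and o1 == o2 and s1 < s2 and o1 != excluded
    }

# Step (4): the unmodified join treats the constant like any other value.
naive_pairs = same_name_pairs(triples)

# Steps (6)-(7): the rewritten join excludes the constant on the joined variable.
fixed_pairs = same_name_pairs(triples, excluded=NULL)

print(sql_pairs, naive_pairs, fixed_pairs)
```

Under these assumptions the unmodified join returns the extra pair (Alice, Bob), while the rewritten join agrees with the SQL answer. The sketch covers a single join query only; it is not evidence that the recipe works for every query.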

>>> If you need a walkthrough of the spec to see why this is true, let me know.
>> 
>> I was in the SPARQL WG, so I don't need you to walk me through.
> 
> But did you read the RDF Semantics document?
> http://www.w3.org/TR/rdf-mt/

I don't see a smiley here, so I take that you are serious in asking. I guess I already replied to this very specific question <http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0113.html>.

> If you did, then you can surely speak to the very specific technical problem with your proposal that I pointed out above?

For the n-th time, try to follow me. Tell me specifically what you do not understand in my argument.

cheers
--e.

> 
> Best,
> Richard
> 
> 
> 
> On 19 May 2011, at 07:41, Enrico Franconi wrote:
> 
>> On 19 May 2011, at 01:09, Richard Cyganiak wrote:
>> 
>>> On 18 May 2011, at 21:22, Enrico Franconi wrote:
>>>> (1) you have no evidence that "the semantics of RDF offers no construct that adequately covers this"
>>> 
>>> I do have evidence:
>>> http://www.w3.org/TR/rdf-mt/
>>> 
>>> I read it all and nothing in there works for this.
>> 
>> The burden is on you to show that :-)
>> 
>>> Burden of proof is on you now I'd say ;-)
>> 
>> Indeed I proposed how to "use" unmodified standard normative RDF + SPARQL to deal correctly with NULL values.
>> 
>>>> (2) but given this, you then decide to deal anyway with NULL values, and you give a (already shown incorrect) mapping for them?
>>> 
>>> Perhaps incomplete, but not incorrect.
>> 
>> According to a standard notion of incorrectness your proposal is incorrect (and incomplete, of course).
>> Indeed, in my example <http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0111.html> if you query DB1 for all the tuples in R which do NOT have any value for the attribute A (a query you can easily write as the difference between all the tuples in R and the tuples in R which do have some value for A) you will get both tuples by using your mapping which neglects the NULL values. So you get spurious wrong tuples in the answer. And this is plain unsound.
>> 
>>>> When a future WG will fix this, the change will be most likely non backward compatible, since now you are taking a non-motivated choice for the mapping of NULLs. And this is bad.
>>> 
>>> As long as we translate the non-null parts correctly, our translation is correct.
>> 
>> As long as you translate only RDBs which do not contain any NULL value, your translation is correct. Indeed, this is my proposal to the group in the case you do not want to explore the possibility to correctly deal with NULLs.
>> 
>>> It may not be complete because the presence of a null may give some extra information that we do not capture.
>> 
>> As I just said, it is incorrect (unsound) *and* incomplete. So it is just wrong.
>> 
>>> However, if some clever spirit in the future should find some way of squeezing that extra information into RDF triples, then these extra triples can always be added without breaking backwards compatibility. RDF is monotonic.
>> 
>> Since you are unsound, monotonicity does not help you to preserve backward compatibility.
>> Look: the issue is complex, there is no reason to fight over this. The facts are that your proposal is wrong and my proposal is not trivial to analyse.
>> 
>>>> If the majority of this WG does not want to explore the correctness of the mapping for NULL values, then my proposal would be something along the lines:
>>>> "Note: if a relational database contains NULL values, then the direct mapping is not applicable. This case is postponed for consideration to a future WG."
>>> 
>>> I repeat: Not mapping nulls may be incomplete, but not incorrect. Punting on the NULL cells doesn't mean that we have to punt on the entire DB.
>> 
>> I repeat: no.
>> 
>>>> I propose to translate a NULL value as a special constant from a special datatype, and to understand how SPARQL 1.0 queries should be modified in order to behave properly in presence of RDF data coming from a direct mapping of a RDB with NULL values. My guess is that it is enough to enrich the BGP part with a conjunct NOT-EQUAL(X,'NULL') [pardon my naive syntax here] for each joined (namely repeated in the BGP) variable X, so we remain in pure SPARQL 1.0.
>>> 
>>> I'll not comment on charter scope here, I'll leave that to others.
>>> 
>>> You talk about changing SPARQL.
>> 
>> No. I'm talking about using unmodified standard normative SPARQL.
>> 
>>> But let me just say that you'd have to go deeper than that.
>>> 
>>> If we have this:
>>> 
>>> <Alice> <name> "NULL"^^rdb2rdf:NULL .
>>> <Bob>   <name> "NULL"^^rdb2rdf:NULL .
>>> 
>>> then from RDF Semantics it follows that <Alice> and <Bob> have the same name. A constant is a constant, that's how datatypes work in RDF and you can't do anything about that unless you are prepared to change the formal semantics. If you need a walkthrough of the spec to see why this is true, let me know.
>> 
>> I was in the SPARQL WG, so I don't need you to walk me through.
>> Please look again at my proposal, and realise that I am not proposing to change anything in SPARQL (or RIF or OWL). I am proposing a recipe on how to *use* unmodified standard normative RDF and SPARQL 1.0 in order to get the correct answers over the graph obtained by a direct mapping where the NULL value is translated as a special recognisable constant.
>> It may well be the case that the WG and the SW community at large do not like this approach (since it requires adhering to a recipe in order to get the right answers, à la 'best practices') but having worked on the theory of SQL null values for a few years now I am confident (but I can always be proved wrong) that this is the only way to correctly deal with the matter.
>> 
>>> It's not just SPARQL that's built on that foundation, but also OWL2 and RIF. Are you prepared to change them as well?
>> 
>> No I am not :-)
>> 
>> cheers
>> --e.
>> 
>>> Best,
>>> Richard
>> 
>> 
> 
>
Received on Thursday, 19 May 2011 13:52:22 UTC