Dealing with RDB NULL values (was: Q: ISSUE-41 bNode semantics) from Michael Hausenblas on 2011-05-19 (public-rdb2rdf-wg@w3.org from May 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Thu, 19 May 2011 09:35:38 +0100
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <483A2A52-F328-416E-BC6F-6B250A94DE1A@deri.org>

Enrico,


> ... having worked on the theory of SQL null values for a few years now
> I am confident (but I can always be proved wrong) that this is the  
> only way to correctly deal with the matter.


Given your expertise, can I please ask you to put your proposal in its
entire beauty on the WG Wiki? I've already created a page [1] with an
attempt at a problem description. It would be tremendously helpful for
us if you could do this before next week's meeting so that we can
proceed with this issue.

Of course, all other WG members, including Richard, Juan, etc. who  
contributed to the discussion so far are encouraged to edit this page  
as well!

Cheers,
	Michael

[1] http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues

--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 19 May 2011, at 07:41, Enrico Franconi wrote:

> On 19 May 2011, at 01:09, Richard Cyganiak wrote:
>
>> On 18 May 2011, at 21:22, Enrico Franconi wrote:
>>> (1) you have no evidence that "the semantics of RDF offers no  
>>> construct that adequately covers this"
>>
>> I do have evidence:
>> http://www.w3.org/TR/rdf-mt/
>>
>> I read it all and nothing in there works for this.
>
> The burden is on you to show that :-)
>
>> Burden of proof is on you now I'd say ;-)
>
> Indeed I proposed how to "use" unmodified standard normative RDF +  
> SPARQL to deal correctly with NULL values.
>
>>> (2) but given this, you then decide to deal anyway with NULL  
>>> values, and you give a (already shown incorrect) mapping for them?
>>
>> Perhaps incomplete, but not incorrect.
>
> According to a standard notion of incorrectness your proposal is  
> incorrect (and incomplete, of course).
> Indeed, in my example <http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0111.html>
> if you query DB1 for all the tuples in R which do NOT have any
> value for the attribute A (a query you can easily write as the  
> difference between all the tuples in R and the tuples in R which do  
> have some value for A) you will get both tuples by using your  
> mapping which neglects the NULL values. So you get spurious wrong  
> tuples in the answer. And this is plain unsound.
>
>>> When a future WG will fix this, the change will be most likely non  
>>> backward compatible, since now you are taking a non-motivated  
>>> choice for the mapping of NULLs. And this is bad.
>>
>> As long as we translate the non-null parts correctly, our  
>> translation is correct.
>
> As long as you translate only RDBs which do not contain any NULL value,
> your translation is correct. Indeed, this is my proposal to the  
> group in the case you do not want to explore the possibility to  
> correctly deal with NULLs.
>
>> It may not be complete because the presence of a null may give some  
>> extra information that we do not capture.
>
> As I just said, it is incorrect (unsound) *and* incomplete. So it is  
> just wrong.
>
>> However, if some clever spirit in the future should find some way  
>> of squeezing that extra information into RDF triples, then these  
>> extra triples can always be added without breaking backwards  
>> compatibility. RDF is monotonic.
>
> Since you are unsound, monotonicity does not help you to preserve  
> backward compatibility.
> Look: the issue is complex, there is no reason to fight over this.  
> The facts are that your proposal is wrong and my proposal is not  
> trivial to analyse.
>
>>> If the majority of this WG does not want to explore the  
>>> correctness of the mapping for NULL values, then my proposal would
>>> be something along the lines:
>>> "Note: if a relational database contains NULL values, then the  
>>> direct mapping is not applicable. This case is postponed for  
>>> consideration to a future WG."
>>
>> I repeat: Not mapping nulls may be incomplete, but not incorrect.  
>> Punting on the NULL cells doesn't mean that we have to punt on the  
>> entire DB.
>
> I repeat: no.
>
>>> I propose to translate a NULL value as a special constant from a  
>>> special datatype, and to understand how SPARQL 1.0 queries should  
>>> be modified in order to behave properly in presence of RDF data  
>>> coming from a direct mapping of a RDB with NULL values. My guess  
>>> is that it is enough to enrich the BGP part with a conjunct NOT- 
>>> EQUAL(X,'NULL') [pardon my naive syntax here] for each joined  
>>> (namely repeated in the BGP) variable X, so we remain in pure  
>>> SPARQL 1.0.
>>
>> I'll not comment on charter scope here, I'll leave that to others.
>>
>> You talk about changing SPARQL.
>
> No. I'm talking about using unmodified standard normative SPARQL.
>
>> But let me just say that you'd have to go deeper than that.
>>
>> If we have this:
>>
>> <Alice> <name> "NULL"^^rdb2rdf:NULL .
>> <Bob>   <name> "NULL"^^rdb2rdf:NULL .
>>
>> then from RDF Semantics it follows that <Alice> and <Bob> have the  
>> same name. A constant is a constant, that's how datatypes work in  
>> RDF and you can't do anything about that unless you are prepared to  
>> change the formal semantics. If you need a walkthrough of the spec  
>> to see why this is true, let me know.
>
> I was in the SPARQL WG, so I don't need you to walk me through.
> Please look again at my proposal, and realise that I am not  
> proposing to change anything in SPARQL (or RIF or OWL). I am  
> proposing a recipe on how to *use* unmodified standard normative RDF  
> and SPARQL 1.0 in order to get the correct answers over the graph  
> obtained by a direct mapping where the NULL value is translated as a  
> special recognisable constant.
> It may well be the case that the WG and the SW community at large  
> does not like this approach (since it requires to adhere to a recipe  
> in order to get the right answers, à la 'best practices') but having
> worked on the theory of SQL null values for a few years now I am
> confident (but I can always be proved wrong) that this is the only  
> way to correctly deal with the matter.
>
>> It's not just SPARQL that's built on that foundation, but also OWL2  
>> and RIF. Are you prepared to change them as well?
>
> No I am not :-)
>
> cheers
> --e.
>
>> Best,
>> Richard
>
Received on Thursday, 19 May 2011 08:36:10 UTC
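
The exchange quoted above turns on two points: Richard's observation that a constant such as "NULL"^^rdb2rdf:NULL makes <Alice> and <Bob> share a name, and Enrico's recipe of adding a NOT-EQUAL(X,'NULL') conjunct for each joined variable. The following is only a rough sketch of that recipe as I read it, not anything either author wrote. The table person(id, name), the extra rows for Carol and Dave, the tuple used to stand in for the typed literal, and all function names are assumptions made for illustration. The basic graph pattern is simulated with plain Python loops rather than a real SPARQL engine.

```python
import sqlite3

# Hypothetical stand-in for the typed literal "NULL"^^rdb2rdf:NULL.
NULL = ("NULL", "rdb2rdf:NULL")

# Hypothetical table person(id, name); None plays the role of SQL NULL.
ROWS = [("Alice", None), ("Bob", None), ("Carol", "Smith"), ("Dave", "Smith")]


def sql_same_name_pairs(rows):
    """Self-join on name in SQL; NULL = NULL is not true, so NULL names never join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (id TEXT, name TEXT)")
    con.executemany("INSERT INTO person VALUES (?, ?)", rows)
    query = (
        "SELECT p.id, q.id FROM person p JOIN person q "
        "ON p.name = q.name WHERE p.id < q.id ORDER BY p.id, q.id"
    )
    result = con.execute(query).fetchall()
    con.close()
    return result


def to_triples(rows):
    """Map each row to one triple, translating SQL NULL to the special constant."""
    return [(pid, "name", NULL if name is None else name) for pid, name in rows]


def bgp_same_name_pairs(triples, guard_nulls):
    """Simulate the pattern { ?x name ?n . ?y name ?n } where ?n is the joined variable.

    With guard_nulls=True, bindings where ?n is the NULL constant are dropped,
    mimicking the extra NOT-EQUAL(?n, NULL) conjunct from the proposal.
    """
    pairs = []
    for x, _, n1 in triples:
        for y, _, n2 in triples:
            if x < y and n1 == n2:
                if guard_nulls and n1 == NULL:
                    continue
                pairs.append((x, y))
    return sorted(pairs)


if __name__ == "__main__":
    triples = to_triples(ROWS)
    print("SQL join:           ", sql_same_name_pairs(ROWS))
    print("BGP without guard:  ", bgp_same_name_pairs(triples, guard_nulls=False))
    print("BGP with guard:     ", bgp_same_name_pairs(triples, guard_nulls=True))
```

In this toy setting the unguarded pattern pairs Alice with Bob, which is the effect Richard describes, while the guarded pattern returns the same pairs as the SQL join. Whether the guard is sufficient for all SPARQL 1.0 queries is exactly what the thread leaves open.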