Re: Q: ISSUE-41 bNode semantics from Enrico Franconi on 2011-05-18 (public-rdb2rdf-wg@w3.org from May 2011)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Wed, 18 May 2011 22:22:21 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Ivan Herman <ivan@w3.org>, Pat Hayes <phayes@ihmc.us>, Michael Hausenblas <michael.hausenblas@deri.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <AFBA20FA-A3E9-4E12-BD6C-5112931AD7B0@inf.unibz.it>
On 18 May 2011, at 20:33, Richard Cyganiak wrote:

> On 18 May 2011, at 16:05, Enrico Franconi wrote:
>> If this is the attitude of the majority of the group, then I will request that the mapping state explicitly that it does not deal with RDBs containing NULL values, and that otherwise it will produce results incompatible with the RDB semantics of NULL values. If you do not say this explicitly, this group will produce a standard that will not be backward compatible when, in the future, you have to deal with NULL values.
>> My opinion is that we have to consider NULL values, since I know of hardly any real-world RDBs without NULL values.
> 
> I propose to resolve the issue by adding text like this to the direct mapping, close to the point where the behaviour of null-valued columns is specified:
> 
> “Note: Null values in relational databases can be used to express different notions, including the absence of a value, or the notion that a value exists but is unknown. The semantics of RDF offers no construct that adequately covers this. Therefore, the direct mapping does not assert a triple for a null value. This comes close in intuition, but not the same exact semantics.”

No. This is exactly what I do not want. 

(1) you have no evidence that "the semantics of RDF offers no construct that adequately covers this", just as one could claim that the relational model offers no construct that adequately covers this; but, wait... that construct is called the NULL value :-)

(2) but given this, you then decide to deal with NULL values anyway, and you give a mapping for them (already shown to be incorrect)? Your "therefore" does not show any causal relation to me; I do not have the "intuition" you have; I don't even see a semantics here, so I don't know how you can say "not the same exact semantics". When a future WG fixes this, the change will most likely not be backward compatible, since you are now making an unmotivated choice for the mapping of NULLs. And this is bad.

If the majority of this WG does not want to explore the correctness of the mapping for NULL values, then my proposal would be something along the lines of:
"Note: if a relational database contains NULL values, then the direct mapping is not applicable. This case is postponed for consideration to a future WG."

>> I don't understand on what grounds you claim that this cannot be done, at least in a simplified context.
> 
> A NULL in the MANAGER column can mean that the person has no manager (she's the CEO). You cannot express the fact that a person has no manager in RDF because RDF semantics does not have negation. But the NULL could also mean that the person's manager is unknown (it's a CRM database and the salesperson didn't ask her.) RDF can express that -- use a blank node. But the direct mapping doesn't know which NULL semantics is intended in a given database.

Indeed, that's why RDBs have NULL values in the first place: to express this ambiguity.

> Therefore, it's impossible to accurately express SQL's NULL in RDF;

Uh? (see below)

> you either need additional out-of-band information (like an R2RML mapping specified by someone who knows the NULL semantics used in the given schema); or you need to change the semantics of RDF and SPARQL.

There is a third way, in case the WG decides to explore a correct mapping with NULL values. I propose to translate a NULL value as a special constant from a special datatype, and to work out how SPARQL 1.0 queries should be modified in order to behave properly in the presence of RDF data coming from a direct mapping of an RDB with NULL values. My guess is that it is enough to enrich the BGP part with a conjunct NOT-EQUAL(X,'NULL') [pardon my naive syntax here] for each joined (namely, repeated in the BGP) variable X, so we remain in pure SPARQL 1.0.
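A minimal sketch of this third way, in plain Python rather than SPARQL, under a toy representation: triples are Python tuples, the special NULL constant is a literal of a made-up datatype (`urn:example:sql-null`), and `direct_map`, `match`, `joined_vars` and `evaluate` are hypothetical helpers, not part of any R2RML or SPARQL API. The NOT-EQUAL(X,'NULL') conjunct is simulated by discarding solutions in which a repeated variable is bound to the NULL constant.

```python
from collections import Counter

# Hypothetical encoding of SQL NULL: a literal of a made-up special datatype.
NULL = ("NULL", "urn:example:sql-null")


def direct_map(table, key, rows):
    """Toy direct mapping: one triple per column, using NULL instead of dropping nulls."""
    triples = set()
    for row in rows:
        subject = f"{table}/{key}={row[key]}"
        for column, value in row.items():
            obj = NULL if value is None else value
            triples.add((subject, f"{table}#{column}", obj))
    return triples


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Naive BGP matching: yield every binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind a fresh variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


def joined_vars(patterns):
    """Variables repeated in the BGP, i.e. the join variables."""
    counts = Counter(t for pattern in patterns for t in pattern if is_var(t))
    return {v for v, n in counts.items() if n > 1}


def evaluate(patterns, triples, null_filter=True):
    """Evaluate a BGP, optionally adding the conjunct ?X != NULL for each join variable."""
    solutions = list(match(patterns, triples))
    if null_filter:
        joins = joined_vars(patterns)
        solutions = [s for s in solutions if all(s[v] != NULL for v in joins)]
    return solutions


rows = [
    {"name": "alice", "manager": None},  # e.g. the CEO, or manager unknown
    {"name": "bob", "manager": None},
    {"name": "carol", "manager": "alice"},
    {"name": "dave", "manager": "alice"},
]
g = direct_map("emp", "name", rows)
# "Which employees share a manager?"  ?m is the only join variable.
q = [("?a", "emp#manager", "?m"), ("?b", "emp#manager", "?m")]
print(len(evaluate(q, g, null_filter=False)))  # 8: alice and bob also join via NULL
print(len(evaluate(q, g)))                     # 4: only carol/dave combinations
```

On this toy data the filtered join returns the same four pairs as the SQL self-join `SELECT ... FROM emp a JOIN emp b ON a.manager = b.manager`, whereas the unfiltered join also pairs alice and bob through their shared NULL constant. Variables that occur only once are not filtered, so a NULL can still be returned as a value, much as SQL returns NULL in a projected column.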

Please note that this is exactly what happens in RDBs: the standard relational model does not provide a special semantics for NULL values; NULL values come into play because they influence query evaluation, roughly in the way I described above.
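For readers less familiar with how NULL influences query evaluation in SQL, the following sketch uses Python's standard `sqlite3` module on a hypothetical `emp` table modelled on the MANAGER example above. It shows that `NULL = NULL` evaluates to NULL (UNKNOWN) rather than true, so a join on a nullable column never matches two NULLs, and that `IS NOT NULL` is the explicit predicate for filtering them out.

```python
import sqlite3


def managers_join():
    """Run a few NULL-sensitive queries against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, manager TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("alice", None), ("bob", None), ("carol", "alice"), ("dave", "alice")],
    )
    # Pairs of distinct employees sharing a manager. alice and bob both have
    # NULL managers, but NULL = NULL is UNKNOWN, so they never pair up.
    rows = conn.execute(
        "SELECT a.name, b.name FROM emp a JOIN emp b "
        "ON a.manager = b.manager AND a.name < b.name ORDER BY a.name, b.name"
    ).fetchall()
    # The comparison itself yields NULL (None in Python), not 1 (true).
    null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
    # IS NOT NULL is the explicit predicate for excluding NULLs.
    known = conn.execute(
        "SELECT name FROM emp WHERE manager IS NOT NULL ORDER BY name"
    ).fetchall()
    conn.close()
    return rows, null_eq, known


rows, null_eq, known = managers_join()
print(rows)     # [('carol', 'dave')]
print(null_eq)  # None
print(known)    # [('carol',), ('dave',)]
```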

It may be that my proposal is too "complex" for the scope and time frame of this WG; in that case I propose to stick to my first statement, in order to avoid damage to applications when we have to revise the specs to deal with NULL values in the future:
"Note: if a relational database contains NULL values, then the direct mapping is not applicable. This case is postponed for consideration to a future WG."

>> I strongly disagree with your statement on CWA and 3-valued logic; indeed, relational algebra does deal with SQL NULLs by introducing the is-not-null predicate.
> 
> I'll respond when you've made a concrete proposal.

See above.

cheers
--e.
Received on Wednesday, 18 May 2011 20:22:52 UTC