Re: Agenda for June 14 Telcon - Revision 1 from Enrico Franconi on 2011-06-13 (public-rdb2rdf-wg@w3.org from June 2011)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Mon, 13 Jun 2011 21:41:17 +0200
To: Eric Prud'hommeaux <eric@w3.org>
Cc: ashok.malhotra@oracle.com, public-rdb2rdf-wg@w3.org
Message-Id: <E55CFC01-8B7A-4D31-9CB5-60D8B214A06D@inf.unibz.it>

On 13 Jun 2011, at 20:19, Eric Prud'hommeaux wrote:

> Examining use case like:
> 
> ┌┤Conctacts├─────┐      Direct Graph:
> │ name │ company │      <Conctacts/name=Bob> <Conctacts#name> "Bob" ;
> ├──────┼─────────┤                           <Conctacts#company> "BobCo" .
> │  Bob │   BobCo │      <Conctacts/name=Sue> <Conctacts#name> "Sue" .
> │  Sue │    NULL │
> └──────┴─────────┘
> 
> What companies does Sue represent:
> SELECT company          SELECT ?company        
>  FROM Conctacts         WHERE { ?sue <Conctacts#name> "Sue" ;     
> WHERE name="Sue"                     <Conctacts#company> ?company }

Did people read what I wrote in the Wiki?
That's my example (a) in the wiki, where I show that the above solution is not desirable, since (1) it does not distinguish between the answers for your Contacts table above and those for a similar Contacts1 table where the second tuple is absent, and (2) it is not compliant with SQL semantics: as I show in the wiki, SQL returns two bindings for ?company in the case of Contacts, namely "BobCo" and NULL, while in the case of Contacts1 it returns the first binding only.

That's why I am saying "This mapping for NULL values is arbitrary since the WG has left unexplored its relationship with the original meaning and behaviour of NULL values in the source RDB."
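
To make points (1) and (2) concrete, here is a minimal sketch using Python's built-in sqlite3 module. It runs the quoted SQL query (which restricts to Sue, so SQL yields a single NULL row for Contacts and no row for Contacts1) and then evaluates the quoted SPARQL pattern by hand over an absent-NULL direct graph. The helper names company_rows, direct_graph and sue_companies are made up for illustration, and the table is spelled Contacts as in my text above.

```python
import sqlite3

def company_rows(rows):
    """Load rows into an in-memory Contacts table and ask for Sue's company."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Contacts (name TEXT, company TEXT)")
    con.executemany("INSERT INTO Contacts VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT company FROM Contacts WHERE name = 'Sue'"
    ).fetchall()
    con.close()
    return result

def direct_graph(rows):
    """Absent-NULL direct graph: no triple is emitted for a NULL column."""
    triples = set()
    for name, company in rows:
        subject = f"<Contacts/name={name}>"
        triples.add((subject, "<Contacts#name>", name))
        if company is not None:
            triples.add((subject, "<Contacts#company>", company))
    return triples

def sue_companies(triples):
    """Evaluate the basic graph pattern of the quoted SPARQL query by hand."""
    sues = {s for s, p, o in triples if p == "<Contacts#name>" and o == "Sue"}
    return [o for s, p, o in triples if s in sues and p == "<Contacts#company>"]

contacts = [("Bob", "BobCo"), ("Sue", None)]  # Sue present, company is NULL
contacts1 = [("Bob", "BobCo")]                # second tuple absent

# SQL tells the two tables apart: one NULL row versus no row at all.
sql_contacts = company_rows(contacts)
sql_contacts1 = company_rows(contacts1)

# The graph pattern gives the same (empty) answer for both tables.
rdf_contacts = sue_companies(direct_graph(contacts))
rdf_contacts1 = sue_companies(direct_graph(contacts1))
```

Under these assumptions the SQL answers differ between the two tables while the graph answers coincide, which is exactly the loss of information I am complaining about.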

> How many people represent BobCo? (we don't know if Sue does)
> SELECT COUNT(*)         SELECT (COUNT(*) AS ?count)
>  FROM Conctacts         WHERE { ?who <Conctacts#company> "BobCo" }
> WHERE company="BobCo"
> 
> , I'm inclined to agree. Anyone disagree, or want to provide screw cases?

As a matter of fact, NULL values do not contribute to this aggregation in SQL (the tuple with a NULL company fails the test company="BobCo"), so the above solution would work.
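
A small sketch of this behaviour, again with Python's sqlite3 module and the same two-row table (spelled Contacts here). It runs the quoted COUNT query and, for comparison, contrasts COUNT(*) with COUNT(company); the variable names are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Contacts (name TEXT, company TEXT)")
con.executemany(
    "INSERT INTO Contacts VALUES (?, ?)",
    [("Bob", "BobCo"), ("Sue", None)],
)

# The quoted query: Sue's row is dropped because NULL = 'BobCo' is not true.
bobco_count = con.execute(
    "SELECT COUNT(*) FROM Contacts WHERE company = 'BobCo'"
).fetchone()[0]

# COUNT(*) counts rows, whereas COUNT(company) skips NULL values.
all_rows = con.execute("SELECT COUNT(*) FROM Contacts").fetchone()[0]
non_null_companies = con.execute(
    "SELECT COUNT(company) FROM Contacts"
).fetchone()[0]

con.close()
```

Here bobco_count is 1, matching what the SPARQL query over the direct graph would count, since only Bob has a <Contacts#company> triple.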

The point is not whether we agree or like it. The point is whether NULL values lead to the same behaviour as in the original RDB. Otherwise we are not faithfully mapping an RDB to an RDF graph.

What I have been asking for ages is that you go through my three examples, see how your proposal would actually encode the answers, and show how this would lead to a generic recipe. My argument is that this will most likely be possible, but that it will be overly complex, since it will necessarily require the ability to recognise whether a missing value is a NULL or not (also in the answer set!). Clearly, by having explicit NULL values this problem is avoided. Moreover, you can easily switch to the absent-NULL representation by just filtering out all the tuples with NULL values in one simple shot.
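
The last remark can be sketched as follows. This is only an illustration: the NULL_MARKER term is a hypothetical placeholder (how an explicit NULL would actually be encoded in the graph is not settled here), and the function names are made up.

```python
# Hypothetical placeholder for an explicit NULL object in the graph.
NULL_MARKER = "<hypothetical:NULL>"

def explicit_null_graph(rows):
    """Emit one triple per column, using NULL_MARKER for NULL values."""
    triples = set()
    for name, company in rows:
        subject = f"<Contacts/name={name}>"
        triples.add((subject, "<Contacts#name>", name))
        triples.add((
            subject,
            "<Contacts#company>",
            NULL_MARKER if company is None else company,
        ))
    return triples

def drop_nulls(triples):
    """Switch to the absent-NULL representation with a single filter."""
    return {t for t in triples if t[2] != NULL_MARKER}

contacts = [("Bob", "BobCo"), ("Sue", None)]
explicit = explicit_null_graph(contacts)
absent = drop_nulls(explicit)
```

With the explicit representation, Sue's NULL company is still visible in the graph (and hence in answer sets); after drop_nulls the graph contains only the three triples of the direct graph quoted at the top.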

cheers
--e.

Received on Monday, 13 June 2011 19:41:55 UTC