Re: Agenda for June 14 Telcon - Revision 1 from Eric Prud'hommeaux on 2011-06-13 (public-rdb2rdf-wg@w3.org from June 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 13 Jun 2011 17:16:24 -0400
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: ashok.malhotra@oracle.com, public-rdb2rdf-wg@w3.org
Message-ID: <20110613211623.GE22554@w3.org>
* Enrico Franconi <franconi@inf.unibz.it> [2011-06-13 21:41+0200]
> 
> On 13 Jun 2011, at 20:19, Eric Prud'hommeaux wrote:
> 
> > Examining a use case like:
> > 
> > ┌┤Conctacts├─────┐      Direct Graph:
> > │ name │ company │      <Conctacts/name=Bob> <Conctacts#name> "Bob" ;
> > ├──────┼─────────┤                           <Conctacts#company> "BobCo" .
> > │  Bob │   BobCo │      <Conctacts/name=Sue> <Conctacts#name> "Sue" .
> > │  Sue │    NULL │
> > └──────┴─────────┘
> > 
> > What companies does Sue represent:
> > SELECT company          SELECT ?company        
> >  FROM Conctacts         WHERE { ?sue <Conctacts#name> "Sue" ;     
> > WHERE name="Sue"                     <Conctacts#company> ?company }
> 
> Did people read what I wrote in the Wiki?

Yep, I just wanted to re-characterize them with an intuitive use case. Tx for the ground work.

> That's my example (a). In my example (a) in the wiki I show that the above solution is not desirable, since (1) it does not distinguish the answers between your Contacts table above and a similar Contacts1 table where the second tuple is absent, and (2) it is not compliant with SQL semantics, which returns (as I show in the wiki) in the case of Contacts two bindings for ?company, namely "BobCo" and NULL, while in the case of Contacts1 it returns the first binding only.

There is a fundamental difference between SPARQL and SQL users in that SQL users either prohibit a query from answering with NULLs:
  SELECT name, company        ┌────────────────┐
    FROM Conctacts            │ name │ company │
   WHERE name="Sue"           ├──────┼─────────┤
     AND company IS NOT NULL  └──────┴─────────┘
or they write some application code to skip over the NULLs, or, pretty commonly, the UI paints an empty string and the interface user has to guess whether it was a NULL or a company named "". The intent of the query in this example was clearly to get the names of the companies which Sue represents, for which neither NULL nor r2rml:NULL nor "" are acceptable answers.
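
A minimal sketch of the SQL behaviour described above, using Python's built-in sqlite3 module as a stand-in for the source RDB. The column types and the choice of name as primary key are assumptions; the table contents follow the Conctacts example.

```python
import sqlite3

# Build the two-row Conctacts table from the example in an in-memory database.
# Column types and the primary key are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Conctacts (name TEXT PRIMARY KEY, company TEXT)")
conn.executemany(
    "INSERT INTO Conctacts VALUES (?, ?)",
    [("Bob", "BobCo"), ("Sue", None)],  # None is stored as SQL NULL
)

# Without a guard, the query answers with a single row holding NULL.
plain = conn.execute(
    "SELECT company FROM Conctacts WHERE name = 'Sue'"
).fetchall()

# With IS NOT NULL, the query author prohibits NULL answers and gets no rows.
filtered = conn.execute(
    "SELECT name, company FROM Conctacts "
    "WHERE name = 'Sue' AND company IS NOT NULL"
).fetchall()

print(plain)     # [(None,)]
print(filtered)  # []
```

The first result is the case where application code or the UI has to decide what to do with the NULL; the second is the empty table shown above.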

At any rate, I was just arguing that, given a tension between putting the burden on the query author to incorporate FILTER (?company != r2rml:NULL) into the above query vs. requiring the person who wants to see the NULL to know the schema:
                                                      ┌────────────────┐
  SELECT *                                            │  who │ company │
   WHERE { ?who <Conctacts#name> "Sue"                ├──────┼─────────┤
   OPTIONAL { ?who <Conctacts#company> ?company } }   │  Sue │ UNBOUND │
                                                      └──────┴─────────┘
, I *think* the rest of the WG is in favor of the latter (hence the claim of rough consensus).
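
The difference between the two query shapes can be sketched over the direct graph from the example, with the NULL cell simply omitted. This is hand-rolled pattern matching in plain Python, not a SPARQL engine; the helper names are hypothetical and None stands in for UNBOUND.

```python
# Direct graph of Conctacts as given in the example: no triple for Sue's company.
TRIPLES = {
    ("<Conctacts/name=Bob>", "<Conctacts#name>", "Bob"),
    ("<Conctacts/name=Bob>", "<Conctacts#company>", "BobCo"),
    ("<Conctacts/name=Sue>", "<Conctacts#name>", "Sue"),
}
NAME = "<Conctacts#name>"
COMPANY = "<Conctacts#company>"


def subjects(predicate, obj):
    """Subjects that have the given predicate/object pair (sorted for stable output)."""
    return sorted(s for (s, p, o) in TRIPLES if p == predicate and o == obj)


def objects(subject, predicate):
    """Objects attached to a subject via the given predicate."""
    return sorted(o for (s, p, o) in TRIPLES if s == subject and p == predicate)


def required_company(name):
    """Mimics { ?who <Conctacts#name> name ; <Conctacts#company> ?company }."""
    return [(who, c) for who in subjects(NAME, name) for c in objects(who, COMPANY)]


def optional_company(name):
    """Mimics { ?who <Conctacts#name> name OPTIONAL { ?who <Conctacts#company> ?company } }."""
    rows = []
    for who in subjects(NAME, name):
        companies = objects(who, COMPANY)
        if companies:
            rows.extend((who, c) for c in companies)
        else:
            rows.append((who, None))  # None plays the role of UNBOUND
    return rows


print(required_company("Sue"))  # []
print(optional_company("Sue"))  # [('<Conctacts/name=Sue>', None)]
```

With the company pattern required, Sue yields no solution at all; with OPTIONAL, she yields one solution in which the company is unbound, which is the table shown above.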

> That's why I am saying "This mapping for NULL values is arbitrary since the WG has left unexplored its relationship with the original meaning and behaviour of NULL values in the source RDB."
> 
> > How many people represent BobCo? (we don't know if Sue does)
> > SELECT COUNT(*)         SELECT (COUNT(*) AS ?count)
> >  FROM Conctacts         WHERE { ?who <Conctacts#company> "BobCo" }
> > WHERE company="BobCo"
> > 
> > , I'm inclined to agree. Anyone disagree, or want to provide screw cases?
> 
> As a matter of fact, NULL values do not contribute to aggregations in SQL, so the above solution would work.
> 
> The point is not whether we agree or like it. The point is whether NULL values lead to the same behaviour as in the original RDB. Otherwise we are not mapping a RDB as an RDF graph faithfully.
> 
> What I am asking you since ages is to go through my three examples and see how your proposal would actually encode the answers, and show how this would lead to a generic recipe. My argument is that this will most likely be possible, but that it will be overly complex since it will necessarily require the ability to recognise whether a missing value is a NULL or not (also in the answer set!). Clearly, by having explicit NULL values this problem is avoided. Moreover, you can easily switch to the absent-NULL representation by just filtering all the tuples with NULL values in one simple shot.

In <http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#Comments_and_Proposal_by_Enrico>, you asked how to discriminate between the direct graphs of
  ┌┤R├────────┐ and ┌┤R'├┐
  │ ID │    A │     │ ID │
  ├────┼──────┤     ├────┤
  │  1 │ NULL │     │  1 │
  └────┴──────┘     └────┘
, but we do that by knowing the schema, so the question doesn't help us learn what is a reasonable mapping. I instead propose that you ask questions of the ┤Conctacts├ database above and show how, even knowing the schema, the direct graph doesn't give you realistic access to information. Remember, this isn't a database interchange language, but instead a way to give RDF users a useful view of relational data.
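
A rough sketch of the "knowing the schema" point, under stated assumptions: the IRI shapes follow the Conctacts example rather than any normative definition, both tables are given the same name R so that only the NULL column differs, and direct_graph and null_columns are hypothetical helpers.

```python
def direct_graph(table, key, rows):
    """Emit (subject, predicate, object) triples, skipping NULL (None) cells."""
    triples = set()
    for row in rows:
        subject = f"<{table}/{key}={row[key]}>"
        for column, value in row.items():
            if value is not None:
                triples.add((subject, f"<{table}#{column}>", str(value)))
    return triples


def null_columns(table, schema_columns, subject, triples):
    """Columns listed in the schema that have no triple for the given subject."""
    present = {p for (s, p, _) in triples if s == subject}
    return [c for c in schema_columns if f"<{table}#{c}>" not in present]


# R has columns (ID, A) with A NULL; the second table has only ID.
g_r = direct_graph("R", "ID", [{"ID": 1, "A": None}])
g_r_prime = direct_graph("R", "ID", [{"ID": 1}])

# The graphs alone are identical ...
same_graph = g_r == g_r_prime

# ... but the schema tells the two cases apart.
nulls_in_r = null_columns("R", ["ID", "A"], "<R/ID=1>", g_r)
nulls_in_r_prime = null_columns("R", ["ID"], "<R/ID=1>", g_r_prime)

print(same_graph)        # True
print(nulls_in_r)        # ['A']
print(nulls_in_r_prime)  # []
```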


> cheers
> --e.

-- 
-ericP
Received on Monday, 13 June 2011 21:17:04 UTC