Re: Agenda for June 14 Telcon - Revision 1 from Eric Prud'hommeaux on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 14 Jun 2011 09:19:05 -0400
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: ashok.malhotra@oracle.com, public-rdb2rdf-wg@w3.org
Message-ID: <20110614131904.GB27798@w3.org>
* Enrico Franconi <franconi@inf.unibz.it> [2011-06-14 12:44+0200]
> On 13 Jun 2011, at 23:16, Eric Prud'hommeaux wrote:
> 
> > There is a fundamental difference between SPARQL and SQL users in that SQL users either prohibit a query from answering with NULLs:
> >   SELECT name, company           ┌────────────────┐
> >    FROM Contacts                 │ name │ company │
> >   WHERE name='Sue'               ├──────┼─────────┤
> >     AND company IS NOT NULL      └──────┴─────────┘
> > or they write in some application code to skip over the NULLs, or, pretty commonly, the UI paints an empty string and the interface user has to guess whether it was a NULL or a company named "". The intent of the query in this example was clearly to get the names of the companies which Sue represents, for which neither NULL nor r2rml:NULL nor "" are acceptable answers.
> 
> I claim that you can filter out NULLs, exactly like you would do in SQL. On what grounds do you claim that applications built on top of RDF data are different from applications built on top of an RDB wrt the usage of NULLs? I don't see any evidence that there is such a radical difference to justify your non-standard way of dealing with standard NULLs.

Not making assertions *is* the standard RDF way of dealing with missing information. There is no rdf:NULL. The closest term is rdf:nil, which is used *only* to indicate the end of a list, <http://www.w3.org/TR/rdf-schema/#ch_nil>, and even then it's used not in a car position but in a cdr position.


> > At any rate, I was just arguing that given a tension between putting burden on the query author to incorporate <code>FILTER (?company != r2rml:NULL)</code> into the above query, vs. requiring the person who wants to see the NULL to know the schema:
> >                                                      ┌────────────────┐
> >  SELECT *                                            │  who │ company │
> >   WHERE { ?who <Contacts#name> "Sue"                 ├──────┼─────────┤
> >   OPTIONAL { ?who <Contacts#company> ?company } }    │  Sue │ UNBOUND │
> >                                                      └──────┴─────────┘
> > , I *think* the rest of the WG is in favor of the latter (hence the claim of rough consensus).
> 
> No, this doesn't work, since you would confuse the answer with a NULL value with the answer with a non-existing value. So, the above query doesn't do the job you are declaring. I have been asking this WG for ages how to rebuild the correct answers with explicit NULLs from your representation (even with the schema). To no avail.

We don't know or care about the difference between NULL and a non-existent value (nor does SQL, for that matter).
We don't write NULLs in RDF so asking to correct the graph according to your desire to see NULLs is likely to be to no avail.

> So, please tell me explicitly how you get the right answer in the above case, with all the details (how the schema is used, how you distinguish the missing value from the NULL value, how this can be applied mechanically to general queries, etc).

I knew the schema, in particular, that ?who might have a company. I wrote a query sensitive to the fact that it might not have a company. I learned that Sue is not assigned to a company, just as I would have were I to issue the SQL query SELECT company FROM Contacts WHERE name='Sue';
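
A minimal sketch of the SQL side of that comparison, assuming an in-memory SQLite database holding the two-row Contacts table used elsewhere in this message. The sqlite3 module is Python's standard one; the table contents and variable names are illustrative only.

```python
import sqlite3

# Hypothetical in-memory copy of the Contacts table from this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (name TEXT, company TEXT)")
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?)",
    [("Bob", "BobCo"), ("Sue", None)],  # None is stored as SQL NULL
)

# The plain query returns one row for Sue whose company is NULL
# (surfaced as None by the Python driver).
plain = conn.execute(
    "SELECT company FROM Contacts WHERE name = 'Sue'"
).fetchall()

# Adding IS NOT NULL suppresses that row entirely.
filtered = conn.execute(
    "SELECT name, company FROM Contacts "
    "WHERE name = 'Sue' AND company IS NOT NULL"
).fetchall()

print(plain)     # [(None,)]
print(filtered)  # []
```

Either way the application learns that Sue has no company: from a NULL cell in the first query, from an empty result in the second.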


> >> That's why I am saying "This mapping for NULL values is arbitrary since the WG has left unexplored its relationship with the original meaning and behaviour of NULL values in the source RDB."
> 
> I can repeat that :-)

Not arbitrary, sensitive to the design tenets of RDF.


> >> What I have been asking you for ages is to go through my three examples and see how your proposal would actually encode the answers, and show how this would lead to a generic recipe.
> 
> This request still stands.
> 
> >> My argument is that this will most likely be possible, but that it will be overly complex since it will necessarily require the ability to recognise whether a missing value is a NULL or not (also in the answer set!).
> 
> Let's see your answer to my question in bold above.
> 
> >> Clearly, by having explicit NULL values this problem is avoided. Moreover, you can easily switch to the absent-NULL representation by just filtering all the tuples with NULL values in one simple shot.
> > 
> > In <http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#Comments_and_Proposal_by_Enrico>, you asked how to discriminate between the direct graphs of
> >  ┌┤R├────────┐ and ┌┤R'├┐
> >  │ ID │    A │     │ ID │
> >  ├────┼──────┤     ├────┤
> >  │  1 │ NULL │     │  1 │
> >  └────┴──────┘     └────┘
> > , but we do that by knowing the schema so the question doesn't help us learn what is a reasonable mapping.
> 
> This is too vague: "we do that by knowing the schema". As I said above, please tell how do you proceed explicitly.

Given a relation R with a attributes a₁ … aₐ, compose a query in which each attribute is OPTIONAL, so that a row with a missing value still matches:
  "SELECT " + ("?" + a₁ + " ") + … + ("?" + aₐ + " ")
 + "WHERE { ?s a <" + R + ">"
 + "  OPTIONAL { ?s <" + R + "#" + a₁ + "> ?" + a₁ + " }"
 …
 + "  OPTIONAL { ?s <" + R + "#" + aₐ + "> ?" + aₐ + " }"
 + "}"

Contacts Example:
  Table:            SPARQL Query:                                     Result:
┌┤Contacts├──────┐  SELECT ?name ?company                             ┌────────────────┐
│ name │ company │  WHERE { ?s a <Contacts>                           │ name │ company │
├──────┼─────────┤    OPTIONAL { ?s <Contacts#name> ?name }           ├──────┼─────────┤
│  Bob │   BobCo │    OPTIONAL { ?s <Contacts#company> ?company }     │  Bob │   BobCo │
│  Sue │    NULL │  }                                                 │  Sue │ UNBOUND │
└──────┴─────────┘                                                    └──────┴─────────┘
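
The template above can be made runnable. A sketch in Python, where compose_query is a hypothetical helper name and each attribute is wrapped in its own OPTIONAL (as in the Sue query earlier in this message) so that rows lacking a value still produce a solution:

```python
from typing import List


def compose_query(relation: str, attributes: List[str]) -> str:
    """Build a SPARQL SELECT over the direct graph of one relation.

    Each attribute gets its own OPTIONAL block, so a row whose value was
    NULL (and therefore has no triple) still yields a solution.
    """
    select = " ".join(f"?{a}" for a in attributes)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?s <{relation}#{a}> ?{a} }}" for a in attributes
    )
    return f"SELECT {select}\nWHERE {{ ?s a <{relation}>\n{optionals}\n}}"


print(compose_query("Contacts", ["name", "company"]))
```

The only schema knowledge this needs is the relation name and its attribute list, which is all "knowing the schema" amounts to here.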

No information is lost. NULL is no longer spelled "NULL", but instead as a lack of a binding. Applications using SPARQL interpret this lack of a binding the same way applications using SQL interpret NULL.
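
To make the "no information is lost" claim concrete, here is a toy round trip in plain Python. It is a sketch under stated assumptions, not the direct mapping itself: rows_to_triples and triples_to_rows are hypothetical helpers, row nodes are labelled by position, and a missing triple is read back as None, which stands in for an unbound variable.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, Optional[str]]


def rows_to_triples(table: str, rows: List[Row]) -> List[Triple]:
    """Emit a type triple per row and one triple per non-NULL cell."""
    triples: List[Triple] = []
    for i, row in enumerate(rows):
        subject = f"_:row{i}"  # illustrative row node label
        triples.append((subject, "rdf:type", table))
        for attr, value in row.items():
            if value is not None:  # a NULL cell asserts nothing
                triples.append((subject, f"{table}#{attr}", value))
    return triples


def triples_to_rows(table: str, attrs: List[str],
                    triples: List[Triple]) -> List[Row]:
    """Rebuild rows from the schema; a missing triple becomes None."""
    subjects = [s for s, p, o in triples if p == "rdf:type" and o == table]
    rows: List[Row] = []
    for s in subjects:
        cells = {p: o for subj, p, o in triples if subj == s}
        rows.append({a: cells.get(f"{table}#{a}") for a in attrs})
    return rows


contacts = [
    {"name": "Bob", "company": "BobCo"},
    {"name": "Sue", "company": None},
]
graph = rows_to_triples("Contacts", contacts)
rebuilt = triples_to_rows("Contacts", ["name", "company"], graph)
print(rebuilt == contacts)  # True
```

The attribute list passed to triples_to_rows plays the role of the schema: it is what tells the reader that Sue's row has a company column at all.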

> >  I instead propose that you ask questions of the ┤Contacts├ database above and show how, even knowing the schema, the direct graph doesn't give you realistic access to information. Remember, this isn't a database interchange language, but instead a way to give RDF users a useful view of relational data.
> 
> I don't understand this point :-(

I find the Contacts table more intuitive in examples than is the R table you proposed. Given that table, or one of your choosing, can you show how the current plan spec's lack of NULL prevents users from executing realistic queries?

> cheers
> --e.
> 

-- 
-ericP
Received on Tuesday, 14 June 2011 13:19:40 UTC