Re: Agenda for June 14 Telcon - Revision 1 from Michael Hausenblas on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Tue, 14 Jun 2011 12:17:01 +0100
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: Eric Prud'hommeaux <eric@w3.org>, "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-Id: <9A86882A-D29F-4467-B0A8-CA46096206A5@deri.org>
> In the wiki I came up explicitly with 3 alternative concrete  
> wordings; please look at them.


Looked at them. I need one (1) not three (3).


> What I can not do is to solve the open technical problem for the  
> representation with missing NULLs, since it is hard and complex.

That's also my understanding. Hence we can't normatively spec  
something where even the scientific part is not solved.

Cheers,
 Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 14 Jun 2011, at 12:15, Enrico Franconi wrote:

> In the wiki I came up explicitly with 3 alternative concrete  
> wordings; please look at them.
> What I can not do is to solve the open technical problem for the  
> representation with missing NULLs, since it is hard and complex. The  
> proposers of this representation should come up with an answer to  
> this question, so to support their argument. Otherwise only my  
> proposals can stand.
>
> On 14 Jun 2011, at 13:07, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
>
>>
>>> I have been asking this WG for ages how to rebuild the correct  
>>> answers with explicit NULLs from your representation
>>
>> This is, IMO, the core of the problem. You're asking rather than  
>> coming up with a concrete wording for the proposal.
>>
>> Please, for the sake of getting this issue closed and meeting the  
>> September deadline for LC: Enrico, can you draft a concrete wording  
>> such as:
>>
>>
>> [[
>> PROPOSAL: To resolve ISSUE-42, ...
>> ]]
>>
>>
>> that we can discuss and hopefully resolve today?
>>
>> If we fail to get this done today I'm inclined to change the  
>> overall timeline because we have a lot more issues to resolve  
>> and simply cannot afford to discuss one single issue (no matter  
>> how important it is) till the cows come home.
>>
>> This is not a scientific beauty contest. We're writing a spec, for  
>> heaven's sake.
>>
>> Cheers,
>>   Michael
>> --
>> Dr. Michael Hausenblas, Research Fellow
>> LiDRC - Linked Data Research Centre
>> DERI - Digital Enterprise Research Institute
>> NUIG - National University of Ireland, Galway
>> Ireland, Europe
>> Tel. +353 91 495730
>> http://linkeddata.deri.ie/
>> http://sw-app.org/about.html
>>
>> On 14 Jun 2011, at 11:44, Enrico Franconi wrote:
>>
>>> On 13 Jun 2011, at 23:16, Eric Prud'hommeaux wrote:
>>>
>>>> There is a fundamental difference between SPARQL and SQL users in  
>>>> that SQL users either prohibit a query from answering with NULLs:
>>>> SELECT name, company           ┌────────────────┐
>>>>  FROM Conctacts                │ name │ company │
>>>> WHERE name="Sue"               ├──────┼─────────┤
>>>>   AND company IS NOT NULL      └──────┴─────────┘
>>>> or they write some application code to skip over the NULLs,  
>>>> or, pretty commonly, the UI paints an empty string and the  
>>>> interface user has to guess whether it was a NULL or a company  
>>>> named "". The intent of the query in this example was clearly to  
>>>> get the names of the companies which Sue represents, for which  
>>>> neither NULL nor r2rml:NULL nor "" are acceptable answers.
>>>
>>> I claim that you can filter out NULLs, exactly as you would do  
>>> in SQL. On what grounds do you claim that applications built on  
>>> top of RDF data are different from applications built on top of an  
>>> RDB wrt the usage of NULLs? I don't see any evidence that there is  
>>> such a radical difference to justify your non-standard way of  
>>> dealing with standard NULLs.
>>>
>>>> At any rate, I was just arguing that given a tension between  
>>>> putting the burden on the query author to incorporate  
>>>> <code>FILTER (?company != r2rml:NULL)</code> into the above query,  
>>>> vs. requiring the person who wants to see the NULL to know the schema:
>>>>                                                    ┌────────────────┐
>>>> SELECT *                                           │  who │ company │
>>>> WHERE { ?who <Conctacts#name> "Sue"                ├──────┼─────────┤
>>>> OPTIONAL { ?who <Conctacts#company> ?company } }   │  Sue │ UNBOUND │
>>>>                                                    └──────┴─────────┘
>>>> , I *think* the rest of the WG is in favor of the latter  
>>>> (hence the claim of rough consensus).
>>>
>>> No, this doesn't work, since you would conflate an answer that has  
>>> a NULL value with an answer where the value does not exist. So, the  
>>> above query doesn't do the job you claim it does. I have been  
>>> asking this WG for ages how to rebuild the correct answers with  
>>> explicit NULLs from your representation (even with the schema). To  
>>> no avail.
>>> So, please tell me explicitly how you get the right answer in  
>>> the above case, with all the details (how the schema is used, how  
>>> you distinguish the missing value from the NULL value, how this  
>>> can be applied mechanically to general queries, etc).
>>>
>>>>> That's why I am saying "This mapping for NULL values is  
>>>>> arbitrary since the WG has left unexplored its relationship with  
>>>>> the original meaning and behaviour of NULL values in the source  
>>>>> RDB."
>>>
>>> I can repeat that :-)
>>>
>>>>> What I have been asking you for ages is to go through my three  
>>>>> examples and see how your proposal would actually encode the  
>>>>> answers, and show how this would lead to a generic recipe.
>>>
>>> This request still stands.
>>>
>>>>> My argument is that this will most likely be possible, but that  
>>>>> it will be overly complex since it will necessarily require the  
>>>>> ability to recognise whether a missing value is a NULL or not  
>>>>> (also in the answer set!).
>>>
>>> Let's see your answer to my question in bold above.
>>>
>>>>> Clearly, by having explicit NULL values this problem is avoided.  
>>>>> Moreover, you can easily switch to the absent-NULL  
>>>>> representation by just filtering all the tuples with NULL values  
>>>>> in one simple shot.
>>>>
>>>> In <http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#Comments_and_Proposal_by_Enrico>,
>>>> you asked how to discriminate between the direct graphs of
>>>> ┌┤R├────────┐ and ┌┤R'├┐
>>>> │ ID │    A │     │ ID │
>>>> ├────┼──────┤     ├────┤
>>>> │  1 │ NULL │     │  1 │
>>>> └────┴──────┘     └────┘
>>>> , but we do that by knowing the schema so the question doesn't  
>>>> help us learn what is a reasonable mapping.
>>>
>>> This is too vague: "we do that by knowing the schema". As I said  
>>> above, please tell me explicitly how you proceed.
>>>
>>>> I instead propose that you ask questions of the ┤Conctacts├  
>>>> database above and show how, even knowing the schema, the direct  
>>>> graph doesn't give you realistic access to information.  
>>>> Remember, this isn't a database interchange language, but instead  
>>>> a way to give RDF users a useful view of relational data.
>>>
>>> I don't understand this point :-(
>>>
>>> cheers
>>> --e.
>>>
>>
Received on Tuesday, 14 June 2011 11:17:30 UTC
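
The R versus R' question quoted in the thread can be illustrated with a small sketch. It is not the Direct Mapping defined by the WG: the function `direct_triples`, the IRI shapes `<R/ID=1>` and `<R#A>`, and the use of the string "r2rml:NULL" as an explicit marker are all assumptions made for illustration. Under those assumptions, an absent-NULL mapping yields the same triples for both tables, while an explicit marker keeps them apart.

```python
import sqlite3


def direct_triples(conn, table, key, null_marker=None):
    """Return a set of (subject, predicate, object) triples for a table.

    Hypothetical simplification: with null_marker=None a NULL cell emits
    no triple (absent-NULL); otherwise the marker is emitted as the object.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    triples = set()
    for row in cur:
        record = dict(zip(cols, row))
        subject = f"<{table}/{key}={record[key]}>"
        for col, value in record.items():
            if value is not None:
                triples.add((subject, f"<{table}#{col}>", value))
            elif null_marker is not None:
                triples.add((subject, f"<{table}#{col}>", null_marker))
    return triples


# Table R from the thread: columns ID and A, with A set to NULL.
with_null = sqlite3.connect(":memory:")
with_null.execute("CREATE TABLE R (ID INTEGER, A TEXT)")
with_null.execute("INSERT INTO R VALUES (1, NULL)")

# Table R' from the thread: column ID only (named R here so IRIs match).
without_col = sqlite3.connect(":memory:")
without_col.execute("CREATE TABLE R (ID INTEGER)")
without_col.execute("INSERT INTO R VALUES (1)")

absent_same = (
    direct_triples(with_null, "R", "ID")
    == direct_triples(without_col, "R", "ID")
)
explicit_same = (
    direct_triples(with_null, "R", "ID", null_marker="r2rml:NULL")
    == direct_triples(without_col, "R", "ID", null_marker="r2rml:NULL")
)
print("absent-NULL graphs identical:", absent_same)
print("explicit-NULL graphs identical:", explicit_same)
```

In this sketch the two graphs can only be told apart under the absent-NULL variant by consulting the schema separately, which is the step the thread asks to have spelled out.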