Re: Agenda for June 14 Telcon - Revision 1 from Michael Hausenblas on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Tue, 14 Jun 2011 12:37:31 +0100
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <31FE7DD4-EC26-499B-8A58-3D9AA066CBB1@deri.org>
> Fair enough. If you believe so, then the proposal should be the one  
> where we give up on NULL values, since it is the only one where  
> there is no technical disagreement in the WG :-)

OK. So here is the proposal:

[[
PROPOSAL: To resolve ISSUE-42, the Direct Mapping will include triples  
representing the relational schema and will omit triples for NULL  
values.
]]
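
To make the wording concrete, here is a minimal sketch of what the proposal could mean for a single table. It is an illustration only, not spec text: the base IRI, the `hasColumn` predicate, the row IRI pattern and the `id` key column are all assumptions made up for this example. Only the table name Conctacts and its name/company columns come from the thread below.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base IRI


def direct_map(conn, table, pk):
    """Return (schema_triples, data_triples) for one table.

    Schema triples record which columns the table has (hypothetical
    `hasColumn` predicate). Data triples are emitted only for non-NULL
    cells, following the wording of the proposal.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    schema = [
        (f"{BASE}{table}", f"{BASE}hasColumn", f"{BASE}{table}#{c}")
        for c in columns
    ]
    data = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{pk}={record[pk]}"
        for col, value in record.items():
            if value is None:  # SQL NULL: omit the triple
                continue
            data.append((subject, f"{BASE}{table}#{col}", value))
    return schema, data


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Conctacts (id INTEGER PRIMARY KEY, name TEXT, company TEXT)"
)
conn.execute("INSERT INTO Conctacts VALUES (1, 'Sue', NULL)")
schema_triples, data_triples = direct_map(conn, "Conctacts", "id")
```

With Sue's company set to NULL, the data triples mention her id and name but carry no company triple, while the schema triples still record that a company column exists.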


Cheers,
 Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 14 Jun 2011, at 12:24, Enrico Franconi wrote:

> On 14 Jun 2011, at 13:17, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
>
>>
>>> In the wiki I came up explicitly with 3 alternative concrete  
>>> wordings; please look at them.
>>
>>
>> Looked at them. I need one (1) not three (3).
>>
>>
>>> What I can not do is to solve the open technical problem for the  
>>> representation with missing NULLs, since it is hard and complex.
>>
>> That's also my understanding. Hence we can't normatively spec  
>> something where even the scientific part is not solved.
>
> Fair enough. If you believe so, then the proposal should be the one  
> where we give up on NULL values, since it is the only one where  
> there is no technical disagreement in the WG :-)
> I argued that also the proposal with materialised NULLs is  
> technically sound, but not everybody in the WG believes so.
> --e.
>
>
>>
>> Cheers,
>>   Michael
>>
>> On 14 Jun 2011, at 12:15, Enrico Franconi wrote:
>>
>>> In the wiki I came up explicitly with 3 alternative concrete  
>>> wordings; please look at them.
>>> What I can not do is to solve the open technical problem for the  
>>> representation with missing NULLs, since it is hard and complex.  
>>> The proposers of this representation should come up with an answer  
>>> to this question, so to support their argument. Otherwise only my  
>>> proposals can stand.
>>>
>>> On 14 Jun 2011, at 13:07, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
>>>
>>>>
>>>>> I have been asking this WG for ages how to rebuild the correct
>>>>> answers with explicit NULLs from your representation
>>>>
>>>> This is, IMO, the core of the problem. You're asking rather than  
>>>> coming up with a concrete wording for the proposal.
>>>>
>>>> Please, for the sake of getting this issue closed and meeting the  
>>>> September deadline for LC: Enrico, can you draft a concrete  
>>>> wording such as:
>>>>
>>>>
>>>> [[
>>>> PROPOSAL: To resolve ISSUE-42, ...
>>>> ]]
>>>>
>>>>
>>>> that we can discuss and hopefully resolve today?
>>>>
>>>> If we fail to get this done today I'm inclined to change the
>>>> overall timeline because we have a lot more issues to resolve
>>>> and simply cannot afford to discuss one single issue (no
>>>> matter how important it is) till the cows come home.
>>>>
>>>> This is not a scientific beauty contest. We're writing a spec,
>>>> for heaven's sake.
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> On 14 Jun 2011, at 11:44, Enrico Franconi wrote:
>>>>
>>>>> On 13 Jun 2011, at 23:16, Eric Prud'hommeaux wrote:
>>>>>
>>>>>> There is a fundamental difference between SPARQL and SQL users
>>>>>> in that SQL users either prohibit a query from answering with
>>>>>> NULLs:
>>>>>> SELECT name, company        ┌────────────────┐
>>>>>> FROM Conctacts              │ name │ company │
>>>>>> WHERE name='Sue'            ├──────┼─────────┤
>>>>>> AND company IS NOT NULL     └──────┴─────────┘
>>>>>> or they write some application code to skip over the NULLs,
>>>>>> or, pretty commonly, the UI paints an empty string and the
>>>>>> interface user has to guess whether it was a NULL or a
>>>>>> company named "". The intent of the query in this example was
>>>>>> clearly to get the names of the companies which Sue represents,
>>>>>> for which neither NULL nor r2rml:NULL nor "" are acceptable
>>>>>> answers.
>>>>>
>>>>> I claim that you can filter out NULLs, exactly like you would do
>>>>> in SQL. On what grounds do you claim that applications built on
>>>>> top of RDF data are different from applications built on top of
>>>>> an RDB wrt the usage of NULLs? I don't see any evidence that there
>>>>> is such a radical difference to justify your non-standard way of
>>>>> dealing with standard NULLs.
>>>>>
>>>>>> At any rate, I was just arguing that given a tension between
>>>>>> putting the burden on the query author to incorporate <code>FILTER
>>>>>> (?company != r2rml:NULL)</code> into the above query, vs.
>>>>>> requiring the person who wants to see the NULL to know the
>>>>>> schema:
>>>>>>                                                   ┌────────────────┐
>>>>>> SELECT *                                          │  who │ company │
>>>>>> WHERE { ?who <Conctacts#name> "Sue"               ├──────┼─────────┤
>>>>>> OPTIONAL { ?who <Conctacts#company> ?company } }  │  Sue │ UNBOUND │
>>>>>>                                                   └──────┴─────────┘
>>>>>> , I *think* the rest of the WG is in favor of the latter
>>>>>> (hence the claim of rough consensus).
>>>>>
>>>>> No, this doesn't work, since you would confuse an answer
>>>>> containing a NULL value with an answer where the value does not
>>>>> exist. So, the above query doesn't do the job you claim. I have
>>>>> been asking this WG for ages how to rebuild the correct answers
>>>>> with explicit NULLs from your representation (even with the
>>>>> schema). To no avail.
>>>>> So, please tell me explicitly how you get the right answer in
>>>>> the above case, with all the details (how the schema is used,
>>>>> how you distinguish the missing value from the NULL value,
>>>>> how this can be applied mechanically to general queries, etc).
>>>>>
>>>>>>> That's why I am saying "This mapping for NULL values is  
>>>>>>> arbitrary since the WG has left unexplored its relationship  
>>>>>>> with the original meaning and behaviour of NULL values in the  
>>>>>>> source RDB."
>>>>>
>>>>> I can repeat that :-)
>>>>>
>>>>>>> What I have been asking you for ages is to go through my three
>>>>>>> examples and see how your proposal would actually encode the
>>>>>>> answers, and show how this would lead to a generic recipe.
>>>>>
>>>>> This request still stands.
>>>>>
>>>>>>> My argument is that this will most likely be possible, but  
>>>>>>> that it will be overly complex since it will necessarily  
>>>>>>> require the ability to recognise whether a missing value is a  
>>>>>>> NULL or not (also in the answer set!).
>>>>>
>>>>> Let's see your answer to my question in bold above.
>>>>>
>>>>>>> Clearly, by having explicit NULL values this problem is
>>>>>>> avoided. Moreover, you can easily switch to the absent-NULL
>>>>>>> representation by just filtering all the tuples with NULL
>>>>>>> values in one simple shot.
>>>>>>
>>>>>> In <http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#Comments_and_Proposal_by_Enrico>,
>>>>>> you asked how to discriminate between the direct graphs of
>>>>>> ┌┤R├────────┐ and ┌┤R'├┐
>>>>>> │ ID │    A │     │ ID │
>>>>>> ├────┼──────┤     ├────┤
>>>>>> │  1 │ NULL │     │  1 │
>>>>>> └────┴──────┘     └────┘
>>>>>> , but we do that by knowing the schema so the question doesn't  
>>>>>> help us learn what is a reasonable mapping.
>>>>>
>>>>> This is too vague: "we do that by knowing the schema". As I said  
>>>>> above, please tell how do you proceed explicitly.
>>>>>
>>>>>> I instead propose that you ask questions of the ┤Conctacts├
>>>>>> database above and show how, even knowing the schema, the
>>>>>> direct graph doesn't give you realistic access to information.
>>>>>> Remember, this isn't a database interchange language, but
>>>>>> instead a way to give RDF users a useful view of relational
>>>>>> data.
>>>>>
>>>>> I don't understand this point :-(
>>>>>
>>>>> cheers
>>>>> --e.
>>>>>
>>>>
>>
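
For the R versus R' question quoted above, one possible reading of "knowing the schema" can be sketched as follows. This is an assumption about how schema triples might be used, not an agreed WG recipe: the `hasColumn` predicate and the plain-string identifiers are hypothetical, and the sketch only classifies a single cell. It does not address how to apply this mechanically to general queries or answer sets, which is the open part of the question.

```python
# Hypothetical predicate name, not taken from any W3C document.
HAS_COLUMN = "hasColumn"


def cell_status(schema_triples, data_triples, table, subject, column):
    """Classify a cell as 'value', 'NULL', or 'no such column'.

    A column declared in the schema but absent from the row's data
    triples is read as NULL; a column the schema does not declare is
    read as non-existent.
    """
    if (table, HAS_COLUMN, column) not in schema_triples:
        return "no such column"
    for s, p, _ in data_triples:
        if s == subject and p == column:
            return "value"
    return "NULL"


# R has columns ID and A; the row (1, NULL) yields only an ID triple.
r_schema = {("R", HAS_COLUMN, "ID"), ("R", HAS_COLUMN, "A")}
r_data = {("R/ID=1", "ID", 1)}

# R' has only column ID; the row (1) also yields a single ID triple.
r2_schema = {("R'", HAS_COLUMN, "ID")}
r2_data = {("R'/ID=1", "ID", 1)}

status_r = cell_status(r_schema, r_data, "R", "R/ID=1", "A")
status_r2 = cell_status(r2_schema, r2_data, "R'", "R'/ID=1", "A")
```

The data triples of R and R' have the same shape here, so any distinction comes only from the schema triples.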
Received on Tuesday, 14 June 2011 11:38:05 UTC