Re: Agenda for June 14 Telcon - Revision 1 from Enrico Franconi on 2011-06-14 (public-rdb2rdf-wg@w3.org from June 2011)

From: Enrico Franconi <franconi@inf.unibz.it>
Date: Tue, 14 Jun 2011 22:21:44 +0200
To: Eric Prud'hommeaux <eric@w3.org>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <38F06EA6-6620-44D6-AFDF-CEAA98712623@inf.unibz.it>

On 14 Jun 2011, at 21:45, Eric Prud'hommeaux wrote:

> * Enrico Franconi <franconi@inf.unibz.it> [2011-06-14 17:35+0200]
>> As I said, if you are a good db designer, you would design the schema of your db in a way that no attribute is nullable if you want to represent just total absence of values. How? By decomposing the potentially nullable attributes as separate (pseudo binary) relations (primary key of the relation + the attribute), and by adding a foreign key. The attributes in this relation will never have NULL values, since the absence of a value would be represented as the absence of the tuple. This is exactly your proposed DM where the null values just mean absence of information.
> 
> it sounds like you have an example in mind. if you share it with us, we can use it to make informed modelling decisions.
> 
>> On the other hand, in SQL I can also write a relationship with some nullable attributes. In this case I mean something different, namely the ambiguity between the total absence of a value and its presence but with an unknown specification.
>> Queries over nullable attributes may have the NULL value in the answer; its presence may affect further queries, such as in the query (c) in the wiki.

From your example:

┌┤Contacts├──────┐
│ name │ company │
├──────┼─────────┤
│  Bob │   BobCo │
│  Sue │    NULL │
└──────┴─────────┘
[Contacts.name is primary key]
[Contacts.company is nullable]

If you really mean that Sue does NOT have any company associated with her, the proper modelling would be:

┌┤Person│
│ name │
├──────┤
│  Bob │
│  Sue │
└──────┘
[Person.name is primary key]

┌┤hasContact├────┐
│ name │ company │
├──────┼─────────┤
│  Bob │   BobCo │
└──────┴─────────┘
[{hasContact.name,hasContact.company} is primary key]
[foreign key from hasContact.name to Person.name]
[hasContact.company not-nullable]

Note that the latter database has exactly the same encoding as the one the DM produces from the former database (the foreign key being encoded with rdf:type), namely the encoding in which the designer's intent is to capture the ABSENCE of a value.

Indeed, if I ask for all the companies, I'd get from the first database:
Q(x) :- Contacts(y,x)
--> {BobCo, NULL}
while I'd get from the second database:
Q'(x) :- Person(y),hasContact(y,x)
--> {BobCo}

So I definitely get two different answers. Note again that a direct use of the DM (without the schema) consistently gives you the second answer.
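
To make the difference concrete, here is a minimal sketch that loads both databases into an in-memory SQLite instance and runs Q and Q' as SQL. The choice of SQLite, the TEXT column types and the helper names companies_nullable / companies_decomposed are my assumptions for illustration only; the tables and rows are the ones above.

```python
import sqlite3

# In-memory database holding both designs side by side (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- First design: a single relation with a nullable attribute.
CREATE TABLE Contacts (name TEXT PRIMARY KEY, company TEXT);
INSERT INTO Contacts VALUES ('Bob', 'BobCo'), ('Sue', NULL);

-- Second design: the nullable attribute is split into its own relation.
CREATE TABLE Person (name TEXT PRIMARY KEY);
CREATE TABLE hasContact (
    name    TEXT NOT NULL REFERENCES Person(name),
    company TEXT NOT NULL,
    PRIMARY KEY (name, company)
);
INSERT INTO Person VALUES ('Bob'), ('Sue');
INSERT INTO hasContact VALUES ('Bob', 'BobCo');
""")


def companies_nullable(db):
    """Q(x) :- Contacts(y,x), as a set (SQL NULL shows up as Python None)."""
    return {row[0] for row in db.execute("SELECT company FROM Contacts")}


def companies_decomposed(db):
    """Q'(x) :- Person(y),hasContact(y,x), as a set."""
    sql = """SELECT h.company
             FROM Person p JOIN hasContact h ON h.name = p.name"""
    return {row[0] for row in db.execute(sql)}


first = companies_nullable(conn)     # contains 'BobCo' and None
second = companies_decomposed(conn)  # contains only 'BobCo'
```

The None in the first result is how the sqlite3 module surfaces the NULL; the second result has no counterpart for Sue at all, because the absence of a value is represented as the absence of a tuple.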

Now, Juan claims that with the schema we can reconstruct the answer of the first kind from representations of the second kind. I believe that it is possible - after all, we do not have information loss if we have the schema. The problem could be that relational algebra (e.g., SPARQL) may not be expressive enough to reconstruct systematically the right answers from the representations of the second kind for all relational algebra queries. That's why I did not guarantee a positive outcome from this investigation.
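
For the single query Q on this tiny example, the reconstruction can be sketched with an outer join over the decomposed tables (the SQL analogue of a SPARQL OPTIONAL). This is only an illustration under my own assumptions (SQLite, TEXT columns, and the knowledge from the schema that company was nullable); it says nothing about whether the same trick works systematically for all relational algebra queries, which is exactly the open point.

```python
import sqlite3

# Decomposed (second-kind) representation only; no nullable column anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT PRIMARY KEY);
CREATE TABLE hasContact (
    name    TEXT NOT NULL REFERENCES Person(name),
    company TEXT NOT NULL,
    PRIMARY KEY (name, company)
);
INSERT INTO Person VALUES ('Bob'), ('Sue');
INSERT INTO hasContact VALUES ('Bob', 'BobCo');
""")

# The schema of the ORIGINAL database tells us that company was nullable,
# so persons without a hasContact tuple are padded with NULL by the
# LEFT JOIN instead of being dropped.
rows = conn.execute("""
    SELECT p.name, h.company
    FROM Person p LEFT JOIN hasContact h ON h.name = p.name
    ORDER BY p.name
""").fetchall()

# Project on company to mimic Q(x) :- Contacts(y,x).
reconstructed = {company for _, company in rows}
```

Here rows plays the role of the original Contacts relation, and projecting on company gives back the first answer, including the NULL for Sue.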

cheers
--e.

Received on Tuesday, 14 June 2011 20:22:26 UTC