Re: Agenda for June 14 Telcon - Revision 1 from Eric Prud'hommeaux on 2011-06-15 (public-rdb2rdf-wg@w3.org from June 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 14 Jun 2011 23:47:31 -0400
To: Enrico Franconi <franconi@inf.unibz.it>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20110615034729.GA8669@w3.org>
* Enrico Franconi <franconi@inf.unibz.it> [2011-06-14 22:21+0200]
> On 14 Jun 2011, at 21:45, Eric Prud'hommeaux wrote:
> 
> > * Enrico Franconi <franconi@inf.unibz.it> [2011-06-14 17:35+0200]
> >> As I said, if you are a good db designer, you would design the schema of your db in a way that no attribute is nullable if you want to represent just total absence of values. How? By decomposing the potentially nullable attributes as separate (pseudo binary) relations (primary key of the relation + the attribute), and by adding a foreign key. The attributes in this relation will never have NULL values, since the absence of a value would be represented as the absence of the tuple. This is exactly your proposed DM where the null values just mean absence of information.
> > 
> > it sounds like you have an example in mind. if you share it with us, we can use it to make informed modelling decisions.
> > 
> >> On the other hand, in SQL I can also write a relationship with some nullable attributes. In this case I mean something different, namely the ambiguity between the total absence of a value and its presence but with an unknown specification.
> >> Queries over nullable attributes may have the NULL value in the answer; its presence may affect further queries, such as in the query (c) in the wiki.
> 
> From your example:
> 
> 
> ┌┤Contacts├──────┐
> │ name │ company │
> ├──────┼─────────┤
> │  Bob │   BobCo │
> │  Sue │    NULL │
> └──────┴─────────┘
> [Contacts.name is primary key]
> [Contacts.company is nullable]

DG:
<Contacts/name=Bob> <Contacts#name> "Bob" ; <Contacts#company> "BobCo" .
<Contacts/name=Sue> <Contacts#name> "Sue" .


> If you really mean that Sue does NOT have any company associated to it, the proper modelling would be:
> 
> ┌┤Person│
> │ name │
> ├──────┤
> │  Bob │
> │  Sue │
> └──────┘
> [Person.name is primary key]
> 
> ┌┤hasContact├────┐
> │ name │ company │
> ├──────┼─────────┤
> │  Bob │   BobCo │
> └──────┴─────────┘
> [{hasContact.name,hasContact.company} is primary key]
> [foreign key from Person.name to hasContact.name]
> [hasContact.company not-nullable]

DG':
<Person/name=Bob> <Person#name> "Bob" .
<Person/name=Sue> <Person#name> "Sue" .
<hasContact/name=Bob,company=BobCo> <hasContact#name> <Person/name=Bob> ;
                                    <hasContact#company> "BobCo" .


> Note that the latter database is exactly the same encoding as the one produced by the DM from the former database (the foreign key being encoded with the rdf:type), namely the one where the intent of the designer is to capture the ABSENCE of a value.
> 
> Indeed, if I ask for all the companies, I'd get from the first database:
> Q(x) :- Contacts(y,x)

Should that be Q(y,x), given the result {BobCo, NULL} below?
Presuming so, and converting to SQL:
SELECT name, company FROM Contacts

> --> {BobCo, NULL}
> while I'd get from the second database:
> Q'(x) :- Person(y),hasContact(y,x)

SQL: SELECT Person.name FROM Person INNER JOIN hasContacts ON hasContact.name=Person.name

> --> {BobCo}
> 
> So, I get definitely two different answers. Note again that a direct use of the DM (without the schema) consistently gives you the second answer.
> 
> Now, Juan claims that with the schema we can reconstruct the answer of the first kind from representations of the second kind. I believe that it is possible - after all, we do not have information loss if we have the schema. The problem could be that relational algebra (e.g., SPARQL) may not be expressive enough to reconstruct systematically the right answers from the representations of the second kind for all relational algebra queries. That's way I did not guarantee a positive outcome from this investigation.

Above, you've outlined two ways to encode application data, requiring the data to be accessed by different SQL (or relational algebra, or tuple calculus) queries. Likewise, the DG is different for the two databases, as are the SPARQL queries required to access this data:

DG: SELECT ?n ?c { ?who <Contacts#name> "Bob" 
        OPTIONAL { ?who <Contacts#company> ?c } }

DG': SELECT ?n { ?contact <hasContact#name> ?who ;
                          <hasContact#company> "BobCo" .
                 ?who <Person#name> ?n }

It is beyond our goals (nor is it achievable) to produce a direct graph which is constant through all potential encodings of conceptually similar application data. I say that our needs are met by the current DM with no modification.


> cheers
> --e.
> 
> 
> 

-- 
-ericP
Received on Wednesday, 15 June 2011 03:48:01 UTC