Re: Minutes of the 2010-01-18 telcon for the RDF2RDF WG from Alexandre Bertails on 2011-01-19 (public-rdb2rdf-wg@w3.org from January 2011)

From: Alexandre Bertails <bertails@w3.org>
Date: Wed, 19 Jan 2011 10:42:54 -0500
To: Juan Sequeda <juanfederico@gmail.com>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <1295451774.21454.66.camel@simplet>
On Wed, 2011-01-19 at 09:18 -0600, Juan Sequeda wrote:
> Alex and all
> 
> 
> I see your point. And actually I found a bug when it comes to
> generating a blank node.
> 
> 
> For this example, this falls into the section 4.2.3 Table does not
> have a primary key. The rule is
> 
> 
> 
> Triple(s, p, yj) ← r(y1, ..., yn), generateRowBlankNode("r", [y1, ..., yn], s), generateColumnIRI("r", ["bj"], p)
> 
> So the instances are:
> 
> 
> debt(Juan, 50)
> debt(Juan, 50)
> 
> 
> The bug I see is in the way generateRowBlankNode works. It takes as
> input all the values in the row. In this case, different rows have
> the same values, therefore the same blank node would be generated
> for different rows... which is wrong.

Which is actually right!

Your predicate works as a function. Therefore, firing it twice with the
same arguments should always produce the same answer.

If you allow this behaviour, that means your underlying model for
generateRowBlankNode becomes multisets, which is not allowed by the
Datalog semantics.
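
To make the point concrete, here is a minimal Python sketch (not part of
the draft, and not a Datalog engine). It assumes a hypothetical
generate_row_blank_node that, like any function, depends only on its
arguments, and it models the derived facts as a Python set. The hashing
scheme and the foo:bar# prefix are illustrative assumptions only.

```python
import hashlib


def generate_row_blank_node(table, values):
    """Hypothetical stand-in for generateRowBlankNode.

    It is a pure function: the label depends only on the table name
    and the row values, so equal inputs give equal blank nodes.
    """
    key = repr((table, tuple(values))).encode("utf-8")
    return "_:" + hashlib.sha1(key).hexdigest()[:8]


rows = [("Juan", 50), ("Juan", 50)]  # the two identical debt rows
columns = ["name", "amount"]

# Derived facts form a set, so a triple derived twice is stored once.
triples = set()
for row in rows:
    s = generate_row_blank_node("debt", row)
    for col, value in zip(columns, row):
        triples.add((s, "foo:bar#" + col, value))

print(len(triples))
```

With the two identical debt rows, both rows map to the same blank node
and the set holds two triples, not four.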

>  We have not defined the way to automatically generate a Blank Node.
> What we have here was our proposal... which is wrong so thanks for
> helping us find it. So let's assume the input to generateBlankNode is
> going to be the row id (assuming that each row id is unique even
> if all the values are the same in different rows). Having said that,
> then....
> 
> 
> the applied rule would be 
> 
> 
> Triple(s, p, name) <-- debt(name,_), generateBlankNode(debt, rowid,
> s), generateColumnIRI(debt, name, p)
> 
> Triple(s, p, amount) <-- debt(_,amount), generateBlankNode(debt,
> rowid, s), generateColumnIRI(debt, amount, p)
> 
> 
> Note that debt(name, _) is equivalent to SELECT name FROM debt. When
> applied to your example the answer is
> 
> 
> +------+
> | name |
> +------+
> | Juan |
> | Juan |
> +------+
> 
> 
> Furthermore, debt(_, amount) is equivalent to SELECT amount FROM
> debt. When applied to your example the answer is
> 
> 
> +--------+
> | amount |
> +--------+
> |     50 |
> |     50 |
> +--------+
> 
> 
> and we assume that each row comes with its unique row id. Therefore
> the output would be
> 
> 
> Triple(_:b1, foo:bar#name, Juan)
> Triple(_:b1, foo:bar#amount, 50)
> Triple(_:b2, foo:bar#name, Juan)
> Triple(_:b2, foo:bar#amount, 50)

I don't see how you can "remember" which row id was generated as you
have two different rules that will produce the triples?

There is some magic there.

> 
> 
> I want to clarify that the rules do not necessarily need to be
> implemented in a rule language (you can but you don't have to). The
> English semantics, for example of 4.2.3, are
> 
> 
> IF table does not have a primary key
> THEN
>      IF
>         given a relation r(y1, ..., yn) AND
>         generating the blank node for the row
>             generateBlankNode(r, rowid, s) AND
>         generating a predicate IRI for the attribute
>             generateColumnIRI(r, bj, p)
>      THEN
>         a triple is formed where the subject is the blank node, the
>         predicate is the predicate IRI and the object is the value

Here, you're allowed to "remember" the row id as you have some kind of
variable binding, which you can't do with Datalog. You can continue this
way (I mean, using a pseudo-algorithm). Hint: here is the next challenge
you'll have to solve: what do you do when the object refers to another
row id, coming from a SQL constraint?
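
As a rough illustration of that pseudo-algorithm style, here is a Python
sketch. It assumes, as Juan does, that every row comes with a unique row
id; here that id is simply the position of the row in a list. The _:bN
labels and the foo:bar# prefix are borrowed from his expected output,
and the function name is hypothetical.

```python
def map_rows_without_primary_key(columns, rows, prefix="foo:bar#"):
    """Emit one triple per column value, with one blank node per row.

    The row id is bound once per row (the loop variable) and reused
    for every column of that row.
    """
    triples = []  # a list, so identical rows still contribute separately
    for rowid, row in enumerate(rows, start=1):
        subject = "_:b%d" % rowid
        for column, value in zip(columns, row):
            triples.append((subject, prefix + column, value))
    return triples


debt_rows = [("Juan", 50), ("Juan", 50)]
for triple in map_rows_without_primary_key(["name", "amount"], debt_rows):
    print(triple)
```

Because the row id is bound in the outer loop, both triples of a row
share the same subject, and the two identical rows yield four triples.
This sketch does not address objects that refer to another row.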

Alexandre.

> 
> 
> Let me know what you think
> 
> 
> 
> 
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> On Tue, Jan 18, 2011 at 2:25 PM, Alexandre Bertails <bertails@w3.org>
> wrote:
>         On Tue, 2011-01-18 at 12:56 -0600, Juan Sequeda wrote:
>         > what do you mean by "go through all the phases"?
>         
>         
>         In your case, I would imagine something like:
>         1. RDB as CREATE + INSERT statements
>         2. Datalog rules + facts
>         3. triples as the result of firing the Datalog rules
>         4. serialized RDF
>         
>         Alexandre.
>         
>         
>         
>         >
>         > Juan Sequeda
>         > +1-575-SEQ-UEDA
>         > www.juansequeda.com
>         >
>         >
>         > On Tue, Jan 18, 2011 at 12:05 PM, Alexandre Bertails
>         <bertails@w3.org>
>         > wrote:
>         >
>         >         >
>         >         >    mhausenblas: questions for all the editors
>         >         >    ... I quite often go through the ML
>         >         >
>         >         >    <mhausenblas>
>         >         >    [19]http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2010Nov/0086.html
>         >         >
>         >         >    mhausenblas: here is a link to a discussion
>         between
>         >         ericP, betehess,
>         >         >    you...
>         >         >    ... regarding set vs multiset
>         >         >    ... I'm not that deep in what it means
>         >         >    ... but I'd like to understand if this needs an
>         issue for
>         >         that
>         >         >
>         >         >    juansequeda: not sure if that needs an issue
>         >         >    ... I'll look at that
>         >         >
>         >         >    <mhausenblas>
>         >         [20]http://www.w3.org/2001/sw/rdb2rdf/track/issues/
>         >         >
>         >         >    juansequeda: ok to check an action on this
>         issue
>         >         >
>         >         >    mhausenblas: ok to split in many issues
>         >         >    ... and one on the multiset issue
>         >         >    ... so we can speak about that next week
>         >         >
>         >         >    betehess: please use the ML so we can help
>         you :-)
>         >         >    ... especially because ericP and I initiated
>         the thread
>         >
>         >
>         >         Juan, in order to help you with your action, I would
>         like you
>         >         to
>         >         consider the following:
>         >
>         >         [[
>         >         CREATE TABLE Debts (
>         >         Name varchar(50),
>         >         Amount Integer
>         >         );
>         >         INSERT INTO Debts (Name, Amount) VALUES('juan', 50);
>         >         INSERT INTO Debts (Name, Amount) VALUES('juan', 50);
>         >         ]]
>         >
>         >         Using this very simple RDB [1] example, can you go
>         through all
>         >         the
>         >         phases that lead to the RDF where I owe you 100?
>         >
>         >         Alexandre.
>         >
>         >         [1]
>         >
>         http://www.w3.org/TR/2010/WD-rdb-direct-mapping-20101118/#Rel
>         >
>         >
>         >
>         
>         
>         
> 
>
Received on Wednesday, 19 January 2011 15:42:51 UTC