Re: example SQL mapping to direct graph

With my chair hat on: I appreciate all your enthusiasm very much, but I
would appreciate it even more if this discussion were carried out within
the context of an issue.

Tracker, this is ACTION-95 on Juan.

Cheers,
      Michael

-- 
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html



> From: Eric Prud'hommeaux <eric@w3.org>
> Organization: World Wide Web Consortium (W3C) - http://www.w3.org/
> Date: Wed, 19 Jan 2011 11:29:34 -0500
> To: Alexandre Bertails <bertails@w3.org>
> Cc: <bvillazon@fi.upm.es>, Harry Halpin <hhalpin@w3.org>, RDB2RDF WG
> <public-rdb2rdf-wg@w3.org>
> Subject: example SQL mapping to direct graph
> Resent-From: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> Resent-Date: Wed, 19 Jan 2011 16:31:21 +0000
> 
> * Alexandre Bertails <bertails@w3.org> [2011-01-19 09:07-0500]
>> On Wed, 2011-01-19 at 14:03 +0100, Boris Villazón Terrazas wrote:
>>> On 19/01/2011 13:22, Harry Halpin wrote:
>>>>> On Jan 18, 2011, at 19:05 , Alexandre Bertails wrote:
>>>>> [snip]
>>>>>> Juan, in order to help you with your action, I would like you to
>>>>>> consider the following:
>>>>>> 
>>>>>> [[
>>>>>> CREATE TABLE Debts (
>>>>>> Name varchar(50),
>>>>>> Amount Integer
>>>>>> );
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> ]]
>>>>>> 
>>>>>> Using this very simple RDB [1] example, can you go through all the
>>>>>> phases that lead to the RDF where I owe you 100?
>>>>> 
>>>>> Alex, for my understanding: what I would expect to see in RDF are two
>>>>> pairs of identical triples with different subjects. How would a direct
>>>>> mapping produce anything whereby I owe Juan 100? I will owe 50 twice,
>>>>> but the fact that this totals a hundred is something that the RDF per
>>>>> se cannot express...
>>>> Otherwise known as "RDF can't add, and (up until recently) neither can
>>>> SPARQL". Which is a feature, not a bug, re decidability :)
>>>> 
>>>> However, I'm assuming Alex is asking whether, under Datalog semantics,
>>>> those two insert statements give you *two* distinct pairs of triples
>>>> (multiset) or just one (set)?
>>>> 
>>>> Multisets allow repeated membership, while sets don't, I think.
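>>>>
>>>> A minimal SQL illustration of the difference, using the Debts table
>>>> above (a sketch, assuming MySQL-style syntax):
>>>>
>>>>   -- multiset semantics: both identical rows are kept
>>>>   SELECT COUNT(*) FROM Debts;                                   -- 2
>>>>   -- set semantics: the duplicate row collapses under DISTINCT
>>>>   SELECT COUNT(*) FROM (SELECT DISTINCT Name, Amount FROM Debts) AS d;  -- 1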
>>> Thanks Alex for pointing out this case.
>>> This case was included by Eric within the test cases
>>> http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#DB5
>> 
>> You're right. I just made it simpler and put it in the body of the email
>> so I could be sure that people would look at it :-)
>> 
>>> So, as you said, the question is: what is the default mapping for this case?
>> 
>> Two pairs of triples, with different bnodes as subjects.
>> 
>>> How can we face this case with R2RML? And what would be the expected result?
>> 
>> I have no idea.
> 
> I think "facing the case" means making sure the cardinality of SPARQL
> result set matches that of the intuitive SQL result table. For instance,
>   SELECT ?to ?amnt { ?debt Debts:Name ?to ; Debts:Amount ?amnt }
> should give back two rows when
>   SELECT Debts.Name, Debts.Amount FROM Debts
> would give two rows. Likewise
>   SELECT SUM(?amnt) { ?debt :Amount ?amnt }
> should be the same as
>   SELECT SUM(Debts.amnt) FROM Debts
> . This is handled in the direct mapping by creating a blank node for
> each row in each table which does not have a primary key. You could
> handle this in Datalog by amending
>   http://www.w3.org/mid/20101116000533.GI13630@w3.org
> to include an SQL sequence generator (à la PRIMARY KEY AUTO_INCREMENT)
> in a copy of each database table which has no primary key (a minimal
> sketch of such a keyed copy appears after the triple listings below).
> Tweaking from an earlier proposal:
> 
> * Eric Prud'hommeaux <eric@w3.org> [2010-11-15 19:05-0500]
>> The current text assumes the trivial mapping from an SQL table to
>> datalog facts. Eliding column names, we see:
>> 
>>   foo
>>   ┌────┬────┬────┐
>>   │ X  │ Y  │ Z  │
>>   │ x1 │ y1 │ z1 │ => foo(x1, y1, z1)
>>   │ x2 │ y2 │ z2 │    foo(x2, y2, z2)
>>   └────┴────┴────┘
>> 
>> If we use, yes, HOL, to create our datalog facts, we can capture
>> things like the relation name, attribute (column) names, even the
>> datatypes and keys. (This effectively recapitulates the catalog.)
>> 
>>   foo.schema
>>   ┌──────┬────┬─────┬────┬───────┬────┬──────┐
>>   │ name │ a1 │ t1  │ a2 │ t2    │ a3 │ t3   │
>>   │  foo │  X │ int │  Y │ float │  Z │ blob │
>>   └──────┴────┴─────┴────┴───────┴────┴──────┘
> 
> Further reflection suggests a vertical encoding like:
> 
>   __attrs
>   ┌──────────┬──────┬───────┐
>   │ relation │ attr │  type │
>   │ foo      │ X    │   int │
>   │ foo      │ Y    │ float │
>   │ foo      │ Z    │  blob │
>   └──────────┴──────┴───────┘
> 
>>   foo.data
>>   ┌──────┬────┬────┬────┐
>>   │ name │ a1 │ a2 │ a3 │
>>   │  foo │ x1 │ y1 │ z1 │
>>   │  foo │ x2 │ y2 │ z2 │
>>   └──────┴────┴────┴────┘
> 
>   __data
>   ┌──────────┬─────┬──────┬───────┐
>   │ relation │ row │ attr │ value │
>   │ foo      │   1 │ X    │ "1"   │
>   │ foo      │   1 │ Y    │ "1.0" │
>   │ foo      │   1 │ Z    │ "z1"  │
>   │ foo      │   2 │ X    │ "2"   │
>   │ foo      │   2 │ Y    │ "2.0" │
>   │ foo      │   2 │ Z    │ "z2"  │
>   └──────────┴─────┴──────┴───────┘
> 
> 
>> From there, you have all of your column names, relation names, etc.,
>> from which FOL can generate the IRIs and literals for the RDF relation.
>> But if you're actually working in a type-safe FOL, you can encode your
>> RDF in a relation that has types (beyond strings):
>> 
>>   foo.graph
>>   ┌───────────┬────────┬─────────┬──────┬────────┬──────────┬──────────────┬──────────┐
>>   │ sIRI      │ sBNode │ p       │ oIRI │ oBNode │ oLexical │ oDatatype    │ oLangTag │
>>   │ http…X=x1 │ NULL   │ http…#X │ NULL │ NULL   │       x1 │ http…integer │          │
>>   │ http…X=x1 │ NULL   │ http…#Y │ NULL │ NULL   │       y1 │ http…float   │          │
>>   │ http…X=x1 │ NULL   │ http…#Z │ NULL │ NULL   │       z1 │ http…blob    │          │
>>   └───────────┴────────┴─────────┴──────┴────────┴──────────┴──────────────┴──────────┘
> 
> I coded this transformation in SQL so folks can play along with the
> tools they have at home:
> 
> -- Create the sample database (one table):
> CREATE TABLE foo (X INT,
>                   Y FLOAT,
>                   Z BLOB);
> INSERT INTO foo (X, Y, Z) VALUES (1, 1.0, "z1"),
>                                  (2, 2.0, "z2");
> 
> -- Higher-order import of table and attribute names into the domain of discourse:
> CREATE TABLE __attrs (relation VARCHAR(20),
>                       attr VARCHAR(20),
>                       type VARCHAR(20));
> INSERT INTO __attrs (relation, attr, type) VALUES ("foo", "X", "INT"),
>                                                   ("foo", "Y", "FLOAT"),
>                                                   ("foo", "Z", "BLOB");
> CREATE TABLE __data (relation VARCHAR(20),
>                      row INT,
>                      attr VARCHAR(20),
>                      value VARCHAR(20));
> INSERT INTO __data (relation, row, attr, value) VALUES ("foo", 1, "X", "1"),
>                                                        ("foo", 1, "Y", "1.0"),
>                                                        ("foo", 1, "Z", "z1"),
>                                                        ("foo", 2, "X", "2"),
>                                                        ("foo", 2, "Y", "2.0"),
>                                                        ("foo", 2, "Z", "z2");
> -- We could capture the keys info in a __keys table, but it's
> -- not necessary for this example.
> 
> -- Representation of RDF graph:
> CREATE TABLE __graph (sIRI VARCHAR(30),
>                       sBNode VARCHAR(30),
>                       pIRI VARCHAR(30),
>                       oIRI VARCHAR(30),
>                       oBNode VARCHAR(30),
>                       oLiteral VARCHAR(30),
>                       oDatatype VARCHAR(30),
>                       oLang VARCHAR(10));
> 
> -- Insert triples of the form (BNode, IRI, Literal):
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>    SELECT CONCAT("_:", __data.relation, __data.row),
>           CONCAT(__data.relation, "#", __data.attr),
>           CONCAT("\"", __data.value, "\""),
>           CONCAT("xsd:", __attrs.type)
>      FROM __data, __attrs
>     WHERE __data.relation=__attrs.relation
>       AND __data.attr=__attrs.attr;
> -- Et voilà, the literal triples (with a BNode subject):
> SELECT * FROM __graph;
> ┌────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI   │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
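>
> A quick sanity check on cardinality (a sketch; the expected counts
> follow from the two rows in foo):
>
>   SELECT pIRI, COUNT(*) FROM __graph GROUP BY pIRI;
>   -- foo#X: 2, foo#Y: 2, foo#Z: 2 -- one triple per row per attribute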
> 
> The skolemizing approach in the direct mapping wiki would look more like:
> 
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>    SELECT CONCAT("_:", __data.relation, __name.skolem),
>           CONCAT(__data.relation, "#", __data.attr),
>           CONCAT("\"", __data.value, "\""),
>           CONCAT("xsd:", __attrs.type)
>      FROM __data, __attrs, (
>             SELECT __data.relation, __data.row,
>                    GROUP_CONCAT(__data.value ORDER BY __data.attr
>                                 SEPARATOR "~") AS skolem
>               FROM __data GROUP BY relation, row ) AS __name
>     WHERE __name.relation=__data.relation
>       AND __name.row=__data.row
>       AND __data.relation=__attrs.relation
>       AND __data.attr=__attrs.attr;
> ┌───────────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI          │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1~1.0~z1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └───────────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
> 
> which would give you
> 
>   _:Debts~juan~50 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts~juan~50 <Debts#Amount> "50"^^xsd:INT .
> twice (which would collapse to once), instead of the more faithful
>   _:Debts1 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts1 <Debts#Amount> "50"^^xsd:INT .
>   _:Debts2 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts2 <Debts#Amount> "50"^^xsd:INT .
> 
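> As promised above, a minimal sketch of the sequence-generator fix (MySQL
> syntax; the table name Debts__keyed is made up for illustration): copy
> the keyless table into one with a generated surrogate key, and mint the
> bnode label from that key instead of from the row's values.
>
>   CREATE TABLE Debts__keyed (
>     __rownum INT PRIMARY KEY AUTO_INCREMENT,
>     Name varchar(50),
>     Amount Integer
>   );
>   INSERT INTO Debts__keyed (Name, Amount) SELECT Name, Amount FROM Debts;
>   -- __rownum now distinguishes the two otherwise-identical rows, so the
>   -- direct mapping can emit _:Debts1 and _:Debts2 rather than one
>   -- collapsed bnode.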
> 
> The import: getting cardinality right matters. Relational heads like to
> work over sets, but ultimately the practical need to not have to
> guarantee uniqueness always wins. The set-based relational engines of
> the early 80s all either changed to multi-set (changing many
> customers' queries) or folded. By '86, SQL had established a multi-set
> standard which maintained cardinality even when uniqueness was
> projected away. At the end of 2006, Fred Zemke interceded to keep the
> SPARQL semantics from having ambiguous cardinality, which cost months
> but gave us an invaluable definition in the semantics.
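>
> The Debts example shows the stakes (a sketch, runnable against the table
> at the top of the thread):
>
>   SELECT SUM(Amount) FROM Debts;   -- 100 under SQL's multiset semantics
>   SELECT SUM(Amount) FROM
>     (SELECT DISTINCT Name, Amount FROM Debts) AS d;  -- 50 if rows were a set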
> 
>> The Denotational Semantics for the Direct Mapping (i.e., the algebras
>> and the associated mapping function) *already* works on top of
>> multisets (and sets and lists), both in the data model for RDB and in
>> the higher-order functions we define/use to manipulate this data
>> structure.
>> 
>> Alexandre.
>> 
>>>
>>> Boris
>>
> 
> -- 
> -ericP
> 
