Re: example SQL mapping to direct graph

With my chair hat on: I appreciate all your enthusiasm very much, but I
would appreciate it even more if this discussion were carried out within
the context of an issue.

Tracker, this is ACTION-95 on Juan.

Cheers,
      Michael

-- 
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html



> From: Eric Prud'hommeaux <eric@w3.org>
> Organization: World Wide Web Consortium (W3C) - http://www.w3.org/
> Date: Wed, 19 Jan 2011 11:29:34 -0500
> To: Alexandre Bertails <bertails@w3.org>
> Cc: <bvillazon@fi.upm.es>, Harry Halpin <hhalpin@w3.org>, RDB2RDF WG
> <public-rdb2rdf-wg@w3.org>
> Subject: example SQL mapping to direct graph
> Resent-From: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> Resent-Date: Wed, 19 Jan 2011 16:31:21 +0000
> 
> * Alexandre Bertails <bertails@w3.org> [2011-01-19 09:07-0500]
>> On Wed, 2011-01-19 at 14:03 +0100, Boris Villazón Terrazas wrote:
>>> On 19/01/2011 13:22, Harry Halpin wrote:
>>>>> On Jan 18, 2011, at 19:05 , Alexandre Bertails wrote:
>>>>> [snip]
>>>>>> Juan, in order to help you with your action, I would like you to
>>>>>> consider the following:
>>>>>> 
>>>>>> [[
>>>>>> CREATE TABLE Debts (
>>>>>> Name varchar(50),
>>>>>> Amount Integer
>>>>>> );
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> ]]
>>>>>> 
>>>>>> Using this very simple RDB [1] example, can you go through all the
>>>>>> phases that lead to the RDF where I owe you 100?
>>>>> 
>>>>> Alex, for my understanding: what I would expect to see in RDF are two
>>>>> pairs of identical triples with different subjects. How would a direct
>>>>> mapping produce anything whereby I owe Juan 100? I will owe 50 twice,
>>>>> but the fact that this totals a hundred is something that the RDF per
>>>>> se cannot express...
>>>> Otherwise known as "RDF can't add, and (up until recently) neither can
>>>> SPARQL". Which is a feature, not a bug, re decidability :)
>>>> 
>>>> However, I'm assuming Alex is asking whether, under Datalog semantics,
>>>> those two insert statements give you *two* distinct pairs of triples
>>>> (multiset) or just one (set)?
>>>> 
>>>> Multisets allow repeated membership, while sets don't, I think.
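>>>>
>>>> A minimal SQL illustration of the difference, using the Debts table
>>>> above (a sketch, assuming MySQL-style syntax):
>>>>
>>>>   -- multiset semantics: both identical rows are kept
>>>>   SELECT COUNT(*) FROM Debts;                                   -- 2
>>>>   -- set semantics: the duplicate row collapses under DISTINCT
>>>>   SELECT COUNT(*) FROM (SELECT DISTINCT Name, Amount FROM Debts) AS d;  -- 1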
>>> Thanks Alex for pointing out this case.
>>> This case was included by Eric within the test cases
>>> http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#DB5
>> 
>> You're right. I just made it simpler and put it in the body of the email
>> so I could be sure that people would look at it :-)
>> 
>>> So, as you said, the question is: what is the default mapping for this case?
>> 
>> Two pairs of triples, with different bnodes as subjects.
>> 
>>> How can we face this case with R2RML? And what would be the expected result?
>> 
>> I have no idea.
> 
> I think "facing the case" means making sure the cardinality of SPARQL
> result set matches that of the intuitive SQL result table. For instance,
>   SELECT ?to ?amnt { ?debt Debts:Name ?to ; Debts:Amount ?amnt }
> should give back two rows when
>   SELECT Debts.Name, Debts.Amount FROM Debts
> would give two rows. Likewise
>   SELECT SUM(?amnt) { ?debt :Amount ?amnt }
> should be the same as
>   SELECT SUM(Debts.amnt) FROM Debts
> . This is handled in the direct mapping by creating a blank node for
> each row in each table which does not have a primary key. You could
> handle this in Datalog by amending
>   http://www.w3.org/mid/20101116000533.GI13630@w3.org
> to include an SQL sequence generator (à la PRIMARY KEY AUTO_INCREMENT)
> in a copy of each database table which has no primary key (a minimal
> sketch of such a keyed copy appears after the triple listings below).
> Tweaking from an earlier proposal:
> 
> * Eric Prud'hommeaux <eric@w3.org> [2010-11-15 19:05-0500]
>> The current text assumes the trivial mapping from an SQL table to
>> datalog facts. Eliding column names, we see:
>> 
>>   foo
>>   ┌────┬────┬────┐
>>   │ X  │ Y  │ Z  │
>>   │ x1 │ y1 │ z1 │ => foo(x1, y1, z1)
>>   │ x2 │ y2 │ z2 │    foo(x2, y2, z2)
>>   └────┴────┴────┘
>> 
>> If we use, yes, HOL, to create our datalog facts, we can capture
>> things like the relation name, attribute (column) names, even the
>> datatypes and keys. (This effectively recapitulates the catalog.)
>> 
>>   foo.schema
>>   ┌──────┬────┬─────┬────┬───────┬────┬──────┐
>>   │ name │ a1 │ t1  │ a2 │ t2    │ a3 │ t3   │
>>   │  foo │  X │ int │  Y │ float │  Z │ blob │
>>   └──────┴────┴─────┴────┴───────┴────┴──────┘
> 
> Further reflection suggests a vertical encoding like:
> 
>   __attrs
>   ┌──────────┬──────┬───────┐
>   │ relation │ attr │  type │
>   │ foo      │ X    │   int │
>   │ foo      │ Y    │ float │
>   │ foo      │ Z    │  blob │
>   └──────────┴──────┴───────┘
> 
>>   foo.data
>>   ┌──────┬────┬────┬────┐
>>   │ name │ a1 │ a2 │ a3 │
>>   │  foo │ x1 │ y1 │ z1 │
>>   │  foo │ x2 │ y2 │ z2 │
>>   └──────┴────┴────┴────┘
> 
>   __data
>   ┌──────────┬─────┬──────┬───────┐
>   │ relation │ row │ attr │ value │
>   │ foo      │   1 │ X    │ "1"   │
>   │ foo      │   1 │ Y    │ "1.0" │
>   │ foo      │   1 │ Z    │ "z1"  │
>   │ foo      │   2 │ X    │ "2"   │
>   │ foo      │   2 │ Y    │ "2.0" │
>   │ foo      │   2 │ Z    │ "z2"  │
>   └──────────┴─────┴──────┴───────┘
> 
> 
>> From there, you have all of your column names, relation names, etc.,
>> from which FOL can generate the IRIs and literals for the RDF relation.
>> But if you're actually working in a type-safe FOL, you can encode your
>> RDF in a relation that has types (beyond strings):
>> 
>>   foo.graph
>>   ┌───────────┬────────┬─────────┬──────┬────────┬──────────┬──────────────┬──────────┐
>>   │ sIRI      │ sBNode │ p       │ oIRI │ oBNode │ oLexical │ oDatatype    │ oLangTag │
>>   │ http…X=x1 │ NULL   │ http…#X │ NULL │ NULL   │       x1 │ http…integer │          │
>>   │ http…X=x1 │ NULL   │ http…#Y │ NULL │ NULL   │       y1 │ http…float   │          │
>>   │ http…X=x1 │ NULL   │ http…#Z │ NULL │ NULL   │       z1 │ http…blob    │          │
>>   └───────────┴────────┴─────────┴──────┴────────┴──────────┴──────────────┴──────────┘
> 
> I coded this transformation in SQL so folks can play along with the
> tools they have at home:
> 
> -- Create the sample database (one table):
> CREATE TABLE foo (X INT,
>                   Y FLOAT,
>                   Z BLOB);
> INSERT INTO foo (X, Y, Z) VALUES (1, 1.0, "z1"),
>                                  (2, 2.0, "z2");
> 
> -- Higher-order import of table and attribute names into the domain of discourse:
> CREATE TABLE __attrs (relation VARCHAR(20),
>                       attr VARCHAR(20),
>                       type VARCHAR(20));
> INSERT INTO __attrs (relation, attr, type) VALUES ("foo", "X", "INT"),
>                                                   ("foo", "Y", "FLOAT"),
>                                                   ("foo", "Z", "BLOB");
> CREATE TABLE __data (relation VARCHAR(20),
>                      row INT,
>                      attr VARCHAR(20),
>                      value VARCHAR(20));
> INSERT INTO __data (relation, row, attr, value) VALUES ("foo", 1, "X", "1"),
>                                                        ("foo", 1, "Y", "1.0"),
>                                                        ("foo", 1, "Z", "z1"),
>                                                        ("foo", 2, "X", "2"),
>                                                        ("foo", 2, "Y", "2.0"),
>                                                        ("foo", 2, "Z", "z2");
> -- We could capture the keys info in a __keys table, but it's
> -- not necessary for this example.
> 
> -- Representation of RDF graph:
> CREATE TABLE __graph (sIRI VARCHAR(30),
>                       sBNode VARCHAR(30),
>                       pIRI VARCHAR(30),
>                       oIRI VARCHAR(30),
>                       oBNode VARCHAR(30),
>                       oLiteral VARCHAR(30),
>                       oDatatype VARCHAR(30),
>                       oLang VARCHAR(10));
> 
> -- Insert triples of the form (BNode, IRI, Literal):
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>    SELECT CONCAT("_:", __data.relation, __data.row),
>           CONCAT(__data.relation, "#", __data.attr),
>           CONCAT("\"", __data.value, "\""),
>           CONCAT("xsd:", __attrs.type)
>      FROM __data, __attrs
>     WHERE __data.relation=__attrs.relation
>       AND __data.attr=__attrs.attr;
> -- Et voilà, the literal triples (with a BNode subject):
> SELECT * FROM __graph;
> ┌────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI   │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
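>
> A quick sanity check on cardinality (a sketch; the expected counts
> follow from the two rows in foo):
>
>   SELECT pIRI, COUNT(*) FROM __graph GROUP BY pIRI;
>   -- foo#X: 2, foo#Y: 2, foo#Z: 2 -- one triple per row per attribute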
> 
> The skolemizing approach in the direct mapping wiki would look more like:
> 
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>    SELECT CONCAT("_:", __data.relation, __name.skolem),
>           CONCAT(__data.relation, "#", __data.attr),
>           CONCAT("\"", __data.value, "\""),
>           CONCAT("xsd:", __attrs.type)
>      FROM __data, __attrs, (
>             SELECT __data.relation, __data.row,
>                    GROUP_CONCAT(__data.value ORDER BY __data.attr
>                                 SEPARATOR "~") AS skolem
>               FROM __data GROUP BY relation, row ) AS __name
>     WHERE __name.relation=__data.relation
>       AND __name.row=__data.row
>       AND __data.relation=__attrs.relation
>       AND __data.attr=__attrs.attr;
> ┌───────────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI          │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1~1.0~z1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └───────────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
> 
> which would give you
> 
>   _:Debts~juan~50 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts~juan~50 <Debts#Amount> "50"^^xsd:INT .
> twice (which would collapse to once), instead of the more faithful
>   _:Debts1 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts1 <Debts#Amount> "50"^^xsd:INT .
>   _:Debts2 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts2 <Debts#Amount> "50"^^xsd:INT .
> 
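> As promised above, a minimal sketch of the sequence-generator fix (MySQL
> syntax; the table name Debts__keyed is made up for illustration): copy
> the keyless table into one with a generated surrogate key, and mint the
> bnode label from that key instead of from the row's values.
>
>   CREATE TABLE Debts__keyed (
>     __rownum INT PRIMARY KEY AUTO_INCREMENT,
>     Name varchar(50),
>     Amount Integer
>   );
>   INSERT INTO Debts__keyed (Name, Amount) SELECT Name, Amount FROM Debts;
>   -- __rownum now distinguishes the two otherwise-identical rows, so the
>   -- direct mapping can emit _:Debts1 and _:Debts2 rather than one
>   -- collapsed bnode.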
> 
> The import: getting cardinality right matters. Relational heads like to
> work over sets, but ultimately the practical need to not have to
> guarantee uniqueness always wins. The set-based relational engines of
> the early 80s all either changed to multi-set (changing many
> customers' queries) or folded. By '86, SQL had established a multi-set
> standard which maintained cardinality even when uniqueness was
> projected away. At the end of 2006, Fred Zemke interceded to keep the
> SPARQL semantics from having ambiguous cardinality, which cost months
> but gave us an invaluable definition in the semantics.
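>
> The Debts example shows the stakes (a sketch, runnable against the table
> at the top of the thread):
>
>   SELECT SUM(Amount) FROM Debts;   -- 100 under SQL's multiset semantics
>   SELECT SUM(Amount) FROM
>     (SELECT DISTINCT Name, Amount FROM Debts) AS d;  -- 50 if rows were a set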
> 
>> The Denotational Semantics for the Direct Mapping (i.e., the algebras
>> and the associated mapping function) *already* works on top of
>> multisets (and sets and lists), both in the data model for RDB and in
>> the higher-order functions we define/use to manipulate this data
>> structure.
>> 
>> Alexandre.
>> 
>>>
>>> Boris
>>
> 
> -- 
> -ericP
> 
