Re: example SQL mapping to direct graph

With my chair hat on: I appreciate all your enthusiasm very much, but I
would appreciate it even more if this discussion would be done within the
context of an issue.

Tracker, this is ACTION-95 on Juan.

Cheers,
      Michael

-- 
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html



> From: Eric Prud'hommeaux <eric@w3.org>
> Organization: World Wide Web Consortium (W3C) - http://www.w3.org/
> Date: Wed, 19 Jan 2011 11:29:34 -0500
> To: Alexandre Bertails <bertails@w3.org>
> Cc: <bvillazon@fi.upm.es>, Harry Halpin <hhalpin@w3.org>, RDB2RDF WG
> <public-rdb2rdf-wg@w3.org>
> Subject: example SQL mapping to direct graph
> Resent-From: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> Resent-Date: Wed, 19 Jan 2011 16:31:21 +0000
> 
> * Alexandre Bertails <bertails@w3.org> [2011-01-19 09:07-0500]
>> On Wed, 2011-01-19 at 14:03 +0100, Boris Villazón Terrazas wrote:
>>> On 19/01/2011 13:22, Harry Halpin wrote:
>>>>> On Jan 18, 2011, at 19:05 , Alexandre Bertails wrote:
>>>>> [snip]
>>>>>> Juan, in order to help you with your action, I would like you to
>>>>>> consider the following:
>>>>>> 
>>>>>> [[
>>>>>> CREATE TABLE Debts (
>>>>>> Name varchar(50),
>>>>>> Amount Integer
>>>>>> );
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> ]]
>>>>>> 
>>>>>> Using this very simple RDB [1] example, can you go through all the
>>>>>> phases that lead to the RDF where I owe you 100?
>>>>> 
>>>>> Alex, for my understanding: what I would expect to see in RDF are two
>>>>> pairs of identical triples with different subjects. How would a direct
>>>>> mapping produce anything whereby I owe Juan 100? I will owe 50 twice,
>>>>> but the fact that this adds up to a hundred is something the RDF per
>>>>> se cannot say...
>>>> Otherwise known as "RDF can't add, and (up until recently) neither can
>>>> SPARQL". Which is a feature, not a bug, re decidability :)
>>>> 
>>>> However, I'm assuming Alex is asking about the Datalog semantics: do
>>>> you get, with those two insert statements, *two* distinct pairs of
>>>> triples (multiset) or just one (set)?
>>>> 
>>>> Multisets allow repeated membership, while sets don't, I think.
>>> Thanks Alex for pointing out this case.
>>> This case was included by Eric within the test cases
>>> http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#DB5
>> 
>> You're right. I just made it simpler and put it in the body of the email so
>> I was sure that people would look at it :-)
>> 
>>> So, as you said the question is what is the default mapping for this case?
>> 
>> Two triples with different bnodes as subjects.
>> 
>>> How can we face this case with R2RML, and what would be the expected result?
>> 
>> I have no idea.
> 
> I think "facing the case" means making sure the cardinality of the SPARQL
> result set matches that of the intuitive SQL result table. For instance,
>   SELECT ?to ?amnt { ?debt Debts:Name ?to ; Debts:Amount ?amnt }
> should give back two rows when
>   SELECT Debts.Name, Debts.Amount FROM Debts
> would give two rows. Likewise
>   SELECT SUM(?amnt) { ?debt :Amount ?amnt }
> should be the same as
>   SELECT SUM(Debts.amnt) FROM Debts
> . This is handled in the direct mapping by creating a blank node for
> each row in each table which does not have a primary key. You could
> handle this in Datalog by amending
>   http://www.w3.org/mid/20101116000533.GI13630@w3.org
> to include an SQL sequence generator (à la PRIMARY KEY
> AUTO_INCREMENT) in a copy of each of the database tables which has no
> primary key. Tweaking from an earlier proposal:
> 
> * Eric Prud'hommeaux <eric@w3.org> [2010-11-15 19:05-0500]
>> The current text assumes the trivial mapping from an SQL table to
>> datalog facts. Eliding column names, we see:
>> 
>>   foo
>>   +-----+-----+-----+
>>   | X   | Y   | Z   |
>>   +-----+-----+-----+
>>   | x1  | y1  | z1  |  => foo(x1, y1, z1)
>>   | x2  | y2  | z2  |     foo(x2, y2, z2)
>>   +-----+-----+-----+
>> 
>> If we use, yes, HOL, to create our datalog facts, we can capture
>> things like the relation name, attribute (column) names, even the
>> datatypes and keys. (This effectively recapitulates the catalog.)
>> 
>>   foo.schema
>>   +------+-----+------+-----+-------+-----+------+
>>   | name | a1  | t1   | a2  | t2    | a3  | t3   |
>>   +------+-----+------+-----+-------+-----+------+
>>   | foo  | X   | int  | Y   | float | Z   | blob |
>>   +------+-----+------+-----+-------+-----+------+
> 
> Further reflection suggests a vertical encoding like:
> 
>   __attrs
>   +----------+------+-------+
>   | relation | attr | type  |
>   +----------+------+-------+
>   | foo      | X    | int   |
>   | foo      | Y    | float |
>   | foo      | Z    | blob  |
>   +----------+------+-------+
> 
>>   foo.data
>>   +------+-----+-----+-----+
>>   | name | a1  | a2  | a3  |
>>   +------+-----+-----+-----+
>>   | foo  | x1  | y1  | z1  |
>>   | foo  | x2  | y2  | z2  |
>>   +------+-----+-----+-----+
> 
>   __data
>   +----------+-----+------+-------+
>   | relation | row | attr | value |
>   +----------+-----+------+-------+
>   | foo      |   1 | X    | "1"   |
>   | foo      |   1 | Y    | "1.0" |
>   | foo      |   1 | Z    | "z1"  |
>   | foo      |   2 | X    | "2"   |
>   | foo      |   2 | Y    | "2.0" |
>   | foo      |   2 | Z    | "z2"  |
>   +----------+-----+------+-------+
> 
> 
>> From there, you have all of your column names, relation names, etc. to
>> use FOL to generate the IRIs and literals for the RDF relation. But,
>> if you're actually working in a type-safe FOL, you can encode your RDF
>> in a relation that has types (beyond strings):
>> 
>>   foo.graph
>>   +-----------+--------+---------+------+--------+----------+--------------+----------+
>>   | sIRI      | sBNode | p       | oIRI | oBNode | oLexical | oDatatype    | oLangTag |
>>   +-----------+--------+---------+------+--------+----------+--------------+----------+
>>   | http…X=x1 | NULL   | http…#X | NULL | NULL   | x1       | http…integer |          |
>>   | http…X=x1 | NULL   | http…#Y | NULL | NULL   | y1       | http…string  |          |
>>   | http…X=x1 | NULL   | http…#Z | NULL | NULL   | z1       | http…float   |          |
>>   +-----------+--------+---------+------+--------+----------+--------------+----------+
>> 
> 
> I coded this transformation in SQL so folks can play along with the
> tools they have at home:
> 
> -- Create the sample database (one table):
> CREATE TABLE foo (X INT,
>                   Y FLOAT,
>                   Z BLOB);
> INSERT INTO foo (X, Y, Z) VALUES (1, 1.0, "z1"),
>                                  (2, 2.0, "z2");
> 
> -- Higher-order import of table and attribute names into domain of discourse:
> CREATE TABLE __attrs (relation VARCHAR(20),
>                       attr VARCHAR(20),
>                       type VARCHAR(20));
> INSERT INTO __attrs (relation, attr, type) VALUES ("foo", "X", "INT"),
>                                                   ("foo", "Y", "FLOAT"),
>                                                   ("foo", "Z", "BLOB");
> CREATE TABLE __data (relation VARCHAR(20),
>                      row INT,
>                      attr VARCHAR(20),
>                      value VARCHAR(20));
> INSERT INTO __data (relation, row, attr, value) VALUES ("foo", 1, "X", "1"),
>                                                        ("foo", 1, "Y", "1.0"),
>                                                        ("foo", 1, "Z", "z1"),
>                                                        ("foo", 2, "X", "2"),
>                                                        ("foo", 2, "Y", "2.0"),
>                                                        ("foo", 2, "Z", "z2");
> -- We could capture the keys info in a __keys table, but it's
> -- not necessary for this example.
> 
> -- Representation of RDF graph:
> CREATE TABLE __graph (sIRI VARCHAR(30),
>                       sBNode VARCHAR(30),
>                       pIRI VARCHAR(30),
>                       oIRI VARCHAR(30),
>                       oBNode VARCHAR(30),
>                       oLiteral VARCHAR(30),
>                       oDatatype VARCHAR(30),
>                       oLang VARCHAR(10));
> 
> -- insert triples like (BNode, IRI, Literal)s
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>    SELECT CONCAT("_:", __data.relation, __data.row),
>           CONCAT(__data.relation, "#", __data.attr),
>           CONCAT("\"", __data.value, "\""),
>           CONCAT("xsd:", __attrs.type)
>      FROM __data, __attrs
>     WHERE __data.relation=__attrs.relation
>       AND __data.attr=__attrs.attr;
> -- Et voilà, the literal triples (with a BNode subject):
> SELECT * FROM __graph;
>  +--------+--------+-------+------+--------+----------+-----------+-------+
>  | sIRI   | sBNode | pIRI  | oIRI | oBNode | oLiteral | oDatatype | oLang |
>  +--------+--------+-------+------+--------+----------+-----------+-------+
>  | _:foo1 | NULL   | foo#X | NULL | NULL   | "1"      | xsd:INT   | NULL  |
>  | _:foo1 | NULL   | foo#Y | NULL | NULL   | "1.0"    | xsd:FLOAT | NULL  |
>  | _:foo1 | NULL   | foo#Z | NULL | NULL   | "z1"     | xsd:BLOB  | NULL  |
>  | _:foo2 | NULL   | foo#X | NULL | NULL   | "2"      | xsd:INT   | NULL  |
>  | _:foo2 | NULL   | foo#Y | NULL | NULL   | "2.0"    | xsd:FLOAT | NULL  |
>  | _:foo2 | NULL   | foo#Z | NULL | NULL   | "z2"     | xsd:BLOB  | NULL  |
>  +--------+--------+-------+------+--------+----------+-----------+-------+
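The same __data/__attrs join can be replayed with Python's stdlib sqlite3 (a sketch: CONCAT becomes the || operator, string literals take single quotes, and the `row` column is renamed `rownum` here only to steer clear of SQLite's ROW keyword):

```python
# Sketch: the __data JOIN __attrs -> literal-triples step above,
# transposed to SQLite via Python's stdlib sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE __attrs (relation TEXT, attr TEXT, type TEXT);
INSERT INTO __attrs VALUES ('foo','X','INT'), ('foo','Y','FLOAT'), ('foo','Z','BLOB');
-- 'row' renamed 'rownum' to avoid the ROW keyword in SQLite.
CREATE TABLE __data (relation TEXT, rownum INT, attr TEXT, value TEXT);
INSERT INTO __data VALUES
  ('foo',1,'X','1'), ('foo',1,'Y','1.0'), ('foo',1,'Z','z1'),
  ('foo',2,'X','2'), ('foo',2,'Y','2.0'), ('foo',2,'Z','z2');
""")
# || is SQLite's string concatenation (integers coerce to text).
triples = cur.execute("""
  SELECT '_:' || d.relation || d.rownum,
         d.relation || '#' || d.attr,
         '"' || d.value || '"',
         'xsd:' || a.type
    FROM __data d JOIN __attrs a
      ON d.relation = a.relation AND d.attr = a.attr
   ORDER BY d.rownum, d.attr
""").fetchall()
for t in triples:
    print(t)   # e.g. ('_:foo1', 'foo#X', '"1"', 'xsd:INT')
```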
> 
> The skolemizing approach in the direct mapping wiki would look more like:
> 
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>    SELECT CONCAT("_:", __data.relation, __name.skolem),
>           CONCAT(__data.relation, "#", __data.attr),
>           CONCAT("\"", __data.value, "\""),
>           CONCAT("xsd:", __attrs.type)
>      FROM __data, __attrs, (
>             SELECT __data.relation, __data.row,
>                    GROUP_CONCAT(__data.value SEPARATOR "~") AS skolem
>              FROM __data GROUP BY relation, row ) AS __name
>     WHERE __name.relation=__data.relation
>      AND __name.row=__data.row
>      AND __data.relation=__attrs.relation
>      AND __data.attr=__attrs.attr;
>  +---------------+--------+-------+------+--------+----------+-----------+-------+
>  | sIRI          | sBNode | pIRI  | oIRI | oBNode | oLiteral | oDatatype | oLang |
>  +---------------+--------+-------+------+--------+----------+-----------+-------+
>  | _:foo1~1.0~z1 | NULL   | foo#X | NULL | NULL   | "1"      | xsd:INT   | NULL  |
>  | _:foo1~1.0~z1 | NULL   | foo#Y | NULL | NULL   | "1.0"    | xsd:FLOAT | NULL  |
>  | _:foo1~1.0~z1 | NULL   | foo#Z | NULL | NULL   | "z1"     | xsd:BLOB  | NULL  |
>  | _:foo2~2.0~z2 | NULL   | foo#X | NULL | NULL   | "2"      | xsd:INT   | NULL  |
>  | _:foo2~2.0~z2 | NULL   | foo#Y | NULL | NULL   | "2.0"    | xsd:FLOAT | NULL  |
>  | _:foo2~2.0~z2 | NULL   | foo#Z | NULL | NULL   | "z2"     | xsd:BLOB  | NULL  |
>  +---------------+--------+-------+------+--------+----------+-----------+-------+
> 
> 
> which would give you
> 
>   _:Debts~juan~50 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts~juan~50 <Debts#Amount> "50"^^xsd:INT .
> twice (which would collapse to once), instead of the more faithful
>   _:Debts1 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts1 <Debts#Amount> "50"^^xsd:INT .
>   _:Debts2 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts2 <Debts#Amount> "50"^^xsd:INT .
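The collapse is just set semantics over the skolemized labels; a small Python sketch (graphs modelled as sets of triples, since RDF graphs are sets, with bnode labels built as above):

```python
# Sketch: value-based skolem labels collapse duplicate rows;
# per-row sequence numbers preserve cardinality.
rows = [("juan", 50), ("juan", 50)]  # the two identical Debts inserts

# Bnode label from the row's values ("~"-joined, as in the SQL above):
skolem = {("_:Debts~" + "~".join(str(v) for v in row), pred, obj)
          for row in rows
          for pred, obj in (("<Debts#Name>", row[0]),
                            ("<Debts#Amount>", row[1]))}

# Bnode label from a per-row sequence number instead:
numbered = {("_:Debts" + str(i), pred, obj)
            for i, row in enumerate(rows, 1)
            for pred, obj in (("<Debts#Name>", row[0]),
                              ("<Debts#Amount>", row[1]))}

print(len(skolem))    # 2 -- the duplicate row collapsed
print(len(numbered))  # 4 -- cardinality preserved
```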
> 
> 
> Important: getting cardinality right matters. Relational heads like to
> work over sets, but ultimately, the practical needs of not having to
> guarantee uniqueness always win. The set-based relational engines of
> the early 80s all either changed to multi-set (changing many
> customers' queries) or folded. By '86, SQL established a multi-set
> standard which maintained cardinality even when uniqueness was
> projected away. At the end of 2006, Fred Zemke interceded to keep the
> SPARQL semantics from having ambiguous cardinality, which cost months
> but gave us an invaluable definition in the semantics.
> 
>> The Denotational Semantics for the Direct Mapping (aka. the algebras and
>> the associated mapping function) *already* works on top of multisets
>> (and sets and lists), both in the data model for RDB and in the
>> higher-order functions we define/use to manipulate this data-structure.
>> 
>> Alexandre.
>> 
>>> 
>>> Boris
> 
> -- 
> -ericP
> 

Received on Wednesday, 19 January 2011 16:44:28 UTC