- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Wed, 19 Jan 2011 16:42:49 +0000
- To: Juan Sequeda <juanfederico@gmail.com>, Alexandre Bertails <bertails@w3.org>, Eric Prud'hommeaux <eric@w3.org>
- CC: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
With my chair hat on: I appreciate all your enthusiasm very much, but I
would appreciate it even more if this discussion took place within the
context of an issue.
Tracker, this is ACTION-95 on Juan.
Cheers,
Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html
> From: Eric Prud'hommeaux <eric@w3.org>
> Organization: World Wide Web Consortium (W3C) - http://www.w3.org/
> Date: Wed, 19 Jan 2011 11:29:34 -0500
> To: Alexandre Bertails <bertails@w3.org>
> Cc: <bvillazon@fi.upm.es>, Harry Halpin <hhalpin@w3.org>, RDB2RDF WG
> <public-rdb2rdf-wg@w3.org>
> Subject: example SQL mapping to direct graph
> Resent-From: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> Resent-Date: Wed, 19 Jan 2011 16:31:21 +0000
>
> * Alexandre Bertails <bertails@w3.org> [2011-01-19 09:07-0500]
>> On Wed, 2011-01-19 at 14:03 +0100, Boris Villazón Terrazas wrote:
>>> On 19/01/2011 13:22, Harry Halpin wrote:
>>>>> On Jan 18, 2011, at 19:05 , Alexandre Bertails wrote:
>>>>> [snip]
>>>>>> Juan, in order to help you with your action, I would like you to
>>>>>> consider the following:
>>>>>>
>>>>>> [[
>>>>>> CREATE TABLE Debts (
>>>>>>   Name varchar(50),
>>>>>>   Amount Integer
>>>>>> );
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> ]]
>>>>>>
>>>>>> Using this very simple RDB [1] example, can you go through all the
>>>>>> phases that lead to the RDF where I owe you 100?
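>>>>>>
>>>>>> (For reference, the "100" is just the SQL aggregate over those two
>>>>>> rows -- a quick sanity check, assuming the Debts table above:
>>>>>>
>>>>>>   SELECT Name, SUM(Amount) AS Total
>>>>>>   FROM Debts
>>>>>>   GROUP BY Name;
>>>>>>
>>>>>> which should return one row, ("juan", 100) -- but only if both INSERTs
>>>>>> really survive as two rows.)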
>>>>>
>>>>> Alex, for my understanding: what I would expect to see in RDF are two
>>>>> pairs of identical triples with different subjects. How would a direct
>>>>> mapping produce anything whereby I owe Juan 100? I will owe 50 twice, but
>>>>> the fact that this adds up to a hundred is something the RDF per se cannot express...
>>>> Otherwise known as "RDF can't add, and (up until recently) neither can
>>>> SPARQL". Which is a feature, not a bug, re decidability :)
>>>>
>>>> However, I'm assuming Alex is asking: with Datalog semantics, do you get
>>>> *two* distinct pairs of triples (multiset) from those two insert
>>>> statements, or just one (set)?
>>>>
>>>> Multisets allow repeated membership, while sets don't, I think.
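>>>>
>>>> (In SQL terms, over the Debts example above, that is roughly the
>>>> difference between the following two queries -- just a sketch for
>>>> illustration:
>>>>
>>>>   SELECT Name, Amount FROM Debts;           -- multiset: two identical rows
>>>>   SELECT DISTINCT Name, Amount FROM Debts;  -- set: one row
>>>> )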
>>> Thanks Alex for pointing out this case.
>>> This case was included by Eric within the test cases
>>> http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#DB5
>>
>> You're right. I just made it simpler and put it in the body of the email
>> to make sure people would look at it :-)
>>
>>> So, as you said, the question is: what is the default mapping for this case?
>>
>> Two triples with different bnodes as subjects.
>>
>>> How can we face this case with R2RML, and what would be the expected result?
>>
>> I have no idea.
>
> I think "facing the case" means making sure the cardinality of SPARQL
> result set matches that of the intuitive SQL result table. For instance,
> SELECT ?to ?amnt { ?debt Debts:Name ?to ; Debts:Amount ?amnt }
> should give back two rows when
> SELECT Debts.Name, Debts.Amount FROM Debts
> would give two rows. Likewise
> SELECT SUM(?amnt) { ?debt :Amount ?amnt }
> should be the same as
> SELECT SUM(Debts.amnt) FROM Debts
> . This is handled in the direct mapping by creating a blank node for
> each row in each table which does not have a primary key. You could
> handle this in Datalog by ammending
> http://www.w3.org/mid/20101116000533.GI13630@w3.org
> to include an SQL sequence generator (¨¤ la PRIMARY KEY AUTO
> INCREMENT) in a copy of each of the database tables which has no
> primary key. Tweaking from an earlier proposal:
>
> * Eric Prud'hommeaux <eric@w3.org> [2010-11-15 19:05-0500]
>> The current text assumes the trivial mapping from an SQL table to
>> datalog facts. Eliding column names, we see:
>>
>> ┌┤foo├┬────┬────┐
>> │ X   │ Y  │ Z  │
>> │ x1  │ y1 │ z1 │   =>  foo(x1, y1, z1)
>> │ x2  │ y2 │ z2 │       foo(x2, y2, z2)
>> └─────┴────┴────┘
>>
>> If we use, yes, HOL, to create our datalog facts, we can capture
>> things like the relation name, attribute (column) names, even the
>> datatypes and keys. (This effectively recapitulates the catalog.)
>>
>> ┌┤foo.schema├┬────┬─────┬────┬───────┬────┬──────┐
>> │ name       │ a1 │ t1  │ a2 │ t2    │ a3 │ t3   │
>> │ foo        │ X  │ int │ Y  │ float │ Z  │ blob │
>> └────────────┴────┴─────┴────┴───────┴────┴──────┘
>
> Further reflection suggests a vertical encoding like:
>
> ┌┤__attrs├─┬──────┬───────┐
> │ relation │ attr │ type  │
> │ foo      │ X    │ int   │
> │ foo      │ Y    │ float │
> │ foo      │ Z    │ blob  │
> └──────────┴──────┴───────┘
>
>> ┌┤foo.data├┬────┬────┬────┐
>> │ name     │ a1 │ a2 │ a3 │
>> │ foo      │ x1 │ y1 │ z1 │
>> │ foo      │ x2 │ y2 │ z2 │
>> └──────────┴────┴────┴────┘
>
> ┌┤__data├──┬─────┬──────┬───────┐
> │ relation │ row │ attr │ value │
> │ foo      │ 1   │ X    │ "1"   │
> │ foo      │ 1   │ Y    │ "1.0" │
> │ foo      │ 1   │ Z    │ "z1"  │
> │ foo      │ 2   │ X    │ "2"   │
> │ foo      │ 2   │ Y    │ "2.0" │
> │ foo      │ 2   │ Z    │ "z2"  │
> └──────────┴─────┴──────┴───────┘
>
>
>> From there, you have all of your column names, relation names, etc. to
>> use FOL to generate the IRIs and literals for the RDF relation. But,
>> if you're actually working in a type-safe FOL, you can encode your RDF
>> in a relation that has types (beyond strings):
>>
>>
>> ┌┤foo.graph├┬────────┬─────────┬──────┬────────┬──────────┬──────────────┬──────────┐
>> │ sIRI      │ sBNode │ p       │ oIRI │ oBNode │ oLexical │ oDatatype    │ oLangTag │
>> │ http…X=x1 │ NULL   │ http…#X │ NULL │ NULL   │ x1       │ http…integer │          │
>> │ http…X=x1 │ NULL   │ http…#Y │ NULL │ NULL   │ y1       │ http…string  │          │
>> │ http…X=x1 │ NULL   │ http…#Z │ NULL │ NULL   │ z1       │ http…float   │          │
>> └───────────┴────────┴─────────┴──────┴────────┴──────────┴──────────────┴──────────┘
>
> I coded this transformation in SQL so folks can play along with the
> tools they have at home:
>
> -- Create the sample database (one table):
> CREATE TABLE foo (X INT,
>                   Y FLOAT,
>                   Z BLOB);
> INSERT INTO foo (X, Y, Z) VALUES (1, 1.0, "z1"),
>                                  (2, 2.0, "z2");
>
> -- Higher-order import of table and attribute names into the domain of discourse:
> CREATE TABLE __attrs (relation VARCHAR(20),
>                       attr VARCHAR(20),
>                       type VARCHAR(20));
> INSERT INTO __attrs (relation, attr, type) VALUES ("foo", "X", "INT"),
>                                                   ("foo", "Y", "FLOAT"),
>                                                   ("foo", "Z", "BLOB");
> CREATE TABLE __data (relation VARCHAR(20),
>                      row INT,
>                      attr VARCHAR(20),
>                      value VARCHAR(20));
> INSERT INTO __data (relation, row, attr, value) VALUES ("foo", 1, "X", "1"),
>                                                        ("foo", 1, "Y", "1.0"),
>                                                        ("foo", 1, "Z", "z1"),
>                                                        ("foo", 2, "X", "2"),
>                                                        ("foo", 2, "Y", "2.0"),
>                                                        ("foo", 2, "Z", "z2");
> -- We could capture the key info in a __keys table, but it's
> -- not necessary for this example.
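>
> -- (A possible shape for such a table, just for illustration and not used below:
> --    CREATE TABLE __keys (relation VARCHAR(20),
> --                         attr VARCHAR(20));
> --  i.e. one row per key attribute of each relation.)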
>
> -- Representation of RDF graph:
> CREATE TABLE __graph (sIRI VARCHAR(30),
>                       sBNode VARCHAR(30),
>                       pIRI VARCHAR(30),
>                       oIRI VARCHAR(30),
>                       oBNode VARCHAR(30),
>                       oLiteral VARCHAR(30),
>                       oDatatype VARCHAR(30),
>                       oLang VARCHAR(10));
>
> -- insert triples like (BNode, IRI, Literal)s
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>   SELECT CONCAT("_:", __data.relation, __data.row),
>          CONCAT(__data.relation, "#", __data.attr),
>          CONCAT("\"", __data.value, "\""),
>          CONCAT("xsd:", __attrs.type)
>     FROM __data, __attrs
>    WHERE __data.relation=__attrs.relation
>      AND __data.attr=__attrs.attr;
> -- Et voila, the literal triples (with a BNode subject):
> SELECT * FROM __graph;
> ┌────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI   │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
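>
> (If you want to eyeball that table as triple-ish lines, here is a rough
> rendering query -- only a sketch; it assumes, as above, that only the
> sIRI, pIRI, oLiteral and oDatatype columns are populated:
>
>   -- sketch: renders just the bnode-subject / literal-object rows
>   SELECT CONCAT(sIRI, " <", pIRI, "> ", oLiteral, "^^", oDatatype, " .")
>     FROM __graph;
> )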
>
> The skolemizing approach in the direct mapping wiki would look more like:
>
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
>   SELECT CONCAT("_:", __data.relation, __name.skolem),
>          CONCAT(__data.relation, "#", __data.attr),
>          CONCAT("\"", __data.value, "\""),
>          CONCAT("xsd:", __attrs.type)
>     FROM __data, __attrs, (
>            SELECT __data.relation, __data.row,
>                   GROUP_CONCAT(__data.value SEPARATOR "~") AS skolem
>              FROM __data GROUP BY relation, row ) AS __name
>    WHERE __name.relation=__data.relation
>      AND __name.row=__data.row
>      AND __data.relation=__attrs.relation
>      AND __data.attr=__attrs.attr;
> ┌───────────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI          │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1~1.0~z1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └───────────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
>
> which would give you
>
> _:Debts~juan~50 <Debts#Name> "juan"^^xsd:STRING .
> _:Debts~juan~50 <Debts#Amount> "50"^^xsd:INT .
> twice (which would collapse to once), instead of the more faithful
> _:Debts1 <Debts#Name> "juan"^^xsd:STRING .
> _:Debts1 <Debts#Amount> "50"^^xsd:INT .
> _:Debts2 <Debts#Name> "juan"^^xsd:STRING .
> _:Debts2 <Debts#Amount> "50"^^xsd:INT .
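>
> (To make that "more faithful" version concrete, the sequence-generator idea
> from earlier could be sketched roughly like this -- MySQL-ish syntax, and
> Debts_keyed is a made-up name:
>
>   -- hypothetical keyed copy of the Debts table, which has no primary key
>   CREATE TABLE Debts_keyed (rowid INT AUTO_INCREMENT PRIMARY KEY,
>                             Name VARCHAR(50),
>                             Amount INTEGER);
>   INSERT INTO Debts_keyed (Name, Amount)
>     SELECT Name, Amount FROM Debts;
>
> rowid then distinguishes the two otherwise-identical rows, which is what
> gives you the distinct _:Debts1 and _:Debts2 subjects.)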
>
>
> Important: getting cardinality right matters. Relational heads like to
> work over sets, but ultimately the practical need not to have to
> guarantee uniqueness always wins. The set-based relational engines of
> the early 80s all either changed to multi-set (changing many
> customers' queries) or folded. By '86, SQL had established a multi-set
> standard which maintained cardinality even when uniqueness was
> projected away. At the end of 2006, Fred Zemke interceded to keep the
> SPARQL semantics from having ambiguous cardinality, which cost months
> but gave us invaluable precision in the semantics.
>
>> The Denotational Semantics for the Direct Mapping (a.k.a. the algebras and
>> the associated mapping function) *already* works on top of multisets
>> (and sets and lists), both in the data model for RDB and in the
>> higher-order functions we define/use to manipulate this data structure.
>>
>> Alexandre.
>>
>>>
>>>
>>> Boris
>
> --
> -ericP
>
Received on Wednesday, 19 January 2011 16:44:28 UTC