- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Wed, 19 Jan 2011 16:42:49 +0000
- To: Juan Sequeda <juanfederico@gmail.com>, Alexandre Bertails <bertails@w3.org>, Eric Prud'hommeaux <eric@w3.org>
- CC: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
With my chair hat on: I appreciate all your enthusiasm very much, but I
would appreciate it even more if this discussion were done within the
context of an issue.

Tracker, this is ACTION-95 on Juan.

Cheers,
	Michael

--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html


> From: Eric Prud'hommeaux <eric@w3.org>
> Organization: World Wide Web Consortium (W3C) - http://www.w3.org/
> Date: Wed, 19 Jan 2011 11:29:34 -0500
> To: Alexandre Bertails <bertails@w3.org>
> Cc: <bvillazon@fi.upm.es>, Harry Halpin <hhalpin@w3.org>, RDB2RDF WG
> <public-rdb2rdf-wg@w3.org>
> Subject: example SQL mapping to direct graph
> Resent-From: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> Resent-Date: Wed, 19 Jan 2011 16:31:21 +0000
>
> * Alexandre Bertails <bertails@w3.org> [2011-01-19 09:07-0500]
>> On Wed, 2011-01-19 at 14:03 +0100, Boris Villazón Terrazas wrote:
>>> On 19/01/2011 13:22, Harry Halpin wrote:
>>>>> On Jan 18, 2011, at 19:05, Alexandre Bertails wrote:
>>>>> [snip]
>>>>>> Juan, in order to help you with your action, I would like you to
>>>>>> consider the following:
>>>>>>
>>>>>> [[
>>>>>> CREATE TABLE Debts (
>>>>>>   Name varchar(50),
>>>>>>   Amount Integer
>>>>>> );
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
>>>>>> ]]
>>>>>>
>>>>>> Using this very simple RDB [1] example, can you go through all the
>>>>>> phases that lead to the RDF where I owe you 100?
>>>>>
>>>>> Alex, for my understanding: what I would expect to see in RDF are two
>>>>> pairs of identical triples with different subjects. How would a direct
>>>>> mapping produce anything whereby I owe Juan 100? I will owe 50 twice,
>>>>> but the fact that this is a hundred is a step that the RDF per se
>>>>> cannot express...
>>>>
>>>> Otherwise known as "RDF can't add, and (up until recently) neither can
>>>> SPARQL". Which is a feature, not a bug, re decidability :)
>>>>
>>>> However, I'm assuming Alex is asking: with Datalog semantics, do you
>>>> get, from those two insert statements, *two* distinct pairs of triples
>>>> (multiset) or just one (set)?
>>>>
>>>> Multisets allow repeated membership, while sets don't, I think.
>>>
>>> Thanks Alex for pointing out this case.
>>> This case was included by Eric within the test cases:
>>> http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#DB5
>>
>> You're right. I just made it simpler and put it in the body of the email
>> so I was sure that people would look at it :-)
>>
>>> So, as you said, the question is: what is the default mapping for this
>>> case?
>>
>> Two triples with different bnodes as subjects.
>>
>>> How can we face this case with R2RML? And what would be the expected
>>> result?
>>
>> I have no idea.
>
> I think "facing the case" means making sure the cardinality of the SPARQL
> result set matches that of the intuitive SQL result table. For instance,
>
>   SELECT ?to ?amnt { ?debt Debts:Name ?to ; Debts:Amount ?amnt }
>
> should give back two rows when
>
>   SELECT Debts.Name, Debts.Amount FROM Debts
>
> would give two rows. Likewise,
>
>   SELECT SUM(?amnt) { ?debt :Amount ?amnt }
>
> should be the same as
>
>   SELECT SUM(Debts.Amount) FROM Debts
>
> . This is handled in the direct mapping by creating a blank node for
> each row in each table which does not have a primary key.
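[Editorial note, not part of the original message: the cardinality point can be
checked directly in SQL against the Debts table above; the derived-table alias
"d" below is only illustrative.]

    -- The multiset (bag) reading keeps both rows, so the aggregate sees the whole debt:
    SELECT SUM(Amount) FROM Debts;                                           -- 100
    -- Collapsing duplicate rows first (a set reading) silently loses one of them:
    SELECT SUM(Amount) FROM (SELECT DISTINCT Name, Amount FROM Debts) AS d;  -- 50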
> You could handle this in Datalog by amending
> http://www.w3.org/mid/20101116000533.GI13630@w3.org
> to include an SQL sequence generator (à la PRIMARY KEY AUTO_INCREMENT)
> in a copy of each of the database tables which has no primary key.
> Tweaking from an earlier proposal:
>
> * Eric Prud'hommeaux <eric@w3.org> [2010-11-15 19:05-0500]
>> The current text assumes the trivial mapping from an SQL table to
>> datalog facts. Eliding column names, we see:
>>
>> ┌┤foo├──┬────┐
>> │ X  │ Y  │ Z  │
>> │ x1 │ y1 │ z1 │   => foo(x1, y1, z1)
>> │ x2 │ y2 │ z2 │      foo(x2, y2, z2)
>> └────┴────┴────┘
>>
>> If we use, yes, HOL, to create our datalog facts, we can capture
>> things like the relation name, attribute (column) names, even the
>> datatypes and keys. (This effectively recapitulates the catalog.)
>>
>> ┌┤foo.schema├──┬─────┬────┬───────┬────┬──────┐
>> │ name │ a1 │ t1  │ a2 │ t2    │ a3 │ t3   │
>> │ foo  │ X  │ int │ Y  │ float │ Z  │ blob │
>> └──────┴────┴─────┴────┴───────┴────┴──────┘
>
> Further reflection suggests a vertical encoding like:
>
> ┌┤__attrs├─┬──────┬───────┐
> │ relation │ attr │ type  │
> │ foo      │ X    │ int   │
> │ foo      │ Y    │ float │
> │ foo      │ Z    │ blob  │
> └──────────┴──────┴───────┘
>
>> ┌┤foo.data├─┬────┬────┐
>> │ name │ a1 │ a2 │ a3 │
>> │ foo  │ x1 │ y1 │ z1 │
>> │ foo  │ x2 │ y2 │ z2 │
>> └──────┴────┴────┴────┘
>
> ┌┤__data├──┬─────┬──────┬───────┐
> │ relation │ row │ attr │ value │
> │ foo      │ 1   │ X    │ "1"   │
> │ foo      │ 1   │ Y    │ "1.0" │
> │ foo      │ 1   │ Z    │ "z1"  │
> │ foo      │ 2   │ X    │ "2"   │
> │ foo      │ 2   │ Y    │ "2.0" │
> │ foo      │ 2   │ Z    │ "z2"  │
> └──────────┴─────┴──────┴───────┘
>
>> From there, you have all of your column names, relation names, etc. to
>> use FOL to generate the IRIs and literals for the RDF relation.
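[Editorial note, not part of the original message: one way to realize the
"sequence generator in a copy of each keyless table" idea mentioned earlier in
this message is sketched below in MySQL-flavoured SQL; the table name
Debts_numbered and the column row_id are hypothetical.]

    -- Give each row of the keyless table a distinct, generated identifier:
    CREATE TABLE Debts_numbered (row_id INT PRIMARY KEY AUTO_INCREMENT,
                                 Name   VARCHAR(50),
                                 Amount INT);
    -- Copy the data; the two identical (juan, 50) rows get row_id 1 and 2,
    -- which can seed per-row blank node labels such as _:Debts1 and _:Debts2.
    INSERT INTO Debts_numbered (Name, Amount)
    SELECT Name, Amount FROM Debts;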
>> But, if you're actually working in a type-safe FOL, you can encode your
>> RDF in a relation that has types (beyond strings):
>>
>> ┌┤foo.graph├┬────────┬─────────┬──────┬────────┬──────────┬──────────────┬──────────┐
>> │ sIRI      │ sBNode │ p       │ oIRI │ oBNode │ oLexical │ oDatatype    │ oLangTag │
>> │ http…X=x1 │ NULL   │ http…#X │ NULL │ NULL   │ x1       │ http…integer │          │
>> │ http…X=x1 │ NULL   │ http…#Y │ NULL │ NULL   │ y1       │ http…string  │          │
>> │ http…X=x1 │ NULL   │ http…#Z │ NULL │ NULL   │ z1       │ http…float   │          │
>> └───────────┴────────┴─────────┴──────┴────────┴──────────┴──────────────┴──────────┘
>
> I coded this transformation in SQL so folks can play along with the
> tools they have at home:
>
> -- Create the sample database (one table):
> CREATE TABLE foo (X INT,
>                   Y FLOAT,
>                   Z BLOB);
> INSERT INTO foo (X, Y, Z) VALUES (1, 1.0, "z1"),
>                                  (2, 2.0, "z2");
>
> -- Higher-order import of table and attribute names into the domain of discourse:
> CREATE TABLE __attrs (relation VARCHAR(20),
>                       attr VARCHAR(20),
>                       type VARCHAR(20));
> INSERT INTO __attrs (relation, attr, type) VALUES ("foo", "X", "INT"),
>                                                   ("foo", "Y", "FLOAT"),
>                                                   ("foo", "Z", "BLOB");
> CREATE TABLE __data (relation VARCHAR(20),
>                      row INT,
>                      attr VARCHAR(20),
>                      value VARCHAR(20));
> INSERT INTO __data (relation, row, attr, value) VALUES ("foo", 1, "X", "1"),
>                                                        ("foo", 1, "Y", "1.0"),
>                                                        ("foo", 1, "Z", "z1"),
>                                                        ("foo", 2, "X", "2"),
>                                                        ("foo", 2, "Y", "2.0"),
>                                                        ("foo", 2, "Z", "z2");
> -- We could capture the keys info in a __keys table, but it's
> -- not necessary for this example.
>
> -- Representation of the RDF graph:
> CREATE TABLE __graph (sIRI VARCHAR(30),
>                       sBNode VARCHAR(30),
>                       pIRI VARCHAR(30),
>                       oIRI VARCHAR(30),
>                       oBNode VARCHAR(30),
>                       oLiteral VARCHAR(30),
>                       oDatatype VARCHAR(30),
>                       oLang VARCHAR(10));
>
> -- Insert triples like (BNode, IRI, Literal)s:
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
> SELECT CONCAT("_:", __data.relation, __data.row),
>        CONCAT(__data.relation, "#", __data.attr),
>        CONCAT("\"", __data.value, "\""),
>        CONCAT("xsd:", __attrs.type)
>   FROM __data, __attrs
>  WHERE __data.relation=__attrs.relation
>    AND __data.attr=__attrs.attr;
>
> -- Et voilà, the literal triples (with a BNode subject):
> SELECT * FROM __graph;
>
> ┌────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI   │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
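[Editorial note, not part of the original message: once __graph is populated, its
rows can be rendered as rough N-Triples text with one more query. This is only a
sketch; the subject and predicate values are still the abbreviated forms used in
the example (bnode labels stored in sIRI, relative "foo#..." predicates, "xsd:"
prefixes), not absolute IRIs.]

    SELECT CONCAT(sIRI, " <", pIRI, "> ", oLiteral, "^^", oDatatype, " .") AS ntriple
      FROM __graph;
    -- e.g.  _:foo1 <foo#X> "1"^^xsd:INT .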
> The skolemizing approach in the direct mapping wiki would look more like:
>
> INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
> SELECT CONCAT("_:", __data.relation, __name.skolem),
>        CONCAT(__data.relation, "#", __data.attr),
>        CONCAT("\"", __data.value, "\""),
>        CONCAT("xsd:", __attrs.type)
>   FROM __data, __attrs,
>        ( SELECT __data.relation, __data.row,
>                 GROUP_CONCAT(__data.value SEPARATOR "~") AS skolem
>            FROM __data GROUP BY relation, row ) AS __name
>  WHERE __name.relation=__data.relation
>    AND __name.row=__data.row
>    AND __data.relation=__attrs.relation
>    AND __data.attr=__attrs.attr;
>
> ┌───────────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
> │ sIRI          │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
> │ _:foo1~1.0~z1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo1~1.0~z1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
> │ _:foo2~2.0~z2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
> └───────────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘
>
> which, on the Debts example, would give you
>
>   _:Debts~juan~50 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts~juan~50 <Debts#Amount> "50"^^xsd:INT .
>
> twice (which would collapse to once), instead of the more faithful
>
>   _:Debts1 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts1 <Debts#Amount> "50"^^xsd:INT .
>   _:Debts2 <Debts#Name> "juan"^^xsd:STRING .
>   _:Debts2 <Debts#Amount> "50"^^xsd:INT .
>
> Important: getting cardinality right matters. Relational heads like to
> work over sets, but ultimately, the practical need of not having to
> guarantee uniqueness always wins. The set-based relational engines of
> the early '80s all either changed to multisets (changing many
> customers' queries) or folded. By '86, SQL had established a multiset
> standard which maintained cardinality even when uniqueness was
> projected away. At the end of 2006, Fred Zemke interceded to keep the
> SPARQL semantics from having ambiguous cardinality, which cost months
> but gave us invaluable definition in the semantics.
>
>> The Denotational Semantics for the Direct Mapping (aka the algebras and
>> the associated mapping function) *already* works on top of multisets
>> (and sets and lists), both in the data model for RDB and in the
>> higher-order functions we define/use to manipulate this data structure.
>>
>> Alexandre.
>>
>>> Boris
>
> --
> -ericP
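[Editorial note, not part of the original message: to reproduce the contrast
between the skolemized and the row-numbered blank nodes on the Debts example,
the rows can be loaded into the __attrs/__data encoding above; a minimal sketch
follows, with the type labels STRING and INT chosen to match the triples shown
in the message.]

    INSERT INTO __attrs (relation, attr, type) VALUES ("Debts", "Name", "STRING"),
                                                      ("Debts", "Amount", "INT");
    INSERT INTO __data (relation, row, attr, value) VALUES ("Debts", 1, "Name", "juan"),
                                                           ("Debts", 1, "Amount", "50"),
                                                           ("Debts", 2, "Name", "juan"),
                                                           ("Debts", 2, "Amount", "50");
    -- The row-numbered INSERT INTO __graph then yields _:Debts1 and _:Debts2
    -- subjects (four distinct triples), while the skolemizing variant labels both
    -- rows _:Debts~juan~50, so the duplicates collapse under a set reading.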
Received on Wednesday, 19 January 2011 16:44:28 UTC