- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 19 Jan 2011 11:29:34 -0500
- To: Alexandre Bertails <bertails@w3.org>
- Cc: bvillazon@fi.upm.es, Harry Halpin <hhalpin@w3.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
* Alexandre Bertails <bertails@w3.org> [2011-01-19 09:07-0500]
> On Wed, 2011-01-19 at 14:03 +0100, Boris Villazón Terrazas wrote:
> > On 19/01/2011 13:22, Harry Halpin wrote:
> > >> On Jan 18, 2011, at 19:05 , Alexandre Bertails wrote:
> > >> [snip]
> > >>> Juan, in order to help you with your action, I would like you to
> > >>> consider the following:
> > >>>
> > >>> [[
> > >>> CREATE TABLE Debts (
> > >>>   Name varchar(50),
> > >>>   Amount Integer
> > >>> );
> > >>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
> > >>> INSERT INTO Debts (Name, Amount) VALUES("juan", 50);
> > >>> ]]
> > >>>
> > >>> Using this very simple RDB [1] example, can you go through all the
> > >>> phases that lead to the RDF where I owe you 100?
> > >>
> > >> Alex, for my understanding: what I would expect to see in RDF are
> > >> two pairs of identical triples with different subjects. How would a
> > >> direct mapping produce anything whereby I owe Juan 100? I will owe
> > >> 50 twice, but the fact that this is a hundred is a step that the
> > >> RDF per se cannot express...
> > > Otherwise known as "RDF can't add, and (up until recently) neither
> > > can SPARQL". Which is a feature, not a bug, re decidability :)
> > >
> > > However, I'm assuming Alex is asking with Datalog semantics: do you
> > > get, with those two INSERT statements, *two* distinct pairs of
> > > triples (multiset) or just one (set)?
> > >
> > > Multisets allow repeated membership, while sets don't, I think.
> > Thanks Alex for pointing out this case. This case was included by
> > Eric within the test cases:
> > http://www.w3.org/2001/sw/rdb2rdf/wiki/R2RML_Test_Cases_v1#DB5
>
> You're right. I just made it simpler and put it in the body of the
> email so I was sure that people would look at it :-)
>
> > So, as you said, the question is: what is the default mapping for
> > this case?
>
> Two triples with different bnodes as subjects.
>
> > How can we face this case with R2RML? And what would be the expected
> > result?
>
> I have no idea.

I think "facing the case" means making sure the cardinality of the
SPARQL result set matches that of the intuitive SQL result table. For
instance,

  SELECT ?to ?amnt { ?debt Debts:Name ?to ; Debts:Amount ?amnt }

should give back two rows when

  SELECT Debts.Name, Debts.Amount FROM Debts

would give two rows. Likewise,

  SELECT SUM(?amnt) { ?debt Debts:Amount ?amnt }

should be the same as

  SELECT SUM(Debts.Amount) FROM Debts;

This is handled in the direct mapping by creating a blank node for each
row in each table which does not have a primary key. You could handle
this in Datalog by amending
  http://www.w3.org/mid/20101116000533.GI13630@w3.org
to include an SQL sequence generator (à la PRIMARY KEY AUTO_INCREMENT)
in a copy of each of the database tables which has no primary key.
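Concretely, that amendment might look something like the following
minimal sketch (MySQL-flavored SQL; Debts__rows and __row are
illustrative names, not part of any proposal):

-- Copy the keyless Debts table into one with a generated row id,
-- which the mapping can then use to mint one blank node per row.
CREATE TABLE Debts__rows (
  __row INT AUTO_INCREMENT PRIMARY KEY,
  Name varchar(50),
  Amount Integer
);
INSERT INTO Debts__rows (Name, Amount)
  SELECT Name, Amount FROM Debts;

-- The two ("juan", 50) rows now carry distinct __row values (1 and 2),
-- so the two debts stay distinguishable and the total survives:
SELECT SUM(Amount) FROM Debts__rows;   -- 100, not 50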
Tweaking from an earlier proposal:

* Eric Prud'hommeaux <eric@w3.org> [2010-11-15 19:05-0500]
> The current text assumes the trivial mapping from an SQL table to
> datalog facts. Eliding column names, we see:
>
> ┌┤foo├┬────┬────┐
> │ X   │ Y  │ Z  │
> │ x1  │ y1 │ z1 │ => foo(x1, y1, z1)
> │ x2  │ y2 │ z2 │    foo(x2, y2, z2)
> └─────┴────┴────┘
>
> If we use, yes, HOL, to create our datalog facts, we can capture
> things like the relation name, attribute (column) names, even the
> datatypes and keys. (This effectively recapitulates the catalog.)
>
> ┌┤foo.schema├┬────┬─────┬────┬───────┬────┬──────┐
> │ name       │ a1 │ t1  │ a2 │ t2    │ a3 │ t3   │
> │ foo        │ X  │ int │ Y  │ float │ Z  │ blob │
> └────────────┴────┴─────┴────┴───────┴────┴──────┘

Further reflection suggests a vertical encoding like:

┌┤__attrs├─┬──────┬───────┐
│ relation │ attr │ type  │
│ foo      │ X    │ int   │
│ foo      │ Y    │ float │
│ foo      │ Z    │ blob  │
└──────────┴──────┴───────┘

> ┌┤foo.data├┬────┬────┬────┐
> │ name     │ a1 │ a2 │ a3 │
> │ foo      │ x1 │ y1 │ z1 │
> │ foo      │ x2 │ y2 │ z2 │
> └──────────┴────┴────┴────┘

┌┤__data├──┬─────┬──────┬───────┐
│ relation │ row │ attr │ value │
│ foo      │ 1   │ X    │ "1"   │
│ foo      │ 1   │ Y    │ "1.0" │
│ foo      │ 1   │ Z    │ "z1"  │
│ foo      │ 2   │ X    │ "2"   │
│ foo      │ 2   │ Y    │ "2.0" │
│ foo      │ 2   │ Z    │ "z2"  │
└──────────┴─────┴──────┴───────┘

> From there, you have all of your column names, relation names, etc. to
> use FOL to generate the IRIs and literals for the RDF relation. But,
> if you're actually working in a type-safe FOL, you can encode your RDF
> in a relation that has types (beyond strings):
>
> ┌┤foo.graph├┬────────┬─────────┬──────┬────────┬──────────┬──────────────┬──────────┐
> │ sIRI      │ sBNode │ p       │ oIRI │ oBNode │ oLexical │ oDatatype    │ oLangTag │
> │ http…X=x1 │ NULL   │ http…#X │ NULL │ NULL   │ x1       │ http…integer │          │
> │ http…X=x1 │ NULL   │ http…#Y │ NULL │ NULL   │ y1       │ http…float   │          │
> │ http…X=x1 │ NULL   │ http…#Z │ NULL │ NULL   │ z1       │ http…blob    │          │
> └───────────┴────────┴─────────┴──────┴────────┴──────────┴──────────────┴──────────┘

I coded this transformation in SQL so folks can play along with the
tools they have at home:

-- Create the sample database (one table):
CREATE TABLE foo (X INT, Y FLOAT, Z BLOB);
INSERT INTO foo (X, Y, Z) VALUES
  (1, 1.0, "z1"),
  (2, 2.0, "z2");

-- Higher-order import of table and attribute names into the domain of
-- discourse:
CREATE TABLE __attrs (relation VARCHAR(20), attr VARCHAR(20),
                      type VARCHAR(20));
INSERT INTO __attrs (relation, attr, type) VALUES
  ("foo", "X", "INT"),
  ("foo", "Y", "FLOAT"),
  ("foo", "Z", "BLOB");

CREATE TABLE __data (relation VARCHAR(20), row INT,
                     attr VARCHAR(20), value VARCHAR(20));
INSERT INTO __data (relation, row, attr, value) VALUES
  ("foo", 1, "X", "1"),
  ("foo", 1, "Y", "1.0"),
  ("foo", 1, "Z", "z1"),
  ("foo", 2, "X", "2"),
  ("foo", 2, "Y", "2.0"),
  ("foo", 2, "Z", "z2");

-- We could capture the keys info in a __keys table, but it's
-- not necessary for this example.
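(For completeness, such a __keys table in the same vertical style might
look like the sketch below. This is purely hypothetical, and the
keytype column is an invention here; nothing that follows uses it.)

-- One row per key attribute. foo declares no keys, so the table would
-- stay empty for this example; a table declaring PRIMARY KEY (X) would
-- contribute the row ("foo", "X", "PRIMARY").
CREATE TABLE __keys (relation VARCHAR(20), attr VARCHAR(20),
                     keytype VARCHAR(20));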
-- Representation of the RDF graph:
CREATE TABLE __graph (sIRI VARCHAR(30), sBNode VARCHAR(30),
                      pIRI VARCHAR(30),
                      oIRI VARCHAR(30), oBNode VARCHAR(30),
                      oLiteral VARCHAR(30), oDatatype VARCHAR(30),
                      oLang VARCHAR(10));

-- Insert triples like (BNode, IRI, Literal):
INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
SELECT CONCAT("_:", __data.relation, __data.row),
       CONCAT(__data.relation, "#", __data.attr),
       CONCAT("\"", __data.value, "\""),
       CONCAT("xsd:", __attrs.type)
  FROM __data, __attrs
 WHERE __data.relation=__attrs.relation
   AND __data.attr=__attrs.attr;

-- Et voilà, the literal triples (with a BNode subject):
SELECT * FROM __graph;

┌────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
│ sIRI   │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
│ _:foo1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
│ _:foo1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
│ _:foo1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
│ _:foo2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
│ _:foo2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
│ _:foo2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
└────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘

The skolemizing approach in the direct mapping wiki would look more
like:

INSERT INTO __graph (sIRI, pIRI, oLiteral, oDatatype)
SELECT CONCAT("_:", __data.relation, __name.skolem),
       CONCAT(__data.relation, "#", __data.attr),
       CONCAT("\"", __data.value, "\""),
       CONCAT("xsd:", __attrs.type)
  FROM __data, __attrs,
       ( SELECT __data.relation, __data.row,
                GROUP_CONCAT(__data.value SEPARATOR "~") AS skolem
           FROM __data
          GROUP BY relation, row ) AS __name
 WHERE __name.relation=__data.relation
   AND __name.row=__data.row
   AND __data.relation=__attrs.relation
   AND __data.attr=__attrs.attr;

┌───────────────┬────────┬───────┬──────┬────────┬──────────┬───────────┬───────┐
│ sIRI          │ sBNode │ pIRI  │ oIRI │ oBNode │ oLiteral │ oDatatype │ oLang │
│ _:foo1~1.0~z1 │ NULL   │ foo#X │ NULL │ NULL   │ "1"      │ xsd:INT   │ NULL  │
│ _:foo1~1.0~z1 │ NULL   │ foo#Y │ NULL │ NULL   │ "1.0"    │ xsd:FLOAT │ NULL  │
│ _:foo1~1.0~z1 │ NULL   │ foo#Z │ NULL │ NULL   │ "z1"     │ xsd:BLOB  │ NULL  │
│ _:foo2~2.0~z2 │ NULL   │ foo#X │ NULL │ NULL   │ "2"      │ xsd:INT   │ NULL  │
│ _:foo2~2.0~z2 │ NULL   │ foo#Y │ NULL │ NULL   │ "2.0"    │ xsd:FLOAT │ NULL  │
│ _:foo2~2.0~z2 │ NULL   │ foo#Z │ NULL │ NULL   │ "z2"     │ xsd:BLOB  │ NULL  │
└───────────────┴────────┴───────┴──────┴────────┴──────────┴───────────┴───────┘

For the Debts table, this skolemization would give you

  _:Debts~juan~50 <Debts#Name> "juan"^^xsd:STRING .
  _:Debts~juan~50 <Debts#Amount> "50"^^xsd:INT .

twice (which would collapse to once), instead of the more faithful

  _:Debts1 <Debts#Name> "juan"^^xsd:STRING .
  _:Debts1 <Debts#Amount> "50"^^xsd:INT .
  _:Debts2 <Debts#Name> "juan"^^xsd:STRING .
  _:Debts2 <Debts#Amount> "50"^^xsd:INT .

Important: getting cardinality right matters. Relational heads like to
work over sets, but ultimately, the practical need of not having to
guarantee uniqueness always wins. The set-based relational engines of
the early '80s all either changed to multisets (changing many
customers' queries) or folded. By '86, SQL had established a multiset
standard which maintained cardinality even when uniqueness was
projected away. At the end of 2006, Fred Zemke interceded to keep the
SPARQL semantics from having ambiguous cardinality, which cost months
but gave us invaluable definition in the semantics.
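Back to the Debts case: to watch that collapse happen with the tables
above, here is a minimal sketch (the Debts rows are keyed into the
__data encoding by hand; the ORDER BY inside GROUP_CONCAT is only there
to make the skolem string deterministic):

INSERT INTO __attrs (relation, attr, type) VALUES
  ("Debts", "Name", "STRING"),
  ("Debts", "Amount", "INT");
INSERT INTO __data (relation, row, attr, value) VALUES
  ("Debts", 1, "Name", "juan"), ("Debts", 1, "Amount", "50"),
  ("Debts", 2, "Name", "juan"), ("Debts", 2, "Amount", "50");

-- Row-numbered subjects keep the two debts apart:
SELECT COUNT(DISTINCT CONCAT("_:", relation, row))
  FROM __data WHERE relation="Debts";          -- 2 subjects

-- Value-based skolem subjects collapse the two identical rows:
SELECT COUNT(DISTINCT CONCAT("_:", relation, skolem)) FROM
  ( SELECT relation,
           GROUP_CONCAT(value ORDER BY attr SEPARATOR "~") AS skolem
      FROM __data WHERE relation="Debts"
     GROUP BY relation, row ) AS __name;       -- 1 subject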
> The Denotational Semantics for the Direct Mapping (aka the algebras
> and the associated mapping function) *already* works on top of
> multisets (and sets and lists), both in the data model for RDB and in
> the higher-order functions we define/use to manipulate this data
> structure.
>
> Alexandre.

--
-ericP