- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sun, 18 Jul 2010 17:31:38 -0400
- To: Juan Sequeda <juanfederico@gmail.com>
- Cc: Harry Halpin <hhalpin@w3.org>, public-rdb2rdf-wg@w3.org
* Juan Sequeda <juanfederico@gmail.com> [2010-07-18 12:38-0500]
> On Sun, Jul 18, 2010 at 12:23 PM, Harry Halpin <hhalpin@w3.org> wrote:
>
> > > Harry,
> > >
> > > On Sun, Jul 18, 2010 at 8:26 AM, Harry Halpin <hhalpin@w3.org> wrote:
> > >
> > >> While I enjoyed the talk last week, I was wondering about the
> > >> relationship between Eric's proposed direct mapping [1] and the rules
> > >> put forward last week by Marcelo [2]. This question goes to both, and
> > >> the entire working group.
> > >>
> > >> One of the advantages of Eric's default mapping mechanism [1] is that it
> > >> allows relational data to be expressed in RDF without the author of the
> > >> mapping knowing *any* rules or having any ontology that he or she wants
> > >> to map their relational data to.
> > >
> > > This is exactly the same as the Database-Instance-Only mapping.
> >
> > Are we sure? Eric - thoughts?

The main goal of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ was to
precisely define the default graph. If the document is successful, an
implementor should be able to take a relational database and a stem URI
and create a (virtual) direct graph.

> > There's at least two differences I see. Syntactically, ericP is not
> > generating any new predicate URIs (foaf:name), thus his insistence on
> > creating a "stem graph" with default URIs. I imagine this will just be a
> > simple option, with the generateURIs being created by a call to some
> > standardized interface to the Linked Data Web via a search engine like
> > Sindice, a vocabulary management service, or something like OKKAM.
>
> I think this is an issue of the syntax. A predicate needs to be created.
> This is the semantics. How it's going to be done is another issue.
>
> > The second difference is how Eric decided to express his semantics, i.e.
> > using sets rather than Datalog-ish rules that resemble FOL.
> > I went over Eric's work only once, but I believe we need to make a
> > decision as a Working Group to pick one style of doing semantics and
> > stick with it in the spec, even though they are technically equivalent,
> > i.e. we should choose between set-theoretic model theory or just a
> > mapping to FOL/Datalog/RIF semantics with a standard interpretation.
>
> Honestly, I have trouble understanding the semantics that Eric has written.
>
> I would recommend using Datalog because
>
> 1) it has well defined semantics
> 2) it can be translated to RIF
> 3) it can be translated to SQL

I eventually picked set semantics because of the success of "Semantics and
Complexity of SPARQL" (Pérez, Arenas, and Gutierrez,
http://arxiv.org/pdf/cs.DB/0605124).

This is a good opportunity for me to proof-read and provide an English
reading, using the definitions in the Notation section:

[1] Database ≝ { RelName → Relation }
Database is a mapping from relation name to relation.

[2] Relation ≝ ( Header, PrimaryKey, ForeignKeys, Body )
Relation is a tuple of a header, primary key, foreign keys and body.

[3] Header ≝ { AttrName → SQLDatatype }
Header is a mapping from attribute name to SQL datatype.

[4] PrimaryKey ≝ [ AttrName ]
PrimaryKey is a list of attribute names.

[5] ForeignKeys ≝ { AttrName → ( Relation, AttrName ) }
ForeignKeys is a mapping from attribute name to tuples of relation and
attribute name.

[6] SQLDatatype ≝ { INT | FLOAT | CHARn }
SQLDatatype is, for now, an INT, FLOAT or CHARn (e.g. CHAR(40)).

[7] Body ≝ [ Tuple ]
Body is a list of tuples (note: a list, per SQL semantics, not a set, per
relational semantics).

[8] Tuple ≝ { AttrName → CellValue }
Tuple is a mapping from attribute name to cell value.

[9] CellValue ≝ value | Null
CellValue is some value or Null (à la SQL).

4.2 RDF Model Definition (Normative)

[10] Graph ≝ { Triple }
An RDF graph is a set of triples.

[11] Triple ≝ ( Subject, Predicate, Object )
A triple is a tuple of subject, predicate, object.
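To make productions [1] to [11] concrete, here is a minimal Python sketch of the relational and graph structures. It is only an illustration under assumptions: the names (`Relation`, `Row`, `header`, `foreign_keys` and so on) are hypothetical, `None` stands in for SQL Null, a foreign key points at a relation *name* rather than a Relation value, and IRIs are plain strings with blank nodes left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union

SQLDatatype = str                              # [6] e.g. "INT", "FLOAT", "CHAR(40)"
CellValue = Optional[Union[int, float, str]]   # [9] a value, or None standing in for Null
Row = Dict[str, CellValue]                     # [8] Tuple: attribute name -> cell value


@dataclass
class Relation:
    """[2] Relation = (Header, PrimaryKey, ForeignKeys, Body)."""
    name: str
    header: Dict[str, SQLDatatype]             # [3] attribute name -> SQL datatype
    primary_key: List[str]                     # [4] list of attribute names
    # [5] attribute name -> (referenced relation name, referenced attribute name)
    foreign_keys: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # [7] a list, following SQL semantics, so duplicate rows are kept
    body: List[Row] = field(default_factory=list)


Database = Dict[str, Relation]                 # [1] relation name -> relation

# [11] (subject, predicate, object): IRIs are strings; a literal object is a
# (lexical value, datatype IRI) pair. Blank nodes are left out of this sketch.
Triple = Tuple[str, str, Union[str, Tuple[str, str]]]
Graph = Set[Triple]                            # [10] a set of triples
```

A Python list is used for the body to mirror the SQL (bag) semantics noted under [7]; a set would silently merge duplicate rows, whereas the graph in [10] really is a set.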
[12] Subject ≝ IRI ⊔ BlankNode
A subject is either an IRI or a blank node (⊔ denotes disjoint union).

[13] Predicate ≝ IRI
A predicate is an IRI.

[14] Object ≝ IRI ⊔ BlankNode ⊔ Literal
An object is an IRI, a blank node or a literal.

[15] IRI ≝ RDF URI-reference as subsequently restricted by SPARQL.
An IRI is defined by RDF and restricted (to exclude spaces) by SPARQL.

[16] BlankNode ≝ RDF blank node.
A blank node is defined by RDF.

[17] Literal ≝ (lexicalValue, IRI) per RDF literal.
A literal is a tuple of a lexical value and an IRI, per RDF.

5 Direct Mapping Definition (Normative)

Now the definitions for the Direct Mapping (how you produce a direct graph
from any database and stem IRI), which is defined for relations with a
single-attribute primary key.

[18] pk(R) ≝ A ∣ first A ∈ R.PrimaryKey
The primary key of R is the attribute A that is the first (sole) element
of R's primary key.

[19] reference(T) ≝ { A in T ∣ A ∈ R.ForeignKeys }
A tuple's reference attributes are the set of attributes A in T such that
A is an element of the relation's foreign keys.

[20] scalar(T) ≝ { A in T ∣ A not-Null ∧ A ∉ pk(R) ∧ A ∉ reference(T) }
A tuple's scalar attributes are the attributes in T which are not null,
not in the pk and not reference attributes.

The direct* functions map the tuples T of each relation R in a db to an
RDF graph.

[21] directDB(db) ≝ { directR(r) ∣ r ∈ db }
directDB of a DB is the set of directR for each r in the db.

[22] directR(R) ≝ { directT(R, T) ∣ T ∈ R.Body }
directR is the set of directT for each tuple T in R's body.

[23] directT(R, T) ≝ { directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) } ∣ S = nodemap(R, pk(T))
directT calculates a common subject S and produces a set of triples from
the scalar and reference attributes.

[24] directL(R, S, A) ≝ triple(S, predicatemap(R, A), literalmap(A))
A direct triple for a scalar attribute is the common subject, the
predicate map of the relation and attribute, and the literal map of the
attribute.
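The attribute classification in [18] to [20] can be sketched as three small Python functions. This is a hypothetical rendering, not part of the draft: a tuple is a dict from attribute name to value, `None` stands in for Null, and the relation's primary key and foreign keys are passed in explicitly because T alone does not carry R.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

CellValue = Optional[Union[int, float, str]]  # None stands in for SQL Null
Row = Dict[str, CellValue]                     # a tuple T: attribute name -> cell value
ForeignKeys = Dict[str, Tuple[str, str]]       # attribute -> (referenced relation, attribute)


def pk(primary_key: List[str]) -> str:
    """[18] The first (and, for the direct mapping, sole) primary-key attribute."""
    return primary_key[0]


def reference(row: Row, foreign_keys: ForeignKeys) -> Set[str]:
    """[19] The tuple's attributes that are foreign keys of its relation."""
    return {a for a in row if a in foreign_keys}


def scalar(row: Row, primary_key: List[str], foreign_keys: ForeignKeys) -> Set[str]:
    """[20] Non-Null attributes that are neither the primary key nor references."""
    refs = reference(row, foreign_keys)
    return {a for a in row
            if row[a] is not None and a != pk(primary_key) and a not in refs}
```

For example, with a hypothetical row {"ID": 7, "fname": "Bob", "addr": 18, "nick": None} whose "addr" is a foreign key, reference() yields {"addr"} and scalar() yields {"fname"}: ID is the key, addr is a reference and nick is Null.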
[25] directN(R, S, A) ≝ triple(S, predicatemap(R, A), nodemap(R, A))
A direct triple for a reference attribute is S, a predicate map, and the
node map of the attribute.

5.1 Linked-data-friendly

[26] nodemap(R, A) ≝ hash-nodemap(R, A) | slash-nodemap(R, A)
[27] predicatemap(R, A) ≝ hash-predicatemap(R, A) | slash-predicatemap(R, A)
The definitions of predicatemap and nodemap are consistent with the hash
or slash flavors of linked data.

[28] hash-nodemap(R, A) ≝ IRI(stem + "/" + R.name + "/" + A.name + "." + A.value + "#_")
i.e. the concatenation of stem, "/", R.name, "/", A.name, ".", A.value
and "#_".

[29] hash-predicatemap(R, A) ≝ IRI(stem + "/" + R.name + "#" + A.name)
(The remaining productions follow the same pattern.)

[30] slash-nodemap(R, A) ≝ IRI(stem + "/" + R.name + "/" + A.name + "." + A.value)
[31] slash-predicatemap(R, A) ≝ IRI(stem + "/" + R.name + "/" + A.name)

5.2 W3C XML Schema Datatypes

literalmap produces an RDF literal with an XSD datatype, using the type
mapping SQL2XSD:

[32] literalmap(A) ≝ Literal(A[V], SQL2XSD[A]) ∣ SQL2XSD is the mapping from SQL datatypes to XML Schema datatypes below:

SQL        XSD
INT        http://www.w3.org/TR/xmlschema-2/#integer
FLOAT      http://www.w3.org/TR/xmlschema-2/#float
DATE       http://www.w3.org/TR/xmlschema-2/#date
TIME       http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR       http://www.w3.org/TR/xmlschema-2/#string
VARCHAR    http://www.w3.org/TR/xmlschema-2/#string
STRING     http://www.w3.org/TR/xmlschema-2/#string

A literalmap produces a tuple of value and datatype, consistent with RDF.

6 Extending the Direct Mapping

Following are some recipes to extend the direct mapping, specifically by
replacing some of the numbered productions in the direct mapping.

6.1 Direct Mapping with Primary Keys (Normative)

[20-pk] scalar(T) ≝ { A in T ∣ A not-Null ∧ A ∉ reference(T) }
The DM-PK graph replaces production 20, removing A ∉ pk(R) from the
definition of the scalar function in order to not exclude primary keys.
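Putting [18] to [32] together, a single tuple can be turned into triples roughly as follows. This Python sketch rests on several assumptions that are not in the draft: `CHAR(n)` is looked up in SQL2XSD by its base name; Null foreign keys are skipped; and the object of [25] is read as the IRI of the *referenced* tuple, built from the target relation and attribute recorded in R.ForeignKeys. The `include_pk` flag switches to the [20-pk] variant, and `hash_flavor` picks between the hash and slash maps of 5.1.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

CellValue = Optional[Union[int, float, str]]  # None stands in for SQL Null
Literal = Tuple[str, str]                     # (lexical value, datatype IRI), as in [17]
Triple = Tuple[str, str, Union[str, Literal]]

# [32] SQL2XSD, with the datatype IRIs copied from the table in 5.2.
SQL2XSD = {
    "INT": "http://www.w3.org/TR/xmlschema-2/#integer",
    "FLOAT": "http://www.w3.org/TR/xmlschema-2/#float",
    "DATE": "http://www.w3.org/TR/xmlschema-2/#date",
    "TIME": "http://www.w3.org/TR/xmlschema-2/#time",
    "TIMESTAMP": "http://www.w3.org/TR/xmlschema-2/#dateTime",
    "CHAR": "http://www.w3.org/TR/xmlschema-2/#string",
    "VARCHAR": "http://www.w3.org/TR/xmlschema-2/#string",
    "STRING": "http://www.w3.org/TR/xmlschema-2/#string",
}


def nodemap(stem: str, rel: str, attr: str, value: CellValue, hash_flavor: bool = True) -> str:
    """[28] hash-nodemap (trailing "#_") or [30] slash-nodemap."""
    iri = f"{stem}/{rel}/{attr}.{value}"
    return iri + "#_" if hash_flavor else iri


def predicatemap(stem: str, rel: str, attr: str, hash_flavor: bool = True) -> str:
    """[29] hash-predicatemap ("#" separator) or [31] slash-predicatemap ("/")."""
    sep = "#" if hash_flavor else "/"
    return f"{stem}/{rel}{sep}{attr}"


def literalmap(value: CellValue, sql_type: str) -> Literal:
    """[32] Pair the lexical form of the value with its XSD datatype.
    Assumption: CHAR(40) and similar are looked up by their base name."""
    base = sql_type.split("(")[0].strip().upper()
    return (str(value), SQL2XSD[base])


def directT(stem: str, rel: str, header: Dict[str, str], primary_key: List[str],
            foreign_keys: Dict[str, Tuple[str, str]], row: Dict[str, CellValue],
            include_pk: bool = False, hash_flavor: bool = True) -> Set[Triple]:
    """[23] Common subject S for the tuple, plus [24] directL and [25] directN triples.
    include_pk=True applies the [20-pk] variant of scalar()."""
    key = primary_key[0]                                       # [18] sole key attribute
    s = nodemap(stem, rel, key, row[key], hash_flavor)
    # [19] reference attributes; skipping Null foreign keys is an assumption.
    refs = {a for a in row if a in foreign_keys and row[a] is not None}
    # [20] scalar attributes (or [20-pk] when include_pk is set).
    scalars = {a for a in row
               if row[a] is not None and a not in foreign_keys and (include_pk or a != key)}
    triples: Set[Triple] = set()
    for a in scalars:                                          # [24] directL
        triples.add((s, predicatemap(stem, rel, a, hash_flavor),
                     literalmap(row[a], header[a])))
    for a in refs:                                             # [25] directN
        # Assumption: the object names the referenced tuple in the target relation.
        target_rel, target_attr = foreign_keys[a]
        triples.add((s, predicatemap(stem, rel, a, hash_flavor),
                     nodemap(stem, target_rel, target_attr, row[a], hash_flavor)))
    return triples
```

With a hypothetical stem http://foo.example/DB and a People row {"ID": 7, "fname": "Bob", "addr": 18} whose addr references Addresses.ID, the hash flavor gives the subject http://foo.example/DB/People/ID.7#_, a literal triple via http://foo.example/DB/People#fname, and a reference triple pointing at http://foo.example/DB/Addresses/ID.18#_. Setting include_pk=True adds a third triple for ID.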
6.2 Type Annotations (Normative)

[23-type] directT(R, T) ≝ { directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) } ∪ { directP(R, S) } ∣ S = nodemap(R, pk(T))
The DM-type graph adds a type arc calculated from the name of R.

[33-type] directP(R, S) ≝ triple(S, rdf:type, typemap(R))
[34-type] typemap(R) ≝ IRI(stem + "#" + R.name)

6.3 Many to Many Mappings (Normative)

[32-m2m] manytomany(R) ≝ { R ∣ R has exactly two attributes (X, Y), being foreign keys to RX.PKX and RY.PKY respectively }
Add a test for many-to-many relations.

[21-m2m] directDB(db) ≝ { directR(r) ∣ r ∉ manytomany(db.R) } ∪ { repeatpropertyR(r) ∣ r ∈ manytomany(db.R) }
Exclude many-to-many relations from the calls to directR; instead call
repeatpropertyR.

[33-m2m] repeatpropertyR(R) ≝ { repeatpropertyT(R, T) ∣ T ∈ R.Body }
For each tuple in R (with attribute X a foreign key to RX.PKX and
attribute Y a foreign key to RY.PKY):

[34-m2m] repeatpropertyT(R, T) ≝ triple(nodemap(RX, PKX), predicatemap(R, Y), nodemap(RY, PKY))
Emit a triple like (using the hash flavor):
triple(IRI(stem + "/" + RX.name + "/" + PKX.name + "." + PKX.value + "#_"),
       IRI(stem + "/" + R.name + "#" + Y.name),
       IRI(stem + "/" + RY.name + "/" + PKY.name + "." + PKY.value + "#_"))

Would non-normative explanations like this be useful in the spec? We could
use a notation to indicate they aren't intended to be precise and complete,
just informative.

> > It would be kind of odd to switch styles of semantics.
> >
> > >>
> > >> This is one of the requirements of our charter, although of course we
> > >> want mappings to other vocabularies to be possible.
> > >> Remember, this can be thought of as a two-step process, where the
> > >> first step is a default mapping, and then later mappings (via Datalog
> > >> rules, RIF, SQL or whatever) could then transform
> > >
> > > In this simple approach, the predicates are the only things that are
> > > going to be mapped:
> > >
> > > ex:name ->foaf:name
> > > ....
> > >
> > > So you could have a system that can automatically generate:
> > >
> > > Triple(s, "ex:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > > or the user can write the mapping with the :
> > >
> > > Triple(s, "foaf:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > >> Could we take the rules given earlier [2] and then use these to produce
> > >> the same effects as Eric's direct mapping proposal? Could someone
> > >> specify this in detail?
> > >
> > > The Database-Instance-Only mapping does that.
> > >
> > >> Then the default mapping could be seen as a certain default application
> > >> of rules, an application that *can* be changed.
> > >
> > > The rules defines the semantics of what needs to be implemented in an
> > > application
> > >
> > >> cheers,
> > >> harry
> > >>
> > >> [1] http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> > >> [2] http://web.ing.puc.cl/~marenas/W3C/mapping_language.txt

--
-ericP
Received on Sunday, 18 July 2010 21:32:16 UTC