- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Sun, 18 Jul 2010 17:31:38 -0400
- To: Juan Sequeda <juanfederico@gmail.com>
- Cc: Harry Halpin <hhalpin@w3.org>, public-rdb2rdf-wg@w3.org
* Juan Sequeda <juanfederico@gmail.com> [2010-07-18 12:38-0500]
> On Sun, Jul 18, 2010 at 12:23 PM, Harry Halpin <hhalpin@w3.org> wrote:
>
> > > Harry,
> > >
> > > On Sun, Jul 18, 2010 at 8:26 AM, Harry Halpin <hhalpin@w3.org> wrote:
> > >
> > >> While I enjoyed the talk last week, I was wondering about the
> > >> relationship
> > >> between Eric's proposed direct mapping [1] and the rules put forward
> > >> last
> > >> week by Marcelo [2]. This question goes to both, and the entire working
> > >> group.
> > >>
> > >> One of the advantages of Eric's default mapping mechanism [1] is that it
> > >> allows relational data to be expressed in RDF without the author of the
> > >> mapping knowing *any* rules or having any ontology that he or she wants
> > >> to
> > >> map their relational data to.
> > >>
> > >
> > > This is exactly the same as the Database-Instance-Only mapping.
> >
> > Are we sure? Eric - thoughts?
The main goal of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ was to
precisely define the default graph. If the document is successful, an
implementor should be able to take a relational database and a stem
URI and create a (virtual) direct graph.
> > There's at least two differences I see. Syntactically, ericP is not
> > generating any new predicate URIs (foaf:name), thus his insistence on
> > creating a "stem graph" with default URIs. I imagine this will just be a
> > simple option, with the generateURIs being created by a call to some
> > standardized interface to the Linked Data Web via a search engine like
> > Sindice, a vocabulary management service, or something like OKKAM.
> >
> >
> I think this is an issue of the syntax. A predicate needs to be created.
> This is the semantics. How it's going to be done is another issue.
>
> The second difference is how Eric decided to express his semantics, i.e.
> > using sets rather than Datalog-ish rules that resemble FOL. I went over
> > Eric's work only once, but I believe we need to make a decision as a
> > Working Group to pick one style of doing semantics and stick with it in
> > the spec, even though they are technically equivalent, i.e. we should
> > choose between set-theoretic model theory or just a mapping to
> > FOL/Datalog/RIF semantics with a standard interpretation.
> >
>
> Honestly, I have trouble understanding the semantics that Eric has written.
>
> I would recommend using Datalog because
>
> 1) it has well defined semantics
> 2) it can be translated to RIF
> 3) it can be translated to SQL
I eventually picked set semantics because of the success of "Semantics
and Complexity of SPARQL" by Pérez, Arenas, and Gutierrez
(http://arxiv.org/pdf/cs.DB/0605124).
This is a good opportunity for me to proof-read and provide an English
reading, using the definitions in the Notation section:
[1] Database ≝ { RelName → Relation }
Database is a mapping from relation name to relation.
[2] Relation ≝ ( Header, PrimaryKey, ForeignKeys, Body )
Relation is a tuple of a header, primary key, foreign keys and body.
[3] Header ≝ { AttrName → SQLDatatype }
Header is a mapping from attribute name to SQL datatype.
[4] PrimaryKey ≝ [ AttrName ]
PrimaryKey is a list of attribute names.
[5] ForeignKeys ≝ { AttrName → ( Relation, AttrName ) }
ForeignKeys is a mapping from attribute name to tuples of relation and attribute name.
[6] SQLDatatype ≝ { INT | FLOAT | CHARn }
SQLDatatype is, for now, an INT, FLOAT or CHARn (e.g. CHAR(40)).
[7] Body ≝ [ Tuple ]
Body is a list of tuples (note: a list, per SQL semantics, not a set, per relational semantics).
[8] Tuple ≝ { AttrName → CellValue }
Tuple is a mapping from attribute name to cell value.
[9] CellValue ≝ value | Null
CellValue is some value or Null (à la SQL).
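For readers who prefer code to notation, productions 1-9 could be sketched in Python roughly as below. This is only an illustration: the class and field names and the People/Addresses sample data are made up here, and foreign keys point at a relation name rather than a Relation object (an assumption, to avoid circular references).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical Python names mirroring productions [1]-[9]; not part of the spec.
SQLDatatype = str                               # [6] e.g. "INT", "FLOAT", "CHAR(40)"
CellValue = Optional[Union[int, float, str]]    # [9] None stands in for SQL Null
Row = Dict[str, CellValue]                      # [8] Tuple: attribute name -> cell value


@dataclass
class Relation:                                 # [2]
    header: Dict[str, SQLDatatype]              # [3] attribute name -> SQL datatype
    primary_key: List[str]                      # [4] list of attribute names
    foreign_keys: Dict[str, Tuple[str, str]]    # [5] attr -> (relation name, attr)
    body: List[Row] = field(default_factory=list)  # [7] a list, so duplicate rows survive


Database = Dict[str, Relation]                  # [1] relation name -> relation

# Made-up sample data, for illustration only.
addresses = Relation(
    header={"ID": "INT", "city": "CHAR(10)"},
    primary_key=["ID"],
    foreign_keys={},
    body=[{"ID": 18, "city": "Cambridge"}],
)
people = Relation(
    header={"ID": "INT", "fname": "CHAR(10)", "addr": "INT"},
    primary_key=["ID"],
    foreign_keys={"addr": ("Addresses", "ID")},
    body=[
        {"ID": 7, "fname": "Bob", "addr": 18},
        {"ID": 8, "fname": "Sue", "addr": None},   # Null foreign key
    ],
)
db: Database = {"People": people, "Addresses": addresses}
```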
4.2 RDF Model Definition (Normative)
[10] Graph ≝ { Triple }
An RDF graph is a set of triples.
[11] Triple ≝ ( Subject, Predicate, Object )
A triple is a tuple of subject, predicate, object.
[12] Subject ≝ IRI ⊔ BlankNode
A subject is an IRI or a blank node (a disjoint union).
[13] Predicate ≝ IRI
A predicate is an IRI.
[14] Object ≝ IRI ⊔ BlankNode ⊔ Literal
An object is an IRI or a blank node or a literal.
[15] IRI ≝ RDF URI-reference as subsequently restricted by SPARQL.
An IRI is defined by RDF and restricted (to exclude spaces) by SPARQL.
[16] BlankNode ≝ RDF blank node.
A blank node is defined by RDF.
[17] Literal ≝ (lexicalValue, IRI) per RDF literal.
A literal is a tuple of a lexical value and an IRI, per RDF.
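Productions 10-17 might be rendered in Python along these lines. The class names and the example IRIs are assumptions for illustration; the separate tagged classes are one way to keep the unions disjoint, and a frozenset models the graph as a set.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


# Hypothetical tagged term types; an IRI and a blank node never compare equal.
@dataclass(frozen=True)
class IRI:                       # [15]
    value: str


@dataclass(frozen=True)
class BlankNode:                 # [16]
    label: str


@dataclass(frozen=True)
class Literal:                   # [17] (lexicalValue, IRI)
    lexical: str
    datatype: IRI


Subject = Union[IRI, BlankNode]              # [12]
Predicate = IRI                              # [13]
Object = Union[IRI, BlankNode, Literal]      # [14]
Triple = Tuple[Subject, Predicate, Object]   # [11]
Graph = FrozenSet[Triple]                    # [10] a set, so duplicates collapse

# Made-up example terms.
s = IRI("http://example.org/db/People/ID.7#_")
p = IRI("http://example.org/db/People#fname")
o = Literal("Bob", IRI("http://www.w3.org/2001/XMLSchema#string"))

# Adding the same triple twice still yields a one-triple graph.
g: Graph = frozenset({(s, p, o), (s, p, o)})
```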
5 Direct Mapping Definition (Normative)
Now the definitions for the Direct Mapping (how you produce a
direct graph from any database and stem IRI), which is defined
for relations with a single-attribute primary key.
[18] pk(R) ≝ A ∣ first A ∈ R.PrimaryKey
pk(R) is the first (and sole) attribute in R's primary key.
[19] reference(T) ≝ { A in T ∣ A ∈ R.ForeignKeys }
A tuple's reference attributes are the set of attributes A such that A is an element of the relation's foreign keys.
[20] scalar(T) ≝ { A in T ∣ A not-Null ∧ A ∉ pk(R) ∧ A ∉ reference(T) }
A tuple's scalar attributes are the attributes in T which are not null, not in the pk and not reference attributes.
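As a rough Python reading of productions 18-20: the sketch below assumes a relation is a plain dict with "primary_key" and "foreign_keys" entries and a tuple is a dict from attribute name to value, with None for Null. The productions leave R implicit in reference(T) and scalar(T); here it is passed explicitly, which is a choice made for this sketch.

```python
# Hypothetical dict-based encoding, for illustration only.

def pk(relation):
    """[18] The first (sole) attribute of the relation's primary key."""
    return relation["primary_key"][0]


def reference(relation, row):
    """[19] Attributes of the tuple that are foreign keys of the relation."""
    return {a for a in row if a in relation["foreign_keys"]}


def scalar(relation, row):
    """[20] Non-Null attributes that are neither the pk nor a reference."""
    return {
        a
        for a in row
        if row[a] is not None
        and a != pk(relation)
        and a not in reference(relation, row)
    }


# Made-up sample relation and tuple.
people = {"primary_key": ["ID"], "foreign_keys": {"addr": ("Addresses", "ID")}}
bob = {"ID": 7, "fname": "Bob", "addr": 18, "nick": None}
```

With this sample tuple, "ID" is dropped as the primary key, "addr" as a reference and "nick" as a Null, leaving only "fname" as a scalar attribute.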
The direct* functions map tuples T in a relation R in a db to
an RDF graph.
[21] directDB(db) ≝ { directR(r) ∣ r ∈ db }
directDB of a DB is the set of directR for each r in the db.
[22] directR(R) ≝ { directT(R, T) ∣ T ∈ R.Body }
directR is the set of directT for each tuple T in R's body.
[23] directT(R, T) ≝ { directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) } ∣ S = nodemap(R, pk(T))
directT calculates a common subject S and produces a set of triples from the scalar and reference attributes.
[24] directL(R, S, A) ≝ triple(S, predicatemap(R, A), literalmap(A))
A direct triple for a scalar attribute is the common subject, the predicate map of the relation and attribute, and the literalmap of the attribute.
[25] directN(R, S, A) ≝ triple(S, predicatemap(R, A), nodemap(R, A))
A direct triple for a reference attribute is S, a predicate map, and the node map of the attribute.
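Productions 21-25 could be exercised with a small Python sketch like the one below. Several things are assumptions made for the sketch: the dict encoding of relations, the stem, terms written as N-Triples-like strings, simplified hash-style stand-ins for nodemap, predicatemap and literalmap, and flattening directDB and directR into one set of triples. directN follows production 25 as written, calling nodemap with R and A.

```python
STEM = "http://example.org/db"   # hypothetical stem IRI


def pk(rel):
    return rel["primary_key"][0]


def reference(rel, row):
    return {a for a in row if a in rel["foreign_keys"]}


def scalar(rel, row):
    return {a for a in row
            if row[a] is not None and a != pk(rel) and a not in reference(rel, row)}


# Simplified stand-ins for the mapping functions (hash flavor, no escaping).
def nodemap(rel, row, a):
    return f"<{STEM}/{rel['name']}/{a}.{row[a]}#_>"


def predicatemap(rel, a):
    return f"<{STEM}/{rel['name']}#{a}>"


def literalmap(row, a):
    return f'"{row[a]}"'


def directL(rel, row, s, a):     # [24] scalar attribute -> literal-valued triple
    return (s, predicatemap(rel, a), literalmap(row, a))


def directN(rel, row, s, a):     # [25] reference attribute -> node-valued triple
    return (s, predicatemap(rel, a), nodemap(rel, row, a))


def directT(rel, row):           # [23] all triples sharing the subject S
    s = nodemap(rel, row, pk(rel))
    return ({directL(rel, row, s, a) for a in scalar(rel, row)}
            | {directN(rel, row, s, a) for a in reference(rel, row)})


def directR(rel):                # [22] flattened over the relation's body
    return set().union(*(directT(rel, row) for row in rel["body"]))


def directDB(db):                # [21] flattened over the database
    return set().union(*(directR(rel) for rel in db.values()))


# Made-up sample data.
people = {
    "name": "People",
    "primary_key": ["ID"],
    "foreign_keys": {"addr": ("Addresses", "ID")},
    "body": [{"ID": 7, "fname": "Bob", "addr": 18}],
}
graph = directDB({"People": people})
```

For the single sample tuple this yields two triples with the common subject <http://example.org/db/People/ID.7#_>, one for fname and one for addr.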
5.1 Linked-data-friendly
[26] nodemap(R, A) ≝ hash-nodemap(R, A) | slash-nodemap(R, A)
[27] predicatemap(R, A) ≝ hash-predicatemap(R, A) | slash-predicatemap(R, A)
The definitions of predicatemap and nodemap are consistent with hash or slash flavors of linked data.
[28] hash-nodemap(R, A) ≝ IRI(stem + "/" + R.name + "/" + A.name + "." + A.value + "#_")
That is, CONCAT(stem, "/", R.name, "/", A.name, ".", A.value, "#_").
[29] hash-predicatemap(R, A) ≝ IRI(stem + "/" + R.name + "#" + A.name)
etc.
[30] slash-nodemap(R, A) ≝ IRI(stem + "/" + R.name + "/" + A.name + "." + A.value)
[31] slash-predicatemap(R, A) ≝ IRI(stem + "/" + R.name + "/" + A.name)
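The four string constructions in productions 28-31 are simple enough to sketch directly in Python. The function signatures, stem and sample values here are assumptions for illustration, and the sketch does no escaping of names or values.

```python
from urllib.parse import urldefrag


def hash_nodemap(stem, rel_name, attr_name, value):         # [28]
    return f"{stem}/{rel_name}/{attr_name}.{value}#_"


def hash_predicatemap(stem, rel_name, attr_name):           # [29]
    return f"{stem}/{rel_name}#{attr_name}"


def slash_nodemap(stem, rel_name, attr_name, value):        # [30]
    return f"{stem}/{rel_name}/{attr_name}.{value}"


def slash_predicatemap(stem, rel_name, attr_name):          # [31]
    return f"{stem}/{rel_name}/{attr_name}"


stem = "http://example.org/db"   # hypothetical stem IRI
node = hash_nodemap(stem, "People", "ID", 7)
pred = hash_predicatemap(stem, "People", "fname")

# Dropping the fragment from a hash-style node IRI gives the slash-style one.
document_url = urldefrag(node).url
```

In the hash flavor the node IRI and the document that would describe it differ only by the "#_" fragment, which is what the last line illustrates.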
5.2 W3C XML Schema Datatypes
literalmap produces RDF literals with XSD datatypes, using the type mapping SQL2XSD:
[32] literalmap(A) ≝ Literal(A[V], SQL2XSD[A]) ∣ SQL2XSD is the mapping from SQL datatypes to XML datatypes below:
SQL        XSD
INT        http://www.w3.org/TR/xmlschema-2/#integer
FLOAT      http://www.w3.org/TR/xmlschema-2/#float
DATE       http://www.w3.org/TR/xmlschema-2/#date
TIME       http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR       http://www.w3.org/TR/xmlschema-2/#string
VARCHAR    http://www.w3.org/TR/xmlschema-2/#string
STRING     http://www.w3.org/TR/xmlschema-2/#string
A literalmap produces a tuple of value and datatype, consistent with RDF.
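A minimal Python sketch of production 32, assuming the header is a dict from attribute name to SQL datatype and the tuple a dict from attribute name to value. The datatype IRIs are copied as written in the table above. Using str() for the lexical value and stripping a length suffix such as "(40)" are simplifications made for this sketch.

```python
# Type mapping taken from the table; the dict and function names are hypothetical.
XSD = "http://www.w3.org/TR/xmlschema-2/#"
SQL2XSD = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": XSD + "string",
    "VARCHAR": XSD + "string",
    "STRING": XSD + "string",
}


def literalmap(header, row, attr):
    """Return (lexical value, datatype IRI) for one attribute of a tuple."""
    # "CHAR(40)" -> "CHAR"; plain names such as "INT" pass through unchanged.
    sql_type = header[attr].split("(")[0].strip().upper()
    return (str(row[attr]), SQL2XSD[sql_type])


# Made-up sample header and tuple.
header = {"ID": "INT", "fname": "CHAR(10)"}
bob = {"ID": 7, "fname": "Bob"}
name_literal = literalmap(header, bob, "fname")
id_literal = literalmap(header, bob, "ID")
```

A fuller version would need proper lexical forms for dates, times and floats rather than str().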
6 Extending the Direct Mapping
Following are some recipes to extend the direct mapping, specifically by replacing some numbered productions in the direct mapping.
6.1 Direct Mapping with Primary Keys (Normative)
[20-pk] scalar(T) ≝ { A in T ∣ A not-Null ∧ A ∉ reference(T) }
The DM-PK graph replaces production 20, removing A ∉ pk(R) from the definition of the scalar function in order to not exclude primary keys.
6.2 Type Annotations (Normative)
[23-type] directT(R, T) ≝ { directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) } ∪ { directP(R, S) } ∣ S = nodemap(R, pk(T))
The DM-type graph adds a type arc calculated from the name of R.
[33-type] directP(R, S) ≝ triple(S, rdf:type, typemap(R))
[34-type] typemap(R) ≝ IRI(stem + "#" + R.name)
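Productions 33-type and 34-type amount to one extra triple per tuple, which might look like this in Python. The function signatures, stem and subject are assumptions for illustration; rdf:type is expanded to its full IRI.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def typemap(stem, rel_name):
    """[34-type] The class IRI derived from the relation name."""
    return f"{stem}#{rel_name}"


def directP(stem, rel_name, subject):
    """[33-type] The type arc for a tuple's subject node."""
    return (subject, RDF_TYPE, typemap(stem, rel_name))


stem = "http://example.org/db"                       # hypothetical stem IRI
subject = "http://example.org/db/People/ID.7#_"      # hypothetical subject node
type_triple = directP(stem, "People", subject)
```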
6.3 Many to Many Mappings (Normative)
[32-m2m] manytomany(R) ≝ { R ∣ R has exactly two attributes (X, Y), being foreign keys to RX.PKX and RY.PKY respectively }
Add a test for manytomany relations.
[21-m2m] directDB(db) ≝ { directR(r) ∣ r ∉ manytomany(db.R) } ∪ { repeatpropertyR(r) ∣ r ∈ manytomany(db.R) }
Exclude manytomany relations from the calls to directR; instead call repeatpropertyR.
[33-m2m] repeatpropertyR(R) ≝ { repeatpropertyT(R, T) ∣ T ∈ R.Body }
For each tuple in an R (with attribute X a foreign key to RX.PKX and attribute Y a foreign key to RY.PKY)
[34-m2m] repeatpropertyT(R, T) ≝ triple(nodemap(RX, PKX), predicatemap(R, Y), nodemap(RY, PKY))
Emit a triple like
triple(IRI(stem + "/" + RX.name + "/" + PKX.name + "." + PKX.value + "#_"),
IRI(stem + "/" + R.name + "#" + Y.name),
IRI(stem + "/" + RY.name + "/" + PKY.name + "." + PKY.value + "#_"))
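The many-to-many recipe could be tried out with a Python sketch such as the following. The dict encoding, the Enrolled/Students/Courses sample data and the hash-style helper functions are assumptions for illustration. X and Y are taken to be the first and second foreign keys in declaration order, and the predicate follows production 34-m2m, that is predicatemap(R, Y).

```python
def manytomany(rel):
    """[32-m2m] Exactly two attributes, both of them foreign keys."""
    attrs = set(rel["header"])
    return len(attrs) == 2 and attrs == set(rel["foreign_keys"])


def nodemap(stem, rel_name, attr, value):       # hash flavor
    return f"{stem}/{rel_name}/{attr}.{value}#_"


def predicatemap(stem, rel_name, attr):         # hash flavor
    return f"{stem}/{rel_name}#{attr}"


def repeatpropertyT(stem, rel, row):
    """[34-m2m] One triple linking the two referenced rows."""
    # Dicts keep insertion order, so X is the first declared foreign key.
    (x, (rx, pkx)), (y, (ry, pky)) = rel["foreign_keys"].items()
    return (nodemap(stem, rx, pkx, row[x]),
            predicatemap(stem, rel["name"], y),
            nodemap(stem, ry, pky, row[y]))


def repeatpropertyR(stem, rel):
    """[33-m2m] One triple per tuple in the relation's body."""
    return {repeatpropertyT(stem, rel, row) for row in rel["body"]}


# Made-up sample link table.
stem = "http://example.org/db"
enrolled = {
    "name": "Enrolled",
    "header": {"student": "INT", "course": "INT"},
    "foreign_keys": {"student": ("Students", "ID"), "course": ("Courses", "ID")},
    "body": [{"student": 7, "course": 101}],
}
triples = repeatpropertyR(stem, enrolled) if manytomany(enrolled) else set()
```

The link table then contributes direct arcs between student and course nodes and no nodes of its own.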
Would non-normative explanations like this be useful in the spec? We
could use a notation to indicate they aren't intended to be precise
and complete, just informative.
> > It would be kind of odd to switch styles of semantics.
> >
> > >
> > >>
> > >> This is one of the requirements of our charter, although of course we
> > >> want mappings to other vocabularies to be possible. Remember, this can
> > >> be
> > >> thought of as a two-step process, where the first step is a default
> > >> mapping, and then later mappings (via Datalog rules, RIF, SQL or
> > >> whatever)
> > >> could then transform
> > >>
> > >
> > > In this simple approach, the predicates are the only things that are
> > going
> > > to be mapped:
> > >
> > > ex:name ->foaf:name
> > > ....
> > >
> > > So you could have a system that can automatically generate:
> > >
> > > Triple(s, "ex:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > > or the user can write the mapping with the :
> > >
> > > Triple(s, "foaf:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > >
> > >> Could we take the rules given earlier [2] and then use these to produce
> > >> the same effects as Eric's direct mapping proposal? Could someone
> > >> specify
> > >> this in detail?
> > >>
> > >>
> > > The Database-Instance-Only mapping does that.
> > >
> > >
> > >> Then the default mapping could be seen as a certain default application
> > >> of
> > >> rules, an application that *can* be changed.
> > >>
> > >
> > > The rules define the semantics of what needs to be implemented in an
> > > application
> > >
> > >
> > >>
> > >> cheers,
> > >> harry
> > >>
> > >> [1] http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> > >> [2] http://web.ing.puc.cl/~marenas/W3C/mapping_language.txt
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
--
-ericP
Received on Sunday, 18 July 2010 21:32:16 UTC