Re: Relationship between EricP's default mapping and Datalog rules approach?

From: Eric Prud'hommeaux <eric@w3.org>
Date: Sun, 18 Jul 2010 17:31:38 -0400
To: Juan Sequeda <juanfederico@gmail.com>
Cc: Harry Halpin <hhalpin@w3.org>, public-rdb2rdf-wg@w3.org
Message-ID: <20100718213135.GC18491@w3.org>

* Juan Sequeda <juanfederico@gmail.com> [2010-07-18 12:38-0500]
> On Sun, Jul 18, 2010 at 12:23 PM, Harry Halpin <hhalpin@w3.org> wrote:
> 
> > > Harry,
> > >
> > > On Sun, Jul 18, 2010 at 8:26 AM, Harry Halpin <hhalpin@w3.org> wrote:
> > >
> > >> While I enjoyed the talk last week, I was wondering about the
> > >> relationship
> > >> between Eric's proposed direct mapping [1] and the rules put forward
> > >> last
> > >> week by Marcelo [2]. This question goes to both, and the entire working
> > >> group.
> > >>
> > >> One of the advantages of Eric's default mapping mechanism [1] is that it
> > >> allows relational data to be expressed in RDF without the author of the
> > >> mapping knowing *any* rules or having any ontology that he or she wants
> > >> to
> > >> map their relational data to.
> > >>
> > >
> > > This is exactly the same as the Database-Instance-Only mapping.
> >
> > Are we sure? Eric - thoughts?

The main goal of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ was to
precisely define the default graph. If the document is successful, an
implementor should be able to take a relational database and a stem
URI and create a (virtual) direct graph.

> > There's at least two differences I see. Syntactically, ericP is not
> > generating any new predicate URIs (foaf:name), thus his insistence on
> > creating a "stem graph" with default URIs. I imagine this will just be a
> > simple option, with the generateURIs being created by a call to some
> > standardized interface to the Linked Data Web via a search engine like
> > Sindice, a vocabulary management service, or something like OKKAM.
> >
> >
> I think this is an issue of the syntax. A predicate needs to be created.
> This is the semantics. How it's going to be done is another issue.
> 
> The second difference is how Eric decided to express his semantics, i.e.
> > using sets rather than Datalog-ish rules that resemble FOL. I went over
> > Eric's work only once, but I believe we need to make a decision as a
> > Working Group to pick one style of doing semantics and stick with it in
> > the spec, even though they are technically equivalent, i.e. we should
> > choose between set-theoretic model theory or just a mapping to
> > FOL/Datalog/RIF semantics with a standard interpretation.
> >
> 
> Honestly, I have trouble understanding the semantics that Eric has written.
> 
> I would recommend using Datalog because
> 
> 1) it has well defined semantics
> 2) it can be translated to RIF
> 3) it can be translated to SQL

I eventually picked set semantics because of the success of "Semantics
and Complexity of SPARQL" by Pérez, Arenas, and Gutierrez:
  http://arxiv.org/pdf/cs.DB/0605124

This is a good opportunity for me to proof-read and provide an English
reading, using the definitions in the Notation section:

[1]   	Database	   ≝   	{ RelName → Relation }
    Database is a mapping from relation name to relation.
[2]   	Relation	   ≝   	( Header, PrimaryKey, ForeignKeys, Body )
    Relation is a tuple of a header, primary key, foreign keys and body.
[3]   	Header	   ≝   	{ AttrName → SQLDatatype }
    Header is a mapping from attribute name to SQL datatype.
[4]   	PrimaryKey	   ≝   	[ AttrName ]
    PrimaryKey is a list of attribute names.
[5]   	ForeignKeys	   ≝   	{ AttrName → ( Relation, AttrName ) }
    ForeignKeys is a mapping from attribute name to tuples of relation and attribute name.
[6]   	SQLDatatype	   ≝   	{ INT | FLOAT | CHARn }
    SQLDatatype is, for now, an INT, FLOAT or CHARn (e.g. CHAR(40)).
[7]   	Body	   ≝   	[ Tuple ]
    Body is a list of tuples (note: a list, per SQL semantics, not a set as in the relational model).
[8]   	Tuple	   ≝   	{ AttrName → CellValue }
    Tuple is a mapping from attribute name to cell value.
[9]   	CellValue	   ≝   	value | Null
    CellValue is some value or Null (à la SQL).
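
For readers who think more easily in code, here is a rough Python
rendering of productions [1]-[9]. It is only a sketch: the People
relation and its rows are made-up sample data, the "name" field is an
assumption (production [2] has no name, though later productions use
R.name), and foreign keys point at a relation name rather than at the
Relation itself to keep the sample simple.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# [9] CellValue: some value or Null; None stands in for SQL Null.
CellValue = Optional[Union[int, float, str]]
# [8] Tuple: AttrName -> CellValue (called Row here to avoid clashing with typing.Tuple).
Row = Dict[str, CellValue]


@dataclass
class Relation:                                   # [2]
    name: str                                     # assumption: not part of [2]
    header: Dict[str, str]                        # [3] AttrName -> SQLDatatype
    primary_key: List[str]                        # [4] list of attribute names
    foreign_keys: Dict[str, Tuple[str, str]]      # [5] AttrName -> (relation name, AttrName)
    body: List[Row] = field(default_factory=list)  # [7] a list, so duplicate rows are allowed


# [1] Database: RelName -> Relation
Database = Dict[str, Relation]

# Hypothetical sample data, not taken from the direct mapping document.
people = Relation(
    name="People",
    header={"ID": "INT", "fname": "CHAR(10)", "addr": "INT"},
    primary_key=["ID"],
    foreign_keys={"addr": ("Addresses", "ID")},
    body=[
        {"ID": 7, "fname": "Bob", "addr": 18},
        {"ID": 8, "fname": "Sue", "addr": None},
    ],
)
db: Database = {people.name: people}
```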

4.2 RDF Model Definition (Normative)


[10]   	Graph	   ≝   	{ Triple }
     An RDF graph is a set of triples.
[11]   	Triple	   ≝   	( Subject, Predicate, Object )
     A triple is a tuple of subject, predicate, object.
[12]   	Subject	   ≝   	IRI ⊔ BlankNode
     A subject is an IRI or a blank node (a disjoint union).
[13]   	Predicate	   ≝   	IRI
     A predicate is an IRI.
[14]   	Object	   ≝   	IRI ⊔ BlankNode ⊔ Literal
     An object is an IRI or a blank node or a literal.
[15]   	IRI	   ≝   	RDF URI-reference as subsequently restricted by SPARQL.
     An IRI is defined by RDF and restricted (to exclude spaces) by SPARQL.
[16]   	BlankNode	   ≝   	RDF blank node.
     A blank node is defined by RDF.
[17]   	Literal	   ≝   	(lexicalValue, IRI) per RDF literal.
     A literal is a tuple of a lexical value and an IRI, per RDF.

5 Direct Mapping Definition (Normative)
      Now the definitions for the Direct Mapping (how you produce a
      direct graph from any database and stem IRI), which is defined
      for relations with a single primary key.

[18]   	pk(R)	   ≝   	A ∣ first A ∈ R.PrimaryKey
     The primary key of R is the first (and sole) attribute in R's primary key list.
[19]   	reference(T)	   ≝   	{ A in T ∣ A ∈ R.ForeignKeys }
     A tuple's reference attributes are the set of attributes A such that A is an element of the relation's foreign keys.
[20]   	scalar(T)	   ≝   	{ A in T ∣ A not-Null ∧ A ∉ pk(R) ∧ A ∉ reference(T) }
     A tuple's scalar attributes are the attributes in T which are not null, not in the pk and not reference attributes.

	The direct* functions map tuples T in a relation R in a db to
	an RDF graph.

[21]   	directDB(db)	   ≝   	{ directR(r) ∣ r ∈ db }
     directDB of a DB is the set of directR for each r in the db.
[22]   	directR(R)	   ≝   	{ directT(R, T) ∣ T ∈ R.Body }
     directR is the set of directT for each tuple T in R's body.
[23]   	directT(R, T)	   ≝   	{ directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) } ∣ S = nodemap(R, pk(T))
     directT calculates a common subject S and produces a set of triples from the scalar and reference attributes.
[24]   	directL(R, S, A)	   ≝   	triple(S, predicatemap(R, A), literalmap(A))
     A direct triple for a scalar attribute is the common subject, the predicate map of the relation and attribute, and the literalmap of the attribute.
[25]   	directN(R, S, A)	   ≝   	triple(S, predicatemap(R, A), nodemap(R, A))
     A direct triple for a reference attribute is S, a predicate map, and the node map of the attribute.
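
As a sanity check on productions [18]-[25], here is how I might sketch
them in Python. Several things are assumptions for illustration only:
the stem IRI, the dict layout of a relation, the sample rows, and the
use of slash-style IRIs. Literal objects are left as raw Python values
(no datatype), the per-relation graphs of [21] are flattened into one
set of triples, and nodemap for a reference attribute is applied to R
and A exactly as [25] is written.

```python
STEM = "http://example.org/db"  # hypothetical stem IRI

# Hypothetical relation; the dict layout is an assumption of this sketch.
people = {
    "name": "People",
    "primary_key": ["ID"],
    "foreign_keys": {"addr": ("Addresses", "ID")},
    "body": [
        {"ID": 7, "fname": "Bob", "addr": 18},
        {"ID": 8, "fname": None, "addr": 18},
    ],
}


def pk(r):                       # [18] first (sole) primary key attribute
    return r["primary_key"][0]


def reference(r, t):             # [19] attributes of t that are foreign keys
    return {a for a in t if a in r["foreign_keys"]}


def scalar(r, t):                # [20] not Null, not the pk, not a reference
    return {a for a in t
            if t[a] is not None and a != pk(r) and a not in reference(r, t)}


def nodemap(r, a, value):        # slash-style node IRI
    return f"{STEM}/{r['name']}/{a}.{value}"


def predicatemap(r, a):          # slash-style predicate IRI
    return f"{STEM}/{r['name']}/{a}"


def direct_t(r, t):              # [23] common subject S, then [24] and [25]
    s = nodemap(r, pk(r), t[pk(r)])
    literals = {(s, predicatemap(r, a), t[a]) for a in scalar(r, t)}
    nodes = {(s, predicatemap(r, a), nodemap(r, a, t[a]))
             for a in reference(r, t)}
    return literals | nodes


def direct_r(r):                 # [22] union over the tuples in R's body
    return set().union(*(direct_t(r, t) for t in r["body"]))


def direct_db(db):               # [21] union over the relations in db
    return set().union(*(direct_r(r) for r in db.values()))


graph = direct_db({"People": people})
for triple in sorted(graph, key=str):
    print(triple)
```

The second row has a Null fname, so it contributes only its addr
triple; the sample graph has three triples in total.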

5.1 Linked-data-friendly
	
[26]   	nodemap(R, A)	   ≝   	hash-nodemap(R, A) | slash-nodemap(R, A)
[27]   	predicatemap(R, A)	   ≝   	hash-predicatemap(R, A) | slash-predicatemap(R, A)
     The definitions of predicatemap and nodemap are consistent with hash or slash flavors of linked data.
[28]   	hash-nodemap(R, A)	   ≝   	IRI(stem + "/" + R.name + "/" + A.name + "." + A.value + "#_")
     CONCAT(stem, "/", R.name, "/", A.name, ".", A.value, "#_").
[29]   	hash-predicatemap(R, A)	   ≝   	IRI(stem + "/" + R.name + "#" + A.name)
     The remaining maps follow the same pattern.
[30]   	slash-nodemap(R, A)	   ≝   	IRI(stem + "/" + R.name + "/" + A.name + "." + A.value)
[31]   	slash-predicatemap(R, A)	   ≝   	IRI(stem + "/" + R.name + "/" + A.name)
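
The four IRI builders are plain string concatenation, which a few
lines of Python can illustrate. The stem and the People/ID names below
are invented, and the sketch assumes names and values are already safe
to place in an IRI (it does no escaping).

```python
STEM = "http://example.org/db"  # hypothetical stem IRI


def hash_nodemap(stem, rel, attr, value):      # [28]
    return f"{stem}/{rel}/{attr}.{value}#_"


def hash_predicatemap(stem, rel, attr):        # [29]
    return f"{stem}/{rel}#{attr}"


def slash_nodemap(stem, rel, attr, value):     # [30]
    return f"{stem}/{rel}/{attr}.{value}"


def slash_predicatemap(stem, rel, attr):       # [31]
    return f"{stem}/{rel}/{attr}"


print(hash_nodemap(STEM, "People", "ID", 7))
# http://example.org/db/People/ID.7#_
print(hash_predicatemap(STEM, "People", "fname"))
# http://example.org/db/People#fname
print(slash_nodemap(STEM, "People", "ID", 7))
# http://example.org/db/People/ID.7
print(slash_predicatemap(STEM, "People", "fname"))
# http://example.org/db/People/fname
```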

5.2 W3C XML Schema Datatypes


literalmap produces an RDF literal with an XSD datatype, using the type mapping SQL2XSD:
	
[32]   	literalmap(A)	   ≝   	Literal(A[V], SQL2XSD[A]) ∣ SQL2XSD is the mapping from SQL datatypes to XML datatypes below:
SQL          XSD
INT          http://www.w3.org/TR/xmlschema-2/#integer
FLOAT        http://www.w3.org/TR/xmlschema-2/#float
DATE         http://www.w3.org/TR/xmlschema-2/#date
TIME         http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP    http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR         http://www.w3.org/TR/xmlschema-2/#string
VARCHAR      http://www.w3.org/TR/xmlschema-2/#string
STRING       http://www.w3.org/TR/xmlschema-2/#string
     A literalmap produces a tuple of value and datatype, consistent with RDF.
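
A possible Python reading of literalmap, using the table above as a
lookup. Two simplifications are mine and not from the document: a
length suffix such as CHAR(40) is stripped before the lookup, and
str() is used as the lexical form, which would not be right for every
datatype (dates and times in particular).

```python
XSD = "http://www.w3.org/TR/xmlschema-2/#"

# The SQL2XSD type mapping, copied from the table.
SQL2XSD = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": XSD + "string",
    "VARCHAR": XSD + "string",
    "STRING": XSD + "string",
}


def literalmap(value, sql_type):
    """Return a (lexical value, datatype IRI) pair for a non-Null cell."""
    base = sql_type.split("(")[0].strip().upper()  # "CHAR(40)" -> "CHAR"
    return (str(value), SQL2XSD[base])


print(literalmap(18, "INT"))
# ('18', 'http://www.w3.org/TR/xmlschema-2/#integer')
print(literalmap("Bob", "CHAR(40)"))
# ('Bob', 'http://www.w3.org/TR/xmlschema-2/#string')
```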

6 Extending the Direct Mapping

     Following are some recipes to extend the direct mapping, specifically replacing some production numbers in the direct mapping.

6.1 Direct Mapping with Primary Keys (Normative)
	
[20-pk]   	scalar(T)	   ≝   	{ A in T ∣ A not-Null ∧ A ∉ reference(T) }
     The DM-PK graph replaces production 20, removing A ∉ pk(R) from the definition of the scalar function in order to not exclude primary keys.
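
In code the change is just one dropped condition. A small Python
sketch, with an invented row and with the foreign key attributes
passed in as a plain set (both assumptions for illustration):

```python
def scalar_pk(row, foreign_keys):
    """[20-pk]: attributes that are not Null and not references.

    Unlike production 20, the primary key is not excluded.
    """
    return {a for a, v in row.items()
            if v is not None and a not in foreign_keys}


# Hypothetical row: ID is the primary key, addr is a foreign key.
row = {"ID": 7, "fname": "Bob", "addr": 18, "nick": None}
print(sorted(scalar_pk(row, {"addr"})))
# ['ID', 'fname']
```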

6.2 Type Annotations (Normative)

[23-type]   	directT(R, T)	   ≝   	{ directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) }  ∪ { directP(R, S) } ∣ S = nodemap(R, pk(T))
     The DM-type graph adds a type arc calculated from the name of R.
[33-type]       directP(R, S)	   ≝   	triple(S, rdf:type, typemap(R))
[34-type]       typemap(R)	   ≝   	IRI(stem + "#" + R.name)
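
A minimal Python sketch of the two new productions, assuming a made-up
stem and subject IRI. The rdf:type IRI is the standard one from the
RDF namespace.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
STEM = "http://example.org/db"  # hypothetical stem IRI


def typemap(stem, rel_name):               # [34-type]
    return f"{stem}#{rel_name}"


def direct_p(stem, rel_name, subject):     # [33-type]
    return (subject, RDF_TYPE, typemap(stem, rel_name))


# Hypothetical subject, built the way hash-nodemap would build it.
s = STEM + "/People/ID.7#_"
print(direct_p(STEM, "People", s))
```

directT then simply adds this one triple to the set it already
produces for the tuple.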

6.3 Many to Many Mappings (Normative)
	
[32-m2m]   	manytomany(R)	   ≝   	{ R ∣ R has exactly two attributes (X, Y), being foreign keys to RX.PKX and RY.PKY respectively }
      Add a test for manytomany relations.
[21-m2m]   	directDB(db)	   ≝   	{ directR(r) ∣ r ∉ manytomany(db.R) } ∪ { repeatpropertyR(r) ∣ r ∈ manytomany(db.R) }
      Exclude manytomany relations from the calls to directR; instead call repeatpropertyR.
[33-m2m]   	repeatpropertyR(R)	   ≝   	{ repeatpropertyT(R, T) ∣ T ∈ R.Body }
      For each tuple in R (with attribute X a foreign key to RX.PKX and attribute Y a foreign key to RY.PKY)
[34-m2m]   	repeatpropertyT(R, T)	   ≝   	triple(nodemap(RX, PKX), predicatemap(R, Y), nodemap(RY, PKY))
      Emit a triple like
        triple(IRI(stem + "/" + RX.name + "/" + PKX.name + "." + PKX.value + "#_"),
               IRI(stem + "/" + R.name + "#" + Y.name),
               IRI(stem + "/" + RY.name + "/" + PKY.name + "." + PKY.value + "#_"))
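
To make the many-to-many recipe concrete, here is a Python sketch. The
Takes/Students/Courses names, the stem, and the dict layout are all
invented for illustration. The sketch uses the hash flavor throughout,
so the predicate is built as in production [29], and it relies on the
header's declaration order to decide which attribute is X and which
is Y.

```python
STEM = "http://example.org/db"  # hypothetical stem IRI

# Hypothetical join table linking Students and Courses.
takes = {
    "name": "Takes",
    "header": {"student": "INT", "course": "INT"},
    "foreign_keys": {"student": ("Students", "ID"),
                     "course": ("Courses", "ID")},
    "body": [{"student": 1, "course": 10},
             {"student": 1, "course": 11}],
}


def is_many_to_many(r):          # [32-m2m] exactly two attributes, both foreign keys
    attrs = list(r["header"])
    return len(attrs) == 2 and all(a in r["foreign_keys"] for a in attrs)


def repeatproperty_t(r, t):      # [34-m2m]
    x, y = list(r["header"])     # X and Y, in declaration order
    rx, pkx = r["foreign_keys"][x]
    ry, pky = r["foreign_keys"][y]
    return (f"{STEM}/{rx}/{pkx}.{t[x]}#_",    # nodemap(RX, PKX)
            f"{STEM}/{r['name']}#{y}",        # predicatemap(R, Y)
            f"{STEM}/{ry}/{pky}.{t[y]}#_")    # nodemap(RY, PKY)


def repeatproperty_r(r):         # [33-m2m]
    return {repeatproperty_t(r, t) for t in r["body"]}


if is_many_to_many(takes):
    for triple in sorted(repeatproperty_r(takes)):
        print(triple)
```

Each row of the join table becomes a single triple from the student
node to the course node, rather than a node of its own.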


Would non-normative explanations like this be useful in the spec? We
could use a notation to indicate they aren't intended to be precise
and complete, just informative.

> > It would be kind of odd to switch styles of semantics.
> >
> > >
> > >>
> > >>  This is one of the requirements of our charter, although of course we
> > >> want mappings to other vocabularies to be possible. Remember, this can
> > >> be
> > >> thought of as a two-step process, where the first step is a default
> > >> mapping, and then later mappings (via Datalog rules, RIF, SQL or
> > >> whatever)
> > >> could then transform
> > >>
> > >
> > > In this simple approach, the predicates are the only things that are
> > going
> > > to be mapped:
> > >
> > > ex:name ->foaf:name
> > > ....
> > >
> > > So you could have a system that can automatically generate:
> > >
> > > Triple(s, "ex:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > > or the user can write the mapping with the :
> > >
> > > Triple(s, "foaf:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > >
> > >> Could we take the rules given earlier [2] and then use these to produce
> > >> the same effects as Eric's direct mapping proposal? Could someone
> > >> specify
> > >> this in detail?
> > >>
> > >>
> > > The Database-Instance-Only mapping does that.
> > >
> > >
> > >> Then the default mapping could be seen as a certain default application
> > >> of
> > >> rules, an application that *can* be changed.
> > >>
> > >
> > > The rules define the semantics of what needs to be implemented in an
> > > application
> > >
> > >
> > >>
> > >>            cheers,
> > >>                 harry
> > >>
> > >> [1] http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> > >> [2] http://web.ing.puc.cl/~marenas/W3C/mapping_language.txt
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >

-- 
-ericP
Received on Sunday, 18 July 2010 21:32:16 UTC
