Re: Relationship between EricP's default mapping and Datalog rules approach?

* Juan Sequeda <juanfederico@gmail.com> [2010-07-18 12:38-0500]
> On Sun, Jul 18, 2010 at 12:23 PM, Harry Halpin <hhalpin@w3.org> wrote:
> 
> > > Harry,
> > >
> > > On Sun, Jul 18, 2010 at 8:26 AM, Harry Halpin <hhalpin@w3.org> wrote:
> > >
> > >> While I enjoyed the talk last week, I was wondering about the
> > >> relationship between Eric's proposed direct mapping [1] and the rules
> > >> put forward last week by Marcelo [2]. This question goes to both, and
> > >> to the entire working group.
> > >>
> > >> One of the advantages of Eric's default mapping mechanism [1] is that
> > >> it allows relational data to be expressed in RDF without the author of
> > >> the mapping knowing *any* rules or having any ontology that he or she
> > >> wants to map their relational data to.
> > >>
> > >
> > > This is exactly the same as the Database-Instance-Only mapping.
> >
> > Are we sure? Eric - thoughts?

The main goal of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ was to
precisely define the default graph. If the document is successful, an
implementor should be able to take a relational database and a stem
URI and create a (virtual) direct graph.

> > There are at least two differences I see. Syntactically, ericP is not
> > generating any new predicate URIs (foaf:name), hence his insistence on
> > creating a "stem graph" with default URIs. I imagine this will just be a
> > simple option, with the URIs for generateURI being created by a call to
> > some standardized interface to the Linked Data Web via a search engine
> > like Sindice, a vocabulary management service, or something like OKKAM.
> >
> >
> I think this is an issue of the syntax. A predicate needs to be created.
> This is the semantics. How it's going to be done is another issue.
> 
> > The second difference is how Eric decided to express his semantics, i.e.
> > using sets rather than Datalog-ish rules that resemble FOL. I went over
> > Eric's work only once, but I believe we need to make a decision as a
> > Working Group to pick one style of doing semantics and stick with it in
> > the spec, even though they are technically equivalent, i.e. we should
> > choose between set-theoretic model theory and just a mapping to
> > FOL/Datalog/RIF semantics with a standard interpretation.
> >
> 
> Honestly, I have trouble understanding the semantics that Eric has written.
> 
> I would recommend using Datalog because:
> 
> 1) it has well defined semantics
> 2) it can be translated to RIF
> 3) it can be translated to SQL

I eventually picked set semantics because of the success of "Semantics
and Complexity of SPARQL" by Pérez, Arenas, and Gutierrez:
  http://arxiv.org/pdf/cs.DB/0605124

This is a good opportunity for me to proof-read and provide an English
reading, using the definitions in the Notation section:

[1]    Database    ≝    { RelName → Relation }
    Database is a mapping from relation name to relation.
[2]    Relation    ≝    ( Header, PrimaryKey, ForeignKeys, Body )
    Relation is a tuple of a header, primary key, foreign keys and body.
[3]    Header    ≝    { AttrName → SQLDatatype }
    Header is a mapping from attribute name to SQL datatype.
[4]    PrimaryKey    ≝    [ AttrName ]
    PrimaryKey is a list of attribute names.
[5]    ForeignKeys    ≝    { AttrName → ( Relation, AttrName ) }
    ForeignKeys is a mapping from attribute name to tuples of relation and attribute name.
[6]    SQLDatatype    ≝    { INT | FLOAT | CHARn }
    SQLDatatype is, for now, an INT, FLOAT or CHARn (e.g. CHAR(40)).
[7]    Body    ≝    [ Tuple ]
    Body is a list of tuples (note: a list, following SQL's bag semantics, not a set as in the relational model).
[8]    Tuple    ≝    { AttrName → CellValue }
    Tuple is a mapping from attribute name to cell value.
[9]    CellValue    ≝    value | Null
    CellValue is either some value or Null (à la SQL).
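As a rough cross-check of productions [1] to [9], here is one possible transcription of the relational model into Python dataclasses. It is a sketch, not part of the spec, and the names are hypothetical. Assumptions: a Relation carries its own name (handy for nodemap later), ForeignKeys point at a relation *name* rather than a Relation object to avoid cycles, and Python's None stands in for SQL NULL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

SQLDatatype = str                               # [6] e.g. "INT", "FLOAT", "CHAR(40)"
CellValue = Optional[Union[int, float, str]]    # [9] None plays the role of SQL NULL
Row = Dict[str, CellValue]                      # [8] Tuple: AttrName -> CellValue


@dataclass
class Relation:                                 # [2] (Header, PrimaryKey, ForeignKeys, Body)
    name: str                                   # assumption: the relation carries its own name
    header: Dict[str, SQLDatatype]              # [3] AttrName -> SQLDatatype
    primary_key: List[str]                      # [4] list of AttrName
    # [5] AttrName -> (referenced relation name, referenced AttrName)
    foreign_keys: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    body: List[Row] = field(default_factory=list)   # [7] a list, so duplicate rows are kept


Database = Dict[str, Relation]                  # [1] RelName -> Relation

# Example: a self-referencing employee table.
emp = Relation(
    name="Emp",
    header={"id": "INT", "name": "CHAR(40)", "boss": "INT"},
    primary_key=["id"],
    foreign_keys={"boss": ("Emp", "id")},
    body=[{"id": 1, "name": "Alice", "boss": None},
          {"id": 2, "name": "Bob", "boss": 1}],
)
db: Database = {emp.name: emp}
```

Using a list for the body (rather than a set) mirrors the note on [7]: two identical rows stay two rows.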

4.2 RDF Model Definition (Normative)


[10]    Graph    ≝    { Triple }
     An RDF graph is a set of triples.
[11]    Triple    ≝    ( Subject, Predicate, Object )
     A triple is a tuple of subject, predicate, object.
[12]    Subject    ≝    IRI ⊔ BlankNode
     A subject is either an IRI or a blank node (a disjoint union).
[13]    Predicate    ≝    IRI
     A predicate is an IRI.
[14]    Object    ≝    IRI ⊔ BlankNode ⊔ Literal
     An object is an IRI, a blank node or a literal.
[15]    IRI    ≝    RDF URI-reference as subsequently restricted by SPARQL.
     An IRI is defined by RDF and restricted (to exclude spaces) by SPARQL.
[16]    BlankNode    ≝    RDF blank node.
     A blank node is defined by RDF.
[17]    Literal    ≝    (lexicalValue, IRI) per RDF literal.
     A literal is a tuple of a lexical value and an IRI, per RDF.
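Likewise, a possible Python transcription of [10] to [17], again with hypothetical names. Frozen dataclasses make IRIs, blank nodes and literals hashable, so a Graph can be a genuine set (a frozenset). Only the "no spaces" restriction from [15] is checked, not SPARQL's full IRI grammar.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class IRI:                       # [15]
    value: str

    def __post_init__(self) -> None:
        # Only the space restriction mentioned in [15] is enforced here.
        if " " in self.value:
            raise ValueError(f"IRI may not contain spaces: {self.value!r}")


@dataclass(frozen=True)
class BlankNode:                 # [16]
    label: str


@dataclass(frozen=True)
class Literal:                   # [17] (lexical value, datatype IRI)
    lexical: str
    datatype: IRI


Subject = Union[IRI, BlankNode]              # [12] disjoint union
Predicate = IRI                              # [13]
Object = Union[IRI, BlankNode, Literal]      # [14]
Triple = Tuple[Subject, Predicate, Object]   # [11]
Graph = FrozenSet[Triple]                    # [10] a *set* of triples

XSD_INTEGER = IRI("http://www.w3.org/2001/XMLSchema#integer")
t: Triple = (IRI("http://example.org/s"), IRI("http://example.org/p"),
             Literal("1", XSD_INTEGER))
g: Graph = frozenset([t, t])     # set semantics: the duplicate collapses
print(len(g))                    # 1
```

Contrast this with the relational Body in [7], which is a list: the same row twice is two rows, but the same triple twice is one triple.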

5 Direct Mapping Definition (Normative)
      Now the definitions for the Direct Mapping (how you produce a
      direct graph from any database and stem IRI), which is defined
      for relations whose primary key is a single attribute.

[18]    pk(R)    ≝    A ∣ first A ∈ R.PrimaryKey
     pk(R) is the first (and sole) attribute of R's primary key.
[19]    reference(T)    ≝    { A in T ∣ A ∈ R.ForeignKeys }
     A tuple's reference attributes are the attributes A in T that are foreign keys of the relation R.
[20]    scalar(T)    ≝    { A in T ∣ A not-Null ∧ A ∉ pk(R) ∧ A ∉ reference(T) }
     A tuple's scalar attributes are the attributes in T which are not null, are not the primary key, and are not reference attributes.
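A small Python sketch of one reading of [18] to [20]. It assumes a relation is a plain dict with hypothetical "PrimaryKey" and "ForeignKeys" entries, and it passes R explicitly, since [19] and [20] mention R without taking it as an argument. Because pk(R) yields a single attribute, the primary-key test in scalar is written as A != pk(R).

```python
from typing import Any, Dict, Optional, Set

Relation = Dict[str, Any]          # hypothetical: {"PrimaryKey": [...], "ForeignKeys": {...}}
Row = Dict[str, Optional[Any]]     # None stands for SQL NULL


def pk(R: Relation) -> str:
    # [18] the first (and, for the direct mapping, sole) primary-key attribute
    return R["PrimaryKey"][0]


def reference(R: Relation, T: Row) -> Set[str]:
    # [19] attributes of T that are foreign keys of R
    return {A for A in T if A in R["ForeignKeys"]}


def scalar(R: Relation, T: Row) -> Set[str]:
    # [20] non-NULL attributes that are neither the primary key nor references
    refs = reference(R, T)
    return {A for A in T if T[A] is not None and A != pk(R) and A not in refs}


emp = {"PrimaryKey": ["id"], "ForeignKeys": {"boss": ("Emp", "id")}}
row = {"id": 2, "name": "Bob", "age": None, "boss": 1}
print(pk(emp), sorted(reference(emp, row)), sorted(scalar(emp, row)))
# id ['boss'] ['name']
```

Note that "age" drops out because it is NULL, "id" because it is the primary key, and "boss" because it is a reference.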

 The direct* functions map each tuple T in a relation R in a db to
 an RDF graph.

[21]    directDB(db)    ≝    { directR(r) ∣ r ∈ db }
     directDB of a DB is the set of directR for each r in the db.
[22]    directR(R)    ≝    { directT(R, T) ∣ T ∈ R.Body }
     directR is the set of directT for each tuple T in R's body.
[23]    directT(R, T)    ≝    { directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) } ∣ S = nodemap(R, pk(T))
     directT calculates a common subject S and produces a set of triples from the scalar and reference attributes.
[24]    directL(R, S, A)    ≝    triple(S, predicatemap(R, A), literalmap(A))
     A direct triple for a scalar attribute is the common subject, the predicate map of the relation and attribute, and the literalmap of the attribute.
[25]    directN(R, S, A)    ≝    triple(S, predicatemap(R, A), nodemap(R, A))
     A direct triple for a reference attribute is S, a predicate map, and the node map of the attribute.
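Putting [21] to [25] together, a hedged end-to-end sketch in Python. Assumptions beyond the text: the stem IRI http://example.org/db is made up; the hash flavor of nodemap/predicatemap is used; literals are (lexical, datatype) pairs from an abbreviated SQL2XSD table; directDB and directR take the union of their members so the result is a single Graph [10]; for a reference attribute the object node is built from the relation and attribute named in R.ForeignKeys [5] (so it matches the referenced row's subject), whereas [25] as written says nodemap(R, A); and NULL references are skipped.

```python
from typing import Any, Dict, Set, Tuple

STEM = "http://example.org/db"              # hypothetical stem IRI
XSD = "http://www.w3.org/2001/XMLSchema#"
SQL2XSD = {"INT": "integer", "FLOAT": "float", "CHAR": "string"}  # abbreviated

Relation = Dict[str, Any]
Row = Dict[str, Any]
Triple = Tuple[str, str, Any]


def nodemap(rel: str, attr: str, value: Any) -> str:       # hash flavor, [28]
    return f"{STEM}/{rel}/{attr}.{value}#_"


def predicatemap(rel: str, attr: str) -> str:              # hash flavor, [29]
    return f"{STEM}/{rel}#{attr}"


def literalmap(sql_type: str, value: Any) -> Tuple[str, str]:
    return (str(value), XSD + SQL2XSD[sql_type.split("(")[0]])  # CHAR(40) -> CHAR


def directT(R: Relation, T: Row) -> Set[Triple]:           # [23]
    key = R["PrimaryKey"][0]
    S = nodemap(R["name"], key, T[key])                    # common subject
    refs = {A for A in T if A in R["ForeignKeys"]}         # [19]
    scalars = {A for A in T
               if T[A] is not None and A != key and A not in refs}  # [20]
    triples = {(S, predicatemap(R["name"], A), literalmap(R["Header"][A], T[A]))
               for A in scalars}                           # directL, [24]
    for A in refs:                                         # directN, [25]
        if T[A] is None:
            continue                                       # assumption: skip NULL references
        target_rel, target_attr = R["ForeignKeys"][A]
        triples.add((S, predicatemap(R["name"], A),
                     nodemap(target_rel, target_attr, T[A])))
    return triples


def directR(R: Relation) -> Set[Triple]:                   # [22], flattened
    return set().union(*(directT(R, T) for T in R["Body"]))


def directDB(db: Dict[str, Relation]) -> Set[Triple]:      # [21], flattened
    return set().union(*(directR(R) for R in db.values()))


db = {"Emp": {
    "name": "Emp",
    "Header": {"id": "INT", "name": "CHAR(40)", "boss": "INT"},
    "PrimaryKey": ["id"],
    "ForeignKeys": {"boss": ("Emp", "id")},
    "Body": [{"id": 1, "name": "Alice", "boss": None},
             {"id": 2, "name": "Bob", "boss": 1}],
}}
for triple in sorted(directDB(db), key=str):
    print(triple)
```

With this data the graph has three triples: a name literal for each employee and one boss arc from Bob's node to Alice's node.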

5.1 Linked-data-friendly
 
[26]    nodemap(R, A)    ≝    hash-nodemap(R, A) | slash-nodemap(R, A)
[27]    predicatemap(R, A)    ≝    hash-predicatemap(R, A) | slash-predicatemap(R, A)
     The definitions of predicatemap and nodemap are consistent with hash or slash flavors of linked data.
[28]    hash-nodemap(R, A)    ≝    IRI(stem + "/" + R.name + "/" + A.name + "." + A.value + "#_")
     CONCAT(stem, "/", R.name, "/", A.name, ".", A.value, "#_").
[29]    hash-predicatemap(R, A)    ≝    IRI(stem + "/" + R.name + "#" + A.name)
     etc.
[30]    slash-nodemap(R, A)    ≝    IRI(stem + "/" + R.name + "/" + A.name + "." + A.value)
[31]    slash-predicatemap(R, A)    ≝    IRI(stem + "/" + R.name + "/" + A.name)
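The two flavors differ only in their final separators. A minimal Python sketch of [28] to [31], assuming a hypothetical stem and that values are concatenated as-is (no percent-encoding, which a real implementation would need for values containing spaces or reserved characters):

```python
FLAVORS = ("hash", "slash")


def nodemap(stem: str, rel: str, attr: str, value: object, flavor: str = "hash") -> str:
    # [28] hash: stem/R/A.value#_    [30] slash: stem/R/A.value
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor}")
    base = f"{stem}/{rel}/{attr}.{value}"
    return base + "#_" if flavor == "hash" else base


def predicatemap(stem: str, rel: str, attr: str, flavor: str = "hash") -> str:
    # [29] hash: stem/R#A    [31] slash: stem/R/A
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor}")
    sep = "#" if flavor == "hash" else "/"
    return f"{stem}/{rel}{sep}{attr}"


stem = "http://example.org/db"   # hypothetical stem IRI
for flavor in FLAVORS:
    print(nodemap(stem, "Emp", "id", 2, flavor), predicatemap(stem, "Emp", "name", flavor))
# http://example.org/db/Emp/id.2#_ http://example.org/db/Emp#name
# http://example.org/db/Emp/id.2 http://example.org/db/Emp/name
```

In the hash flavor, dereferencing a node IRI fetches the document before the "#", so the "#_" suffix names the row's thing as distinct from the document describing it.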

5.2 W3C XML Schema Datatypes


literalmap produces an RDF literal with an XSD datatype, using the type mapping SQL2XSD:

[32]    literalmap(A)    ≝    Literal(A[V], SQL2XSD[A]) ∣ SQL2XSD is the mapping from SQL datatypes to XML Schema datatypes below:
SQL         XSD
INT         http://www.w3.org/TR/xmlschema-2/#integer
FLOAT       http://www.w3.org/TR/xmlschema-2/#float
DATE        http://www.w3.org/TR/xmlschema-2/#date
TIME        http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP   http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR        http://www.w3.org/TR/xmlschema-2/#string
VARCHAR     http://www.w3.org/TR/xmlschema-2/#string
STRING      http://www.w3.org/TR/xmlschema-2/#string
     A literalmap produces a tuple of lexical value and datatype, consistent with RDF.
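A hedged Python sketch of [32]. The table above links to sections of the XML Schema specification; inside an RDF literal, datatypes are conventionally named by XML Schema namespace IRIs such as http://www.w3.org/2001/XMLSchema#integer, and this sketch assumes those. Length parameters such as CHAR(40) are dropped before the lookup, and dates and times use Python's isoformat(), which gives valid XSD lexical forms for simple values like the ones below.

```python
import datetime
from typing import Any, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"   # assumed datatype namespace
SQL2XSD = {
    "INT": "integer", "FLOAT": "float",
    "DATE": "date", "TIME": "time", "TIMESTAMP": "dateTime",
    "CHAR": "string", "VARCHAR": "string", "STRING": "string",
}


def literalmap(sql_type: str, value: Any) -> Tuple[str, str]:
    """Return (lexical value, datatype IRI) for a non-NULL cell."""
    base = sql_type.split("(")[0].strip().upper()   # "CHAR(40)" -> "CHAR"
    if isinstance(value, (datetime.date, datetime.time)):
        lexical = value.isoformat()                 # also covers datetime.datetime
    else:
        lexical = str(value)
    return (lexical, XSD + SQL2XSD[base])


print(literalmap("CHAR(40)", "Alice"))
print(literalmap("TIMESTAMP", datetime.datetime(2010, 7, 18, 12, 38)))
# ('Alice', 'http://www.w3.org/2001/XMLSchema#string')
# ('2010-07-18T12:38:00', 'http://www.w3.org/2001/XMLSchema#dateTime')
```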

6 Extending the Direct Mapping

     Following are some recipes for extending the direct mapping; each one replaces (or adds) specific numbered productions of the direct mapping.

6.1 Direct Mapping with Primary Keys (Normative)
 
[20-pk]    scalar(T)    ≝    { A in T ∣ A not-Null ∧ A ∉ reference(T) }
     The DM-PK graph replaces production 20, removing A ∉ pk(R) from the definition of the scalar function so that primary-key attributes are no longer excluded.
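The difference between [20] and [20-pk] fits in two one-line functions. A minimal Python sketch, assuming a row is a dict with None for NULL and that the primary-key attribute and reference attributes have already been computed (the function names are hypothetical):

```python
from typing import Any, Dict, Optional, Set

Row = Dict[str, Optional[Any]]


def scalar_dm(T: Row, pk_attr: str, refs: Set[str]) -> Set[str]:
    # production [20]: the primary key is excluded
    return {A for A in T if T[A] is not None and A != pk_attr and A not in refs}


def scalar_dm_pk(T: Row, refs: Set[str]) -> Set[str]:
    # production [20-pk]: the A ∉ pk(R) condition is dropped
    return {A for A in T if T[A] is not None and A not in refs}


row = {"id": 2, "name": "Bob", "boss": 1}
print(sorted(scalar_dm(row, "id", {"boss"})))   # ['name']
print(sorted(scalar_dm_pk(row, {"boss"})))      # ['id', 'name']
```

Under DM-PK the row therefore gains one extra literal triple carrying its primary-key value.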

6.2 Type Annotations (Normative)

[23-type]    directT(R, T)    ≝    { directL(R, S, A) ∣ A ∈ scalar(T) } ∪ { directN(R, S, A) ∣ A ∈ reference(T) }  ∪ { directP(R, S) } ∣ S = nodemap(R, pk(T))
     The DM-type graph adds a type arc calculated from the name of R.
[33-type]       directP(R, S)    ≝    triple(S, rdf:type, typemap(R))
[34-type]       typemap(R)    ≝    IRI(stem + "#" + R.name)
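A Python sketch of the DM-type extension, assuming a hypothetical stem IRI and plain strings for IRIs. It adds exactly one rdf:type arc per subject, computed from the relation name as in [34-type]; the name triple in the example is a stand-in for whatever [23] already produced.

```python
from typing import Set, Tuple

STEM = "http://example.org/db"   # hypothetical stem IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def typemap(rel_name: str) -> str:
    # [34-type] IRI(stem + "#" + R.name)
    return f"{STEM}#{rel_name}"


def directP(rel_name: str, S: str) -> Triple:
    # [33-type] triple(S, rdf:type, typemap(R))
    return (S, RDF_TYPE, typemap(rel_name))


def directT_type(rel_name: str, S: str, base_triples: Set[Triple]) -> Set[Triple]:
    # [23-type]: the triples from [23] plus one type arc for the common subject
    return base_triples | {directP(rel_name, S)}


S = f"{STEM}/Emp/id.2#_"
name_triple = (S, f"{STEM}/Emp#name", "Bob")   # stand-in for a directL result
result = directT_type("Emp", S, {name_triple})
print(sorted(result))
```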

6.3 Many to Many Mappings (Normative)
 
[32-m2m]    manytomany(R)    ≝    { R ∣ R has exactly two attributes (X, Y), being foreign keys to RX.PKX and RY.PKY respectively }
      Adds a test identifying many-to-many relations.
[21-m2m]    directDB(db)    ≝    { directR(r) ∣ r ∉ manytomany(db.R) } ∪ { repeatpropertyR(r) ∣ r ∈ manytomany(db.R) }
      Exclude manytomany relations from the calls to directR; instead call repeatpropertyR.
[33-m2m]    repeatpropertyR(R)    ≝    { repeatpropertyT(R, T) ∣ T ∈ R.Body }
      For each tuple in R (with attribute X a foreign key to RX.PKX and attribute Y a foreign key to RY.PKY), call repeatpropertyT.
[34-m2m]    repeatpropertyT(R, T)    ≝    triple(nodemap(RX, PKX), predicatemap(R, Y), nodemap(RY, PKY))
      Emit a triple like (using the hash flavor):
        triple(IRI(stem + "/" + RX.name + "/" + PKX.name + "." + PKX.value + "#_"),
               IRI(stem + "/" + R.name + "#" + Y.name),
               IRI(stem + "/" + RY.name + "/" + PKY.name + "." + PKY.value + "#_"))
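A Python sketch of the many-to-many recipe. Assumptions, all hypothetical: a relation is a dict with "name", "Header", "ForeignKeys" and "Body" entries; the stem IRI is made up; the hash flavor is used; and X and Y are taken in header order, which the text does not specify. The tuple's value for X equals the referenced PKX value, so it is used directly for nodemap(RX, PKX).

```python
from typing import Any, Dict, Set, Tuple

STEM = "http://example.org/db"   # hypothetical stem IRI
Relation = Dict[str, Any]
Triple = Tuple[str, str, str]


def nodemap(rel: str, attr: str, value: Any) -> str:    # hash flavor, [28]
    return f"{STEM}/{rel}/{attr}.{value}#_"


def predicatemap(rel: str, attr: str) -> str:           # hash flavor, [29]
    return f"{STEM}/{rel}#{attr}"


def is_many_to_many(R: Relation) -> bool:
    # [32-m2m] exactly two attributes, and both are foreign keys
    attrs = list(R["Header"])
    return len(attrs) == 2 and all(A in R["ForeignKeys"] for A in attrs)


def repeatpropertyR(R: Relation) -> Set[Triple]:
    # [33-m2m]/[34-m2m]; X and Y are taken in header order (an assumption)
    X, Y = list(R["Header"])
    RX, PKX = R["ForeignKeys"][X]
    RY, PKY = R["ForeignKeys"][Y]
    return {(nodemap(RX, PKX, T[X]), predicatemap(R["name"], Y), nodemap(RY, PKY, T[Y]))
            for T in R["Body"]}


emp_proj = {
    "name": "EmpProj",
    "Header": {"emp": "INT", "proj": "INT"},
    "ForeignKeys": {"emp": ("Emp", "id"), "proj": ("Proj", "id")},
    "Body": [{"emp": 1, "proj": 10}, {"emp": 2, "proj": 10}],
}
if is_many_to_many(emp_proj):
    for t in sorted(repeatpropertyR(emp_proj)):
        print(t)
```

The join table itself never gets a subject node; each of its rows becomes a single arc from an Emp node to a Proj node.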


Would non-normative explanations like this be useful in the spec? We
could use a notation to indicate they aren't intended to be precise
and complete, just informative.

> > It would be kind of odd to switch styles of semantics.
> >
> > >
> > >>
> > >> This is one of the requirements of our charter, although of course we
> > >> want mappings to other vocabularies to be possible. Remember, this can
> > >> be thought of as a two-step process, where the first step is a default
> > >> mapping, and then later mappings (via Datalog rules, RIF, SQL or
> > >> whatever) could then transform
> > >>
> > >
> > > In this simple approach, the predicates are the only things that are
> > > going to be mapped:
> > >
> > > ex:name -> foaf:name
> > > ....
> > >
> > > So you could have a system that can automatically generate:
> > >
> > > Triple(s, "ex:name", name) <- student(s_id, name), generateURI(s_id, s)
> > >
> > > or the user can write the mapping with the :
> > >
> > > Triple(s, "foaf:name", name) <- student(s_id, name), generateURI(s_id, s)
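To make concrete the claim that such rules translate to SQL, here is a hedged sketch using Python's built-in sqlite3 module: the rule body student(s_id, name) becomes the FROM clause and the head becomes the SELECT list. The table contents are made up, and generateURI is approximated by concatenating a hypothetical base IRI with s_id. Switching from ex:name to foaf:name only changes a constant in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])


def triples_for(predicate: str) -> list:
    # Triple(s, predicate, name) <- student(s_id, name), generateURI(s_id, s)
    # Head -> SELECT list; body -> FROM student. generateURI is approximated
    # (hypothetically) by string concatenation with a made-up base IRI.
    return conn.execute(
        "SELECT 'http://example.org/student/' || s_id, ?, name "
        "FROM student ORDER BY s_id",
        (predicate,),
    ).fetchall()


default = triples_for("ex:name")    # automatically generated mapping
custom = triples_for("foaf:name")   # user-written mapping
print(default[0])   # ('http://example.org/student/1', 'ex:name', 'Ana')
print(custom[0])    # ('http://example.org/student/1', 'foaf:name', 'Ana')
```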
> > >
> > >
> > >> Could we take the rules given earlier [2] and then use these to
> > >> produce the same effects as Eric's direct mapping proposal? Could
> > >> someone specify this in detail?
> > >>
> > >>
> > > The Database-Instance-Only mapping does that.
> > >
> > >
> > >> Then the default mapping could be seen as a certain default
> > >> application of rules, an application that *can* be changed.
> > >>
> > >
> > > The rules define the semantics of what needs to be implemented in an
> > > application.
> > >
> > >
> > >>
> > >>            cheers,
> > >>                 harry
> > >>
> > >> [1] http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> > >> [2] http://web.ing.puc.cl/~marenas/W3C/mapping_language.txt

-- 
-ericP

Received on Sunday, 18 July 2010 21:32:16 UTC