Re: New merged consolidated Direct Mapping version

* Alexandre Bertails <bertails@w3.org> [2010-11-15 18:28-0500]
> Sorry for not answering earlier, RDB2RDF is not my real job at W3C :-)
> 
> On Sat, 2010-11-13 at 15:46 -0600, Juan Sequeda wrote:
> > Alexandre
> > 
> > 
> > you make good points which I need to read thoroughly but I don't want
> > to do over the weekend ;)
> > 
> > 
> > However, quick comments inline
> > 
> > On Sat, Nov 13, 2010 at 3:37 PM, Alexandre Bertails <bertails@w3.org>
> > wrote:
> >         On Sat, 2010-11-13 at 14:47 -0600, Juan Sequeda wrote:
> >         > I'd like to go through this thoroughly but I believe this
> >         looks a lot like:
> >         >
> >         >
> >         http://www.w3.org/2001/sw/rdb2rdf/wiki/Database-Instance-Only_and_Database-Instances-and-Schema_Mapping
> >         >
> >         > This was Marcelo and my proposal a longggg time ago.
> >         
> >         
> >         Yes, Eric made me read it a longggg time ago :-) But this is
> >         not the
> >         same approach (and I prefer the one you took in the merged
> >         document).
> >         
> >         In the merged spec, you say things like [[ Assume that r(a,
> >         b1, ...,
> >         bn) is a table with columns a, b1, ..., bn ... ]]. It's not
> >         clear if
> >         it means "I have a function from a relation r in RDB to a
> >         Datalog
> >         rule", or if you are giving an axiomatic description of the
> >         truth in a
> >         particular case.
> >         
> >         I understood it as an axiomatic description with universal
> >         quantification (the universe of discourse, which is also
> >         missing in
> >         your rules) because as there is no reason to keep two models
> >         of
> >         computation in the same spec, I assumed you were not competing
> >         with
> >         the mapping (which I recall is by definition a function)
> >         itself by
> >         proposing a new one. And if this was actually a function from
> >         RDB to
> >         Datalog, I would have expected to see the formal definition of
> >         a
> >         function with a clear domain and codomain.
> > 
> > 
> > there is no function from RDB to Datalog. 
> > Datalog can be considered syntax for relational algebra. You can say
> > the same thing. IMO, I prefer reading datalog than relational algebra.
> > So r is the name of the table. i.e project attribute name from the
> > table student
> > 
> > 
> > Ans(name) <- Student(_, name, _, _) 
> 
> In Datalog, you cannot reason on the relation r itself. So you need
> something external to go from the relation r to the relation name "r".
> Said differently, as long as you'll put an "r" in a Predicate, this is
> not FOL.
> 
> How do you make the distinction between the relation and its name? Eric
> showed me a scheme but he called it "perverse" :-) And he still needs
> higher-order.

2010-11-15T23:33:47Z <betehess> hey ericP, I've quoted you in http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2010Nov/0085.html
2010-11-15T23:34:17Z <betehess> about your "perverse" encoding of Tables in Datalog :-)
2010-11-15T23:35:32Z <ericP> oh yeah, mid;1289863707.3582.62.camel@simplet
2010-11-15T23:36:04Z <betehess> I thought you may introduce your example
2010-11-15T23:36:09Z <ericP> on it...

heh, here goes:

The current text assumes the trivial mapping from an SQL table to
datalog facts. Eliding column names, we see:

  ┌┤foo├────┬────┐
  │ X  │ Y  │ Z  │
  │ x1 │ y1 │ z1 │ => foo(x1, y1, z1)
  │ x2 │ y2 │ z2 │    foo(x2, y2, z2)
  └────┴────┴────┘

If we use, yes, HOL, to create our datalog facts, we can capture
things like the relation name, attribute (column) names, even the
datatypes and keys. (This effectively recapitulates the catalog.)

  ┌┤foo.schema├─────┬────┬───────┬────┬──────┐
  │ name │ a1 │ t1  │ a2 │ t2    │ a3 │ t3   │
  │  foo │  X │ int │  Y │ float │  Z │ blob │
  └──────┴────┴─────┴────┴───────┴────┴──────┘
  ┌┤foo.data├─┬────┬────┐
  │ name │ a1 │ a2 │ a3 │
  │  foo │ x1 │ y1 │ z1 │
  │  foo │ x2 │ y2 │ z2 │
  └──────┴────┴────┴────┘

>From there, you have all of your column names, relation names, etc. to
use FOL to generate the IRIs and literals for the RDF relation. But,
if you're actually working in a type-safe FOL, you can encode your RDF
in a relation that has types (beyond strings):

  ┌┤foo.graph├┬────────┬─────────┬──────┬────────┬──────────┬──────────────┬──────────┐
  │ sIRI      │ sBNode │ p       │ oIRI │ oBNode │ oLexical │ oDatype      │ oLangTag │
  │ http…X=x1 │ NULL   │ http…#X │ NULL │ NULL   │       x1 │ http…integer │          │
  │ http…X=x1 │ NULL   │ http…#Y │ NULL │ NULL   │       y1 │ http…string  │          │
  │ http…X=x1 │ NULL   │ http…#Z │ NULL │ NULL   │       z1 │ http…float   │          │
  └───────────┴────────┴─────────┴──────┴────────┴──────────┴──────────────┴──────────┘

The mapping from this to an RDF graph is pretty straightforward.


> >         So to be sure I was understanding your rules, I spontaneously
> >         started
> >         to annotate the variables and then, to get rid of the English
> >         (I
> >         always have a problem to consider descriptions in English as
> >         they
> >         escape from the formalism and hide the difficulty), I pushed
> >         the
> >         plain-text constraints into the rules, one after one. I found
> >         very
> >         pleasant to see that you actually use Higher Order Logic (the
> >         [[
> >         Assume that ]] were the clue but I did not get it right away).
> >         By
> >         putting more formalism into the rules, I really understood you
> >         were
> >         giving a nice semantical framework for the Direct Mapping,
> >         more than
> >         giving a way to compute it. The icing on the cake is that you
> >         never
> >         have to say *how* you compute an IRI, for example. You just
> >         have to
> >         say that it exists!
> > 
> > 
> > 
> > 
> > If you are combining the instances of the database AND schema elements
> > (Student is a table, id is a PK of the student table), then it becomes
> > higher order logic. Hence we had a schema+instance mapping. But
> > Marcelo and I came to the conclusion that it was too complicated.
> > Hence we only wanted Instance Mapping.
> 
> Yes I agree it's complicated.
> 
> Mixing schema and data is *a* way to get higher-order. But as long as
> you have the table name outside of a predicate position, this is gonna
> be higher-order.
> 
> 
> >         
> >         The algebra tells you the "what" (the Abstract Models) and the
> >         "how"
> >         (the mapping functions), whereas your Axiomatic Semantics
> >         tells you
> >         the truth in the model.
> >         
> >         May I suggest the editors (Eric, that includes you) to make
> >         clear the
> >         relation between the Direct Mapping (the algebra) and its
> >         Axiomatic
> >         Semantics?
> > 
> > 
> > Yes we need to do that.
> 
> Editorial proposal to put somewhere in the introduction:
> [[
> The Direct Mapping is an algebra defining the mapping from RDB to RDF,
> expressed in Type Theory. The Axiomatic Semantics defines the set of
> laws which the Direct Mapping must respect.
> ]]
> 
> By the way, by trying to define the laws binding Foreign Keys to
> Candidate Keys, it seems I need to define Predicate on top of multisets
> instead of sets. Do you have a plan for this?
> 
> Alexandre.
> 
> >  
> >         
> >         Alexandre.
> >         
> >         
> >         
> >         >
> >         > Juan Sequeda
> >         > www.juansequeda.com
> >         >
> >         > On Nov 13, 2010, at 2:34 PM, Alexandre Bertails
> >         <bertails@w3.org> wrote:
> >         >
> >         > > On Fri, 2010-11-12 at 09:17 -0600, Juan Sequeda wrote:
> >         > >> Hi Everybody
> >         > >>
> >         > >>
> >         > >> Just to remind everybody that the new merged consolidated
> >         document can
> >         > >> be found here:
> >         > >>
> >         > >>
> >         > >> http://www.w3.org/2001/sw/rdb2rdf/directMapping/
> >         > >
> >         > > Looking at the roles of section section 6 Direct Mapping
> >         as Rules
> >         > > and 5 Direct Mapping Definition, I see an easy division
> >         between an
> >         > > axiomatic semantics and an algebra which
> >         implements/conforms to that
> >         > > semantics. As an example, section 6's generateColumnIRI
> >         declares a
> >         > > binding between a lists of column names and the
> >         corresponding RDF
> >         > > predicate IRI. You can view generateColumnIRI without
> >         explicit
> >         > > quantification (quoted from section 6):
> >         > >
> >         > >  generateColumnIRI(x, y, z): Given a table name x and a
> >         non-empty list of columns y, it generates the Column IRI z
> >         > >
> >         > > or with quantification:
> >         > >
> >         > >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI,
> >         generateColumnIRI(r, columns, iri) ← nonempty(columns)
> >         > >
> >         > > The generateColumnIRI rule is *realized* in Section 5's
> >         propertyIRI
> >         > > mapping from a list of columns to an IRI:
> >         > >
> >         > >  [32] propertyIRI(R, As) ≝ IRI(base + "/" + (join(',',
> >         UE(A.name)) ∣ A ∈ As ) "#" As.name)
> >         > >
> >         > > More formally, given an axiomatic semantics
> >         > > [[
> >         > >  ∀ r ∈ Table, ∀ iri ∈ IRI, generateTableIRI(r, iri)
> >         > >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI,
> >         generateColumnIRI(r, columns, iri) ← nonempty(columns)
> >         > >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ values ∈
> >         [ value ], ∀ iri ∈ IRI,
> >         > >    generateRowIRI(r, columns, values, iri) ←
> >         nonempty(columns), nonempty(values)
> >         > >  ∀ r ∈ Table, ∀ values ∈ [ value ], ∀ bn ∈ BlankNode,
> >         generateRowBlankNode(r, values, bn) ← hasNoPrimaryKey(r)
> >         > >  ∀ r ∈ Table, ∀ column ∈ Column, ∀ value ∈ value,
> >         getValue(r, column, value)
> >         > >  ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column, ∀ x1 ∈
> >         value, ..., ∀ xn ∈ value,
> >         > >    getListValue(r, [c1, ..., cn], [x1, ..., xn]) ←
> >         getValue(r, c1, x1), ..., getValue(r, cn, xn)
> >         > >  (6.1.2 subsumes 6.1.1)
> >         > >  ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈
> >         Column, ..., ∀ cm ∈ Column, ∀ pk ∈ [ Column ], ∀ |pk| ∈
> >         [ value ],
> >         > >    Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cm),
> >         > >                                    isPrimaryKey(r, pk),
> >         > >                                    getListValue(r, pk, |
> >         pk|)
> >         > >                                    generateRowIRI(r, pk, |
> >         pk|, s),
> >         > >                                    generateTableIRI(r, o)
> >         > >  (6.1.3)
> >         > >  ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈
> >         Column, ..., ∀ cn ∈ Column,
> >         > >    Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cn),
> >         > >                                    hasNoPrimaryKey(r),
> >         > >                                    generateRowBlankNode(r,
> >         [c1, ..., cn], s),
> >         > >                                    generateTableIRI(r, o)
> >         > >  (6.2.2 subsumes 6.2.1)
> >         > >    the 2 rules can be factorized as there is no reason to
> >         distinguish aj and bj (the conditions are the same)
> >         > >    the "or" implies a split of the rule
> >         > >  ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ xj ∈ value, ∀ r ∈
> >         Table, ∀ c1 ∈ Column, ..., ∀ cm ∈ Column,
> >         > >
> >         ∀ c ∈ Column, ∀ x ∈ value,
> >         > >
> >         ∀ pk ∈ [ Column ], ∀ |pk| ∈ [ value ],
> >         > >    Triple(s, p, x) ← r(c1, ..., cm),
> >         > >                      isPrimaryKey(r, pk),            // pk
> >         is the PK of r
> >         > >                      in(c, pk),                      // c
> >         is a Column in pk
> >         > >                      isNotForeignKey(r, [ c ]),      // c
> >         is not the only constituent of a foreign key of r
> >         > >                      getListValue(r, pk, |pk|)
> >         > >                      generateRowIRI(r, pk, |pk|, s),
> >         > >                      generateColumnIRI(r, [ c ], p),
> >         > >                      getValue(r, c, x)
> >         > >    Triple(s, p, x) ← r(c1, ..., cm),
> >         > >                      isPrimaryKey(r, pk),             //
> >         pk is the PK of r
> >         > >                      in(c, pk),                       // c
> >         is a Column in pk
> >         > >                      isForeignKey(r, [ c ]),          // c
> >         is the only constituent of a foreign key of r
> >         > >                      references(r, [ c ], r', ck),    // c
> >         references a candidate key ck in another table
> >         > >                      isPrimaryKey(r', ck),            //
> >         ck is the PK of this other table
> >         > >                      getListValue(r, pk, |pk|),
> >         > >                      generateRowIRI(r, pk, |pk|, s),
> >         > >                      generateColumnIRI(r, [ c ], p),
> >         > >                      getValue(r, c, x)
> >         > >  (6.2.3)
> >         > >  ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ r ∈ Table, ∀ c1 ∈
> >         Column, ..., ∀ cm ∈ Column, ∀ c ∈ Column, ∀ x ∈ value,
> >         > >    Triple(s, p, x) ← r(c1, ..., cn),
> >         > >                      hasNoPrimaryKey(r),
> >         > >                      generateRowBlankNode(r, [c1, ...,
> >         cn], s),
> >         > >                      in(c, [c1, ..., cn]),
> >         > >                      generateColumnIRI(r, [ c ], p),
> >         > >                      getValue(r, c, x)
> >         > > ]]
> >         > >
> >         > > and an algebra:
> >         > >
> >         > > [[
> >         > > [1]    Database        ≝  { TableName → Table }
> >         > > [2]    Table           ≝  ( Header, [CandidateKey],
> >         CandidateKey?, ForeignKeys, Body )
> >         > > [3]    Header          ≝  { ColumnName → SQLDatatype }
> >         > > [4]    CandidateKey    ≝  [ ColumnName ]
> >         > > [5]    ForeignKeys     ≝  { [ColumnName] → ( Table,
> >         [ColumnName] ) }
> >         > > [6]    SQLDatatype     ≝  { INT | FLOAT | DATE | TIME |
> >         TIMESTAMP | CHAR | VARCHAR | STRING }
> >         > > [7]    Body            ≝  [ Tuple ]
> >         > > [8]    Tuple           ≝  { ColumnName → CellValue }
> >         > > [9]    CellValue       ≝  value | Null
> >         > >
> >         > > [10]   Graph           ≝  { Triple }
> >         > > [11]   Triple          ≝  ( Subject, Predicate, Object )
> >         > > [12]   Subject         ≝  IRI | BlankNode
> >         > > [13]   Predicate       ≝  IRI
> >         > > [14]   Object          ≝  IRI | BlankNode | Literal
> >         > > [15]   IRI             ≝  RDF URI-reference as
> >         subsequently restricted by SPARQL
> >         > > [16]   BlankNode       ≝  RDF blank node
> >         > > [17]   Literal         ≝  PlainLiteral | TypedLiteral
> >         > > [18]   PlainLiteral    ≝  (lexicalForm) | (lexicalForm,
> >         langageTag)
> >         > > [19]   TypedLiteral    ≝  (lexicalForm, IRI)
> >         > > ]]
> >         > >
> >         > > , one could show that the algebra fits the axiomatic
> >         semantics. In
> >         > > "Data Exchange: Semantics and Query Answering", Fagin et
> >         al. focused
> >         > > on separating the axiomatic semantics (which they call the
> >         "universal
> >         > > solution") from their data exchange algorithms.
> >         > >
> >         > > Alexandre.
> >         > >
> >         > >
> >         > >>
> >         > >>
> >         > >> Old versions of the document are:
> >         > >>
> >         > >>
> >         > >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> >         > >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> >         > >>
> >         > >>
> >         > >>
> >         > >>
> >         > >> Looking forward to your comments
> >         > >>
> >         > >>
> >         > >> Juan Sequeda
> >         > >> +1-575-SEQ-UEDA
> >         > >> www.juansequeda.com
> >         > >>
> >         > >
> >         > >
> >         > >
> >         > >
> >         > >
> >         > >
> >         > >
> >         >
> >         
> >         
> >         
> >         
> >         
> > 
> > 
> 
> 
> 

-- 
-ericP

Received on Tuesday, 16 November 2010 00:06:16 UTC