Re: New merged consolidated Direct Mapping version from Juan Sequeda on 2010-11-13 (public-rdb2rdf-wg@w3.org from November 2010)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Sat, 13 Nov 2010 15:46:42 -0600
To: Alexandre Bertails <bertails@w3.org>
Cc: "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-ID: <AANLkTi=3dU+q=hS5kmTZOtk_K4r3=tTikomRRRsXUmU_@mail.gmail.com>
Alexandre

you make good points which I need to read thoroughly but I don't want to do
over the weekend ;)

However, quick comments inline

On Sat, Nov 13, 2010 at 3:37 PM, Alexandre Bertails <bertails@w3.org> wrote:

> On Sat, 2010-11-13 at 14:47 -0600, Juan Sequeda wrote:
> > I'd like to go through this thoroughly but I believe this looks a lot
> like:
> >
> >
> http://www.w3.org/2001/sw/rdb2rdf/wiki/Database-Instance-Only_and_Database-Instances-and-Schema_Mapping
> >
> > This was Marcelo and my proposal a longggg time ago.
>
> Yes, Eric made me read it a longggg time ago :-) But this is not the
> same approach (and I prefer the one you took in the merged document).
>
> In the merged spec, you say things like [[ Assume that r(a, b1, ...,
> bn) is a table with columns a, b1, ..., bn ... ]]. It's not clear if
> it means "I have a function from a relation r in RDB to a Datalog
> rule", or if you are giving an axiomatic description of the truth in a
> particular case.
>
> I understood it as an axiomatic description with universal
> quantification (the universe of discourse, which is also missing in
> your rules) because as there is no reason to keep two models of
> computation in the same spec, I assumed you were not competing with
> the mapping (which I recall is by definition a function) itself by
> proposing a new one. And if this was actually a function from RDB to
> Datalog, I would have expected to see the formal definition of a
> function with a clear domain and codomain.
>

there is no function from RDB to Datalog.
Datalog can be considered syntax for relational algebra. You can say the
same thing. IMO, I prefer reading datalog than relational algebra. So r is
the name of the table. i.e project attribute name from the table student

Ans(name) <- Student(_, name, _, _)


>
> So to be sure I was understanding your rules, I spontaneously started
> to annotate the variables and then, to get rid of the English (I
> always have a problem to consider descriptions in English as they
> escape from the formalism and hide the difficulty), I pushed the
> plain-text constraints into the rules, one after one. I found very
> pleasant to see that you actually use Higher Order Logic (the [[
> Assume that ]] were the clue but I did not get it right away). By
> putting more formalism into the rules, I really understood you were
> giving a nice semantical framework for the Direct Mapping, more than
> giving a way to compute it. The icing on the cake is that you never
> have to say *how* you compute an IRI, for example. You just have to
> say that it exists!
>


If you are combining the instances of the database AND schema elements
(Student is a table, id is a PK of the student table), then it becomes
higher order logic. Hence we had a schema+instance mapping. But Marcelo and
I came to the conclusion that it was too complicated. Hence we only wanted
Instance Mapping.

>
> The algebra tells you the "what" (the Abstract Models) and the "how"
> (the mapping functions), whereas your Axiomatic Semantics tells you
> the truth in the model.
>
> May I suggest the editors (Eric, that includes you) to make clear the
> relation between the Direct Mapping (the algebra) and its Axiomatic
> Semantics?
>

Yes we need to do that.


>
> Alexandre.
>
>
> >
> > Juan Sequeda
> > www.juansequeda.com
> >
> > On Nov 13, 2010, at 2:34 PM, Alexandre Bertails <bertails@w3.org> wrote:
> >
> > > On Fri, 2010-11-12 at 09:17 -0600, Juan Sequeda wrote:
> > >> Hi Everybody
> > >>
> > >>
> > >> Just to remind everybody that the new merged consolidated document can
> > >> be found here:
> > >>
> > >>
> > >> http://www.w3.org/2001/sw/rdb2rdf/directMapping/
> > >
> > > Looking at the roles of section section 6 Direct Mapping as Rules
> > > and 5 Direct Mapping Definition, I see an easy division between an
> > > axiomatic semantics and an algebra which implements/conforms to that
> > > semantics. As an example, section 6's generateColumnIRI declares a
> > > binding between a lists of column names and the corresponding RDF
> > > predicate IRI. You can view generateColumnIRI without explicit
> > > quantification (quoted from section 6):
> > >
> > >  generateColumnIRI(x, y, z): Given a table name x and a non-empty list
> of columns y, it generates the Column IRI z
> > >
> > > or with quantification:
> > >
> > >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI, generateColumnIRI(r,
> columns, iri) ← nonempty(columns)
> > >
> > > The generateColumnIRI rule is *realized* in Section 5's propertyIRI
> > > mapping from a list of columns to an IRI:
> > >
> > >  [32] propertyIRI(R, As) ≝ IRI(base + "/" + (join(',', UE(A.name)) ∣ A
> ∈ As ) "#" As.name)
> > >
> > > More formally, given an axiomatic semantics
> > > [[
> > >  ∀ r ∈ Table, ∀ iri ∈ IRI, generateTableIRI(r, iri)
> > >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI, generateColumnIRI(r,
> columns, iri) ← nonempty(columns)
> > >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ values ∈ [ value ], ∀ iri ∈
> IRI,
> > >    generateRowIRI(r, columns, values, iri) ← nonempty(columns),
> nonempty(values)
> > >  ∀ r ∈ Table, ∀ values ∈ [ value ], ∀ bn ∈ BlankNode,
> generateRowBlankNode(r, values, bn) ← hasNoPrimaryKey(r)
> > >  ∀ r ∈ Table, ∀ column ∈ Column, ∀ value ∈ value, getValue(r, column,
> value)
> > >  ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column, ∀ x1 ∈ value, ..., ∀
> xn ∈ value,
> > >    getListValue(r, [c1, ..., cn], [x1, ..., xn]) ← getValue(r, c1, x1),
> ..., getValue(r, cn, xn)
> > >  (6.1.2 subsumes 6.1.1)
> > >  ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm ∈
> Column, ∀ pk ∈ [ Column ], ∀ |pk| ∈ [ value ],
> > >    Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cm),
> > >                                    isPrimaryKey(r, pk),
> > >                                    getListValue(r, pk, |pk|)
> > >                                    generateRowIRI(r, pk, |pk|, s),
> > >                                    generateTableIRI(r, o)
> > >  (6.1.3)
> > >  ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈
> Column,
> > >    Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cn),
> > >                                    hasNoPrimaryKey(r),
> > >                                    generateRowBlankNode(r, [c1, ...,
> cn], s),
> > >                                    generateTableIRI(r, o)
> > >  (6.2.2 subsumes 6.2.1)
> > >    the 2 rules can be factorized as there is no reason to distinguish
> aj and bj (the conditions are the same)
> > >    the "or" implies a split of the rule
> > >  ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ xj ∈ value, ∀ r ∈ Table, ∀ c1 ∈
> Column, ..., ∀ cm ∈ Column,
> > >                                                             ∀ c ∈
> Column, ∀ x ∈ value,
> > >                                                             ∀ pk ∈ [
> Column ], ∀ |pk| ∈ [ value ],
> > >    Triple(s, p, x) ← r(c1, ..., cm),
> > >                      isPrimaryKey(r, pk),            // pk is the PK of
> r
> > >                      in(c, pk),                      // c is a Column
> in pk
> > >                      isNotForeignKey(r, [ c ]),      // c is not the
> only constituent of a foreign key of r
> > >                      getListValue(r, pk, |pk|)
> > >                      generateRowIRI(r, pk, |pk|, s),
> > >                      generateColumnIRI(r, [ c ], p),
> > >                      getValue(r, c, x)
> > >    Triple(s, p, x) ← r(c1, ..., cm),
> > >                      isPrimaryKey(r, pk),             // pk is the PK
> of r
> > >                      in(c, pk),                       // c is a Column
> in pk
> > >                      isForeignKey(r, [ c ]),          // c is the only
> constituent of a foreign key of r
> > >                      references(r, [ c ], r', ck),    // c references a
> candidate key ck in another table
> > >                      isPrimaryKey(r', ck),            // ck is the PK
> of this other table
> > >                      getListValue(r, pk, |pk|),
> > >                      generateRowIRI(r, pk, |pk|, s),
> > >                      generateColumnIRI(r, [ c ], p),
> > >                      getValue(r, c, x)
> > >  (6.2.3)
> > >  ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm
> ∈ Column, ∀ c ∈ Column, ∀ x ∈ value,
> > >    Triple(s, p, x) ← r(c1, ..., cn),
> > >                      hasNoPrimaryKey(r),
> > >                      generateRowBlankNode(r, [c1, ..., cn], s),
> > >                      in(c, [c1, ..., cn]),
> > >                      generateColumnIRI(r, [ c ], p),
> > >                      getValue(r, c, x)
> > > ]]
> > >
> > > and an algebra:
> > >
> > > [[
> > > [1]    Database        ≝  { TableName → Table }
> > > [2]    Table           ≝  ( Header, [CandidateKey], CandidateKey?,
> ForeignKeys, Body )
> > > [3]    Header          ≝  { ColumnName → SQLDatatype }
> > > [4]    CandidateKey    ≝  [ ColumnName ]
> > > [5]    ForeignKeys     ≝  { [ColumnName] → ( Table, [ColumnName] ) }
> > > [6]    SQLDatatype     ≝  { INT | FLOAT | DATE | TIME | TIMESTAMP |
> CHAR | VARCHAR | STRING }
> > > [7]    Body            ≝  [ Tuple ]
> > > [8]    Tuple           ≝  { ColumnName → CellValue }
> > > [9]    CellValue       ≝  value | Null
> > >
> > > [10]   Graph           ≝  { Triple }
> > > [11]   Triple          ≝  ( Subject, Predicate, Object )
> > > [12]   Subject         ≝  IRI | BlankNode
> > > [13]   Predicate       ≝  IRI
> > > [14]   Object          ≝  IRI | BlankNode | Literal
> > > [15]   IRI             ≝  RDF URI-reference as subsequently restricted
> by SPARQL
> > > [16]   BlankNode       ≝  RDF blank node
> > > [17]   Literal         ≝  PlainLiteral | TypedLiteral
> > > [18]   PlainLiteral    ≝  (lexicalForm) | (lexicalForm, langageTag)
> > > [19]   TypedLiteral    ≝  (lexicalForm, IRI)
> > > ]]
> > >
> > > , one could show that the algebra fits the axiomatic semantics. In
> > > "Data Exchange: Semantics and Query Answering", Fagin et al. focused
> > > on separating the axiomatic semantics (which they call the "universal
> > > solution") from their data exchange algorithms.
> > >
> > > Alexandre.
> > >
> > >
> > >>
> > >>
> > >> Old versions of the document are:
> > >>
> > >>
> > >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> > >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> > >>
> > >>
> > >>
> > >>
> > >> Looking forward to your comments
> > >>
> > >>
> > >> Juan Sequeda
> > >> +1-575-SEQ-UEDA
> > >> www.juansequeda.com
> > >>
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
>
Received on Saturday, 13 November 2010 21:47:37 UTC