Re: New merged consolidated Direct Mapping version from Alexandre Bertails on 2010-11-13 (public-rdb2rdf-wg@w3.org from November 2010)

From: Alexandre Bertails <bertails@w3.org>
Date: Sat, 13 Nov 2010 16:37:25 -0500
To: Juan Sequeda <juanfederico@gmail.com>
Cc: "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-ID: <1289684245.9296.2.camel@simplet>
On Sat, 2010-11-13 at 14:47 -0600, Juan Sequeda wrote:
> I'd like to go through this thoroughly but I believe this looks a lot like: 
> 
> http://www.w3.org/2001/sw/rdb2rdf/wiki/Database-Instance-Only_and_Database-Instances-and-Schema_Mapping
> 
> This was Marcelo and my proposal a longggg time ago.

Yes, Eric made me read it a longggg time ago :-) But this is not the
same approach (and I prefer the one you took in the merged document).

In the merged spec, you say things like [[ Assume that r(a, b1, ...,
bn) is a table with columns a, b1, ..., bn ... ]]. It's not clear if
it means "I have a function from a relation r in RDB to a Datalog
rule", or if you are giving an axiomatic description of the truth in a
particular case.

I understood it as an axiomatic description with universal
quantification (the universe of discourse, which is also missing in
your rules) because as there is no reason to keep two models of
computation in the same spec, I assumed you were not competing with
the mapping (which I recall is by definition a function) itself by
proposing a new one. And if this was actually a function from RDB to
Datalog, I would have expected to see the formal definition of a
function with a clear domain and codomain.

So to be sure I was understanding your rules, I spontaneously started
to annotate the variables and then, to get rid of the English (I
always have a problem to consider descriptions in English as they
escape from the formalism and hide the difficulty), I pushed the
plain-text constraints into the rules, one after one. I found very
pleasant to see that you actually use Higher Order Logic (the [[
Assume that ]] were the clue but I did not get it right away). By
putting more formalism into the rules, I really understood you were
giving a nice semantical framework for the Direct Mapping, more than
giving a way to compute it. The icing on the cake is that you never
have to say *how* you compute an IRI, for example. You just have to
say that it exists!

The algebra tells you the "what" (the Abstract Models) and the "how"
(the mapping functions), whereas your Axiomatic Semantics tells you
the truth in the model.

May I suggest the editors (Eric, that includes you) to make clear the
relation between the Direct Mapping (the algebra) and its Axiomatic
Semantics?

Alexandre.


> 
> Juan Sequeda
> www.juansequeda.com
> 
> On Nov 13, 2010, at 2:34 PM, Alexandre Bertails <bertails@w3.org> wrote:
> 
> > On Fri, 2010-11-12 at 09:17 -0600, Juan Sequeda wrote:
> >> Hi Everybody
> >> 
> >> 
> >> Just to remind everybody that the new merged consolidated document can
> >> be found here:
> >> 
> >> 
> >> http://www.w3.org/2001/sw/rdb2rdf/directMapping/
> > 
> > Looking at the roles of section section 6 Direct Mapping as Rules
> > and 5 Direct Mapping Definition, I see an easy division between an
> > axiomatic semantics and an algebra which implements/conforms to that
> > semantics. As an example, section 6's generateColumnIRI declares a
> > binding between a lists of column names and the corresponding RDF
> > predicate IRI. You can view generateColumnIRI without explicit
> > quantification (quoted from section 6):
> > 
> >  generateColumnIRI(x, y, z): Given a table name x and a non-empty list of columns y, it generates the Column IRI z
> > 
> > or with quantification:
> > 
> >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI, generateColumnIRI(r, columns, iri) ← nonempty(columns)
> > 
> > The generateColumnIRI rule is *realized* in Section 5's propertyIRI
> > mapping from a list of columns to an IRI:
> > 
> >  [32] propertyIRI(R, As) ≝ IRI(base + "/" + (join(',', UE(A.name)) ∣ A ∈ As ) "#" As.name)
> > 
> > More formally, given an axiomatic semantics
> > [[
> >  ∀ r ∈ Table, ∀ iri ∈ IRI, generateTableIRI(r, iri)
> >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI, generateColumnIRI(r, columns, iri) ← nonempty(columns)
> >  ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ values ∈ [ value ], ∀ iri ∈ IRI,
> >    generateRowIRI(r, columns, values, iri) ← nonempty(columns), nonempty(values)
> >  ∀ r ∈ Table, ∀ values ∈ [ value ], ∀ bn ∈ BlankNode, generateRowBlankNode(r, values, bn) ← hasNoPrimaryKey(r)
> >  ∀ r ∈ Table, ∀ column ∈ Column, ∀ value ∈ value, getValue(r, column, value)
> >  ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column, ∀ x1 ∈ value, ..., ∀ xn ∈ value,
> >    getListValue(r, [c1, ..., cn], [x1, ..., xn]) ← getValue(r, c1, x1), ..., getValue(r, cn, xn)
> >  (6.1.2 subsumes 6.1.1)
> >  ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm ∈ Column, ∀ pk ∈ [ Column ], ∀ |pk| ∈ [ value ],
> >    Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cm),
> >                                    isPrimaryKey(r, pk),
> >                                    getListValue(r, pk, |pk|)
> >                                    generateRowIRI(r, pk, |pk|, s),
> >                                    generateTableIRI(r, o)
> >  (6.1.3)
> >  ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column,
> >    Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cn),
> >                                    hasNoPrimaryKey(r),
> >                                    generateRowBlankNode(r, [c1, ..., cn], s),
> >                                    generateTableIRI(r, o)
> >  (6.2.2 subsumes 6.2.1)
> >    the 2 rules can be factorized as there is no reason to distinguish aj and bj (the conditions are the same)
> >    the "or" implies a split of the rule
> >  ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ xj ∈ value, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm ∈ Column,
> >                                                             ∀ c ∈ Column, ∀ x ∈ value,
> >                                                             ∀ pk ∈ [ Column ], ∀ |pk| ∈ [ value ],
> >    Triple(s, p, x) ← r(c1, ..., cm),
> >                      isPrimaryKey(r, pk),            // pk is the PK of r
> >                      in(c, pk),                      // c is a Column in pk
> >                      isNotForeignKey(r, [ c ]),      // c is not the only constituent of a foreign key of r
> >                      getListValue(r, pk, |pk|)
> >                      generateRowIRI(r, pk, |pk|, s),
> >                      generateColumnIRI(r, [ c ], p),
> >                      getValue(r, c, x)
> >    Triple(s, p, x) ← r(c1, ..., cm),
> >                      isPrimaryKey(r, pk),             // pk is the PK of r
> >                      in(c, pk),                       // c is a Column in pk
> >                      isForeignKey(r, [ c ]),          // c is the only constituent of a foreign key of r
> >                      references(r, [ c ], r', ck),    // c references a candidate key ck in another table
> >                      isPrimaryKey(r', ck),            // ck is the PK of this other table
> >                      getListValue(r, pk, |pk|),
> >                      generateRowIRI(r, pk, |pk|, s),
> >                      generateColumnIRI(r, [ c ], p),
> >                      getValue(r, c, x)
> >  (6.2.3)
> >  ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm ∈ Column, ∀ c ∈ Column, ∀ x ∈ value,
> >    Triple(s, p, x) ← r(c1, ..., cn),
> >                      hasNoPrimaryKey(r),
> >                      generateRowBlankNode(r, [c1, ..., cn], s),
> >                      in(c, [c1, ..., cn]),
> >                      generateColumnIRI(r, [ c ], p),
> >                      getValue(r, c, x)
> > ]]
> > 
> > and an algebra:
> > 
> > [[
> > [1]    Database        ≝  { TableName → Table }
> > [2]    Table           ≝  ( Header, [CandidateKey], CandidateKey?, ForeignKeys, Body )
> > [3]    Header          ≝  { ColumnName → SQLDatatype }
> > [4]    CandidateKey    ≝  [ ColumnName ]
> > [5]    ForeignKeys     ≝  { [ColumnName] → ( Table, [ColumnName] ) }
> > [6]    SQLDatatype     ≝  { INT | FLOAT | DATE | TIME | TIMESTAMP | CHAR | VARCHAR | STRING }
> > [7]    Body            ≝  [ Tuple ]
> > [8]    Tuple           ≝  { ColumnName → CellValue }
> > [9]    CellValue       ≝  value | Null
> > 
> > [10]   Graph           ≝  { Triple }
> > [11]   Triple          ≝  ( Subject, Predicate, Object )
> > [12]   Subject         ≝  IRI | BlankNode
> > [13]   Predicate       ≝  IRI
> > [14]   Object          ≝  IRI | BlankNode | Literal
> > [15]   IRI             ≝  RDF URI-reference as subsequently restricted by SPARQL
> > [16]   BlankNode       ≝  RDF blank node
> > [17]   Literal         ≝  PlainLiteral | TypedLiteral
> > [18]   PlainLiteral    ≝  (lexicalForm) | (lexicalForm, langageTag)
> > [19]   TypedLiteral    ≝  (lexicalForm, IRI)
> > ]]
> > 
> > , one could show that the algebra fits the axiomatic semantics. In
> > "Data Exchange: Semantics and Query Answering", Fagin et al. focused
> > on separating the axiomatic semantics (which they call the "universal
> > solution") from their data exchange algorithms.
> > 
> > Alexandre.
> > 
> > 
> >> 
> >> 
> >> Old versions of the document are:
> >> 
> >> 
> >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> >> 
> >> 
> >> 
> >> 
> >> Looking forward to your comments
> >> 
> >> 
> >> Juan Sequeda
> >> +1-575-SEQ-UEDA
> >> www.juansequeda.com
> >> 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
>
Received on Saturday, 13 November 2010 21:37:33 UTC