Re: New merged consolidated Direct Mapping version from Alexandre Bertails on 2010-11-16 (public-rdb2rdf-wg@w3.org from November 2010)

From: Alexandre Bertails <bertails@w3.org>
Date: Tue, 16 Nov 2010 11:06:41 -0500
To: Marcelo Arenas <marcelo.arenas1@gmail.com>
Cc: Juan Sequeda <juanfederico@gmail.com>, "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-ID: <1289923601.3582.178.camel@simplet>
On Tue, 2010-11-16 at 06:19 -0300, Marcelo Arenas wrote:
> Hi Alexandre,
> 
> Thank you very much for your comments.

You're welcome, I'm just sharing my feedback.

> 
> On Mon, Nov 15, 2010 at 8:28 PM, Alexandre Bertails <bertails@w3.org> wrote:
> > Sorry for not answering earlier, RDB2RDF is not my real job at W3C :-)
> >
> > On Sat, 2010-11-13 at 15:46 -0600, Juan Sequeda wrote:
> >> Alexandre
> >>
> >>
> >> you make good points which I need to read thoroughly but I don't want
> >> to do over the weekend ;)
> >>
> >>
> >> However, quick comments inline
> >>
> >> On Sat, Nov 13, 2010 at 3:37 PM, Alexandre Bertails <bertails@w3.org>
> >> wrote:
> >>         On Sat, 2010-11-13 at 14:47 -0600, Juan Sequeda wrote:
> >>         > I'd like to go through this thoroughly but I believe this
> >>         looks a lot like:
> >>         >
> >>         >
> >>         http://www.w3.org/2001/sw/rdb2rdf/wiki/Database-Instance-Only_and_Database-Instances-and-Schema_Mapping
> >>         >
> >>         > This was Marcelo and my proposal a longggg time ago.
> >>
> >>
> >>         Yes, Eric made me read it a longggg time ago :-) But this is
> >>         not the
> >>         same approach (and I prefer the one you took in the merged
> >>         document).
> >>
> >>         In the merged spec, you say things like [[ Assume that r(a,
> >>         b1, ...,
> >>         bn) is a table with columns a, b1, ..., bn ... ]]. It's not
> >>         clear if
> >>         it means "I have a function from a relation r in RDB to a
> >>         Datalog
> >>         rule", or if you are giving an axiomatic description of the
> >>         truth in a
> >>         particular case.
> >>
> >>         I understood it as an axiomatic description with universal
> >>         quantification (the universe of discourse, which is also
> >>         missing in
> >>         your rules) because as there is no reason to keep two models
> >>         of
> >>         computation in the same spec, I assumed you were not competing
> >>         with
> >>         the mapping (which I recall is by definition a function)
> >>         itself by
> >>         proposing a new one. And if this was actually a function from
> >>         RDB to
> >>         Datalog, I would have expected to see the formal definition of
> >>         a
> >>         function with a clear domain and codomain.
> >>
> >>
> >> there is no function from RDB to Datalog.
> >> Datalog can be considered syntax for relational algebra. You can say
> >> the same thing. IMO, I prefer reading Datalog to relational algebra.
> >> So r is the name of the table. i.e project attribute name from the
> >> table student
> >>
> >>
> >> Ans(name) <- Student(_, name, _, _)
> >
> > In Datalog, you cannot reason on the relation r itself. So you need
> > something external to go from the relation r to the relation name "r".
> > Said differently, as long as you put an "r" in a Predicate, this is
> > not FOL.
> >
> > How do you make the distinction between the relation and its name? Eric
> > showed me a scheme but he called it "perverse" :-) And he still needs
> > higher-order.
> >
> >>         So to be sure I was understanding your rules, I spontaneously
> >>         started
> >>         to annotate the variables and then, to get rid of the English
> >>         (I
> >>         always have a problem to consider descriptions in English as
> >>         they
> >>         escape from the formalism and hide the difficulty), I pushed
> >>         the
> >>         plain-text constraints into the rules, one by one. I found it
> >>         very
> >>         pleasant to see that you actually use Higher Order Logic (the
> >>         [[
> >>         Assume that ]] were the clue but I did not get it right away).
> >>         By
> >>         putting more formalism into the rules, I really understood you
> >>         were
> >>         giving a nice semantical framework for the Direct Mapping,
> >>         more than
> >>         giving a way to compute it. The icing on the cake is that you
> >>         never
> >>         have to say *how* you compute an IRI, for example. You just
> >>         have to
> >>         say that it exists!
> >>
> >>
> >>
> >>
> >> If you are combining the instances of the database AND schema elements
> >> (Student is a table, id is a PK of the student table), then it becomes
> >> higher order logic. Hence we had a schema+instance mapping. But
> >> Marcelo and I came to the conclusion that it was too complicated.
> >> Hence we only wanted Instance Mapping.
> >
> > Yes I agree it's complicated.
> >
> > Mixing schema and data is *a* way to get higher-order. But as long as
> > you have the table name outside of a predicate position, this is gonna
> > be higher-order.
> >
> >
> >>
> >>         The algebra tells you the "what" (the Abstract Models) and the
> >>         "how"
> >>         (the mapping functions), whereas your Axiomatic Semantics
> >>         tells you
> >>         the truth in the model.
> >>
> >>         May I suggest the editors (Eric, that includes you) to make
> >>         clear the
> >>         relation between the Direct Mapping (the algebra) and its
> >>         Axiomatic
> >>         Semantics?
> >>
> >>
> >> Yes we need to do that.
> >
> > Editorial proposal to put somewhere in the introduction:
> > [[
> > The Direct Mapping is an algebra defining the mapping from RDB to RDF,
> > expressed in Type Theory. The Axiomatic Semantics defines the set of
> > laws which the Direct Mapping must respect.
> > ]]
> 
> I don't agree with including this paragraph in the introduction. We
> want people to read the document, so I like the idea of having
> alternative formalizations of the direct mapping, each one with their
> own perspective. One of them is Eric's proposal, for which your
> paragraph is appropriate. The other one is based on Datalog, which is
> a familiar notation for database people, but which follows a different
> approach.

I'm not sure who "database people" are. If you mean "people in research"
and/or "people speaking Datalog", it means we have to change the
targeted audience again and the terminology we use.

I believe that the vast majority of people who will be interested in
RDB2RDF won't know anything about Datalog -- and its semantics -- and
are not interested in it. But they all have a pretty good understanding
of functions (maps).

> Actually, I would like to point out here that the way we are
> representing the direct mapping in Datalog is pretty standard in
> database theory.

I agree with the general statement, especially for relational database
theory.

> In fact, I have the impression that some of your
> concerns about this representation are coming from the fact that you
> are not familiar with the language.

I thought Datalog was just first-order predicate calculus, relying on
sets. Don't worry about that, I've manipulated much more difficult
formalisms in the past. And I've read the articles you shared with us.

You must understand we are facing some real-world situations here. For
example, RDB implementations work on top of multisets, not sets. And
Datalog is not as accessible to non-researchers as a simple,
well-defined function.
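
To make the set/multiset point concrete, here is a minimal Python
sketch. The Student rows and the four-column layout are made up for
illustration; the only thing it is meant to show is that projecting a
column under set semantics (as in Datalog) collapses duplicates, while
a multiset projection (as in a SQL SELECT without DISTINCT) keeps them.

```python
from collections import Counter

# Hypothetical Student rows: (id, name, year, dept). Invented for illustration.
student = [
    (1, "Alice", 2010, "CS"),
    (2, "Bob", 2010, "CS"),
    (3, "Alice", 2009, "Math"),
]


def project_names_set(rows):
    """Set semantics: duplicate names collapse into one element."""
    return {name for (_id, name, _year, _dept) in rows}


def project_names_bag(rows):
    """Multiset semantics: each name keeps its number of occurrences."""
    return Counter(name for (_id, name, _year, _dept) in rows)


names_set = project_names_set(student)
names_bag = project_names_bag(student)
```

Under these assumptions the two results disagree on how many "Alice"
answers exist, which is the kind of difference a spec has to be
explicit about.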

> Just as an example, we are not
> missing the universal quantifiers in our rules. All the variables in a
> Datalog rule are universally quantified, so the universal quantifiers
> are omitted (Datalog is a fragment of first-order logic that uses some
> non-first-order notation).

Have you read my rewritten rules? Do you think they are wrong or that
they don't comply with yours? They just show this is not Datalog as you
implicitly use predicates in variable position.
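
A rough sketch of the distinction I have in mind, in Python rather
than Datalog. The table names, the two-column layout and both helper
functions are hypothetical. The first function fixes the predicate, as
a plain Datalog rule does; the second one lets r range over the
relations themselves, so the relation name becomes data.

```python
# Hypothetical database: relation names mapped to sets of (key, value) tuples.
db = {
    "Student": {(1, "Alice"), (2, "Bob")},
    "Course": {(10, "Logic")},
}


def student_names(student):
    """Fixed predicate: only the variables inside Student(...) are quantified."""
    return {name for (_id, name) in student}


def second_columns(database):
    """Generic rule: r ranges over relation names, so relations are quantified over."""
    return {(r, value) for r, rows in database.items() for (_key, value) in rows}


fixed = student_names(db["Student"])
generic = second_columns(db)
```

The second function needs something outside the rule language to go
from a relation to its name, which is the step I am asking about.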

Before answering that, please specify the domain of your predicates
(which you are supposed to do in Datalog anyway). For example, I still
don't know where you encode types for values.

Alexandre.

> 
> Cheers,
> 
> Marcelo
>
Received on Tuesday, 16 November 2010 16:06:40 UTC