Re: Minutes for the 2010-10-21 RDB2RDF meeting + semantics discussion

* Alexandre Bertails <bertails@w3.org> [2010-10-21 16:32-0400]
> Hello guys,
> 
> here are the minutes for "RDB2RDF - Formal mapping - semantics" staged
> at [1]. Sorry for all the @@, but I had trouble associating the voices
> with the right people.
> 
> == Quick Summary ==
> 
> Here is a quick summary:
> * importance of the 7 use-cases which should have their own section
> * it's ok to have some examples covering several use-cases (if that is
> stated explicitly)
> * consensus around the SQL terminology instead of the Relational Algebra
> one, because of the intended audience

Noting that the greater WG still has to agree with our task force
decisions, I believe that we have decided that the documents we
produce and the suggestions we make to the WG will use SQL
terminology, e.g. "Relations" will be called "Tables".

I believe we also decided what subset of tables we care about:
specifically, we do not care to define either an RDF representation of
or SPARQL access to tables which have more than one column with the
same name. We also decided that tuples (henceforth "rows") can be
treated as a mapping from name to value, with the schema providing a
mapping from name to a type for that value.
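
To make that row model concrete, here is a minimal Python sketch. The
names Schema, Row and conforms are mine, for illustration only, and
letting a NULL (None) fit any column type is an assumption, not
something we decided on the call.

```python
from typing import Any, Dict

# Illustrative aliases (hypothetical names, not from any spec):
# a schema maps column name -> type, a row maps column name -> value.
Schema = Dict[str, type]
Row = Dict[str, Any]


def conforms(row: Row, schema: Schema) -> bool:
    """Return True when the row has exactly the schema's columns and
    every non-NULL value has the type the schema declares for it."""
    if row.keys() != schema.keys():
        return False
    # Assumption: NULL is modelled as None and is allowed in any column.
    return all(v is None or isinstance(v, schema[k]) for k, v in row.items())


people_schema: Schema = {"id": int, "name": str}
good_row: Row = {"id": 7, "name": "Bob"}
bad_row: Row = {"id": "7", "name": "Bob"}   # id has the wrong type
short_row: Row = {"id": 7}                  # missing the name column
```

Because both the row and the schema are keyed by name, a table with two
columns of the same name simply cannot be expressed here, which is the
subset restriction described above.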

(In case folks are suddenly uneasy seeing this in print, I can back it
up with some rationale.) Apologies if I seem very emphatic about the
model; I just feel like it's important for us to have a metric for
deciding which, e.g., mapping rules will meet our consensually derived
needs.


> * Marcelo and ?? proposed to update EricP's documents based on the
> previous points. Should be done next week.
> * discussion about what "semantics" means in the context of RDB2RDF, no
> consensus. See below for more information.
> * to answer the previous point, Marcelo will send an email with the
> right information (a digitized book) and will give some context
> 
> == Formal Mapping ==
> 
> I did not scribe while I was speaking, at the end of the telcon. Here
> are my notes about *what* the mapping is:
> 
> 1. simple case: it's only a Default Mapping, so it's a function
> [ RDB2RDF : RDB → RDF ]
> 2. difficult case: it's a function interpreting a rule language (R2ML),
> so it's a function [ RDB2RDF : (RDB×R2ML) → RDF ]. In this context, the
> Default Mapping is just a particular case where you use the empty value
> as an inhabitant for R2ML.
> 
> In both cases, you need to start with the simple case, the Default
> Mapping, then you can build something more complicated on top of it.
> 
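
The two cases above can be sketched in Python as follows. This is only
a toy under stated assumptions: the IRI shapes ("people/row0",
"people#name"), the row-index subjects and the shape of a rule are all
made up for illustration and are not the WG's Default Mapping or R2ML.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Row = Dict[str, object]
Database = Dict[str, List[Row]]          # table name -> list of rows
Triple = Tuple[str, str, str]
# Hypothetical rule shape: given (table, row index, column, value),
# return a replacement triple, or None to fall back to the default.
Rule = Callable[[str, int, str, object], Optional[Triple]]


def default_triple(table: str, i: int, col: str, value: object) -> Triple:
    """One triple per non-NULL cell, using made-up identifiers."""
    return (f"{table}/row{i}", f"{table}#{col}", str(value))


def default_mapping(db: Database) -> Set[Triple]:
    """Simple case: RDB -> RDF."""
    return {default_triple(t, i, c, v)
            for t, rows in db.items()
            for i, row in enumerate(rows)
            for c, v in row.items() if v is not None}


def rdb2rdf(db: Database, rules: Iterable[Rule]) -> Set[Triple]:
    """Difficult case: (RDB x rules) -> RDF, built on the default."""
    rules = list(rules)
    triples: Set[Triple] = set()
    for t, rows in db.items():
        for i, row in enumerate(rows):
            for c, v in row.items():
                if v is None:
                    continue
                # The first rule that produces a triple wins.
                custom = next((tr for tr in (r(t, i, c, v) for r in rules)
                               if tr is not None), None)
                triples.add(custom if custom is not None
                            else default_triple(t, i, c, v))
    return triples


def name_rule(table: str, i: int, col: str, value: object) -> Optional[Triple]:
    """Example rule: expose people.name under a different predicate."""
    if table == "people" and col == "name":
        return (f"{table}/row{i}", "foaf:name", str(value))
    return None


db: Database = {"people": [{"id": 7, "name": "Bob"}]}
```

With an empty rule list, rdb2rdf(db, []) yields the same triples as
default_mapping(db), which is the "empty inhabitant" point made above.
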
> == Semantics ==
> 
> I'm looking forward to Marcelo's document about his definition, and I
> hope he will also comment on what some others think.
> 
> [[
> <PatH> We could (not on IRC) draw this as a 'square' of functors which
> we want to commute.
> ]]
> 
> Here it is, in the case of the Default Mapping.
> 
> Typed multi-sets   -- ?? -→   Set of triples
>       ↑                            ↑
> RDB semantics                 RDF semantics
>       |                            |
>      RDB       -- mapping -→      RDF
> 
> I don't have a name for ??. We just have to prove that it is injective
> (I'm not sure we want/need a bijection at that point). The cool term to
> reuse during your next dinner is "semantics preservation". Please read
> [2] if you don't understand the concept.
> 
> The "semantics of RDB2RDF" makes sense *only* relative to the
> formalism used to express it. That means we need a formalism which:
> * can encode RDB *and* RDF [3]
> * has a formal semantics
> Then the semantics of "mapping" is just its interpretation in the
> semantics of the formalism. "RDB2RDF" itself is *only* a definition.
> 
> I've also explained briefly why the semantics question was important.
> One of the goals -- for the future -- of RDB2RDF is to allow people to
> query an RDB dataset with SPARQL. That means you want the following:
> 
> [[
> Let's call SPARQL2SQL the higher-order function that maps a SPARQL query
> to a SQL query: [ SPARQL2SQL : SPARQL → SQL ].
> 
> Let's call RDB2RDF the function that maps RDB to RDF: [ RDB2RDF : RDB →
> RDF ].
> 
> Let "=" be the Graph Equivalence defined in [4].
> 
> Theorem:
> ∀ rdb-dataset ∈ RDB, ∀ sparql-query ∈ SPARQL,
> (SPARQL2SQL(sparql-query))(rdb-dataset) =
> sparql-query(RDB2RDF(rdb-dataset))
> ]]
> 
> Or in plain English:
> 
> [[
> Given a SPARQL-to-SQL mapping to access an RDB dataset, the semantics of
> the translated SPARQL query executed against this particular RDB dataset
> should be equivalent to the same SPARQL query executed against the same
> RDB dataset seen through the RDB2RDF mapping.
> ]]
> 
> If you have that, you can say that RDB2RDF and SPARQL2SQL are consistent
> with each other and that you don't lose any data.
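
For anyone who wants to poke at the theorem, here is a toy Python check
of the commuting property. Everything in it is an assumption made for
illustration: a "SPARQL query" is reduced to a single triple pattern
(None standing for a variable), "SQL" is replaced by a Python function
evaluated directly over the rows, set equality stands in for the graph
equivalence of [4], and the identifier shapes are invented.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, object]
Database = Dict[str, List[Row]]          # table name -> list of rows
Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def rdb2rdf(db: Database) -> Set[Triple]:
    """Toy default mapping: one triple per non-NULL cell."""
    return {(f"{t}/row{i}", f"{t}#{c}", str(v))
            for t, rows in db.items()
            for i, row in enumerate(rows)
            for c, v in row.items() if v is not None}


def run_sparql(pattern: Pattern, graph: Set[Triple]) -> Set[Triple]:
    """Evaluate a single triple pattern against a set of triples."""
    return {tr for tr in graph
            if all(p is None or p == x for p, x in zip(pattern, tr))}


def sparql2sql(pattern: Pattern) -> Callable[[Database], Set[Triple]]:
    """Translate the pattern into a query that runs on the rows
    directly, without materialising the whole graph first."""
    s, p, o = pattern

    def run_on_db(db: Database) -> Set[Triple]:
        out: Set[Triple] = set()
        for t, rows in db.items():
            for i, row in enumerate(rows):
                subject = f"{t}/row{i}"
                if s is not None and s != subject:
                    continue
                for c, v in row.items():
                    if v is None:
                        continue
                    if p is not None and p != f"{t}#{c}":
                        continue
                    if o is not None and o != str(v):
                        continue
                    out.add((subject, f"{t}#{c}", str(v)))
        return out

    return run_on_db


def commutes(db: Database, pattern: Pattern) -> bool:
    """(SPARQL2SQL(q))(db) == q(RDB2RDF(db)) for this one query."""
    return sparql2sql(pattern)(db) == run_sparql(pattern, rdb2rdf(db))


db: Database = {"people": [{"id": 7, "name": "Bob"},
                           {"id": 8, "name": None}]}
```

Checking commutes(db, q) for a handful of patterns is of course only a
test, not a proof; the theorem quantifies over every dataset and every
query.
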
> 
> No more, no less.
> 
> Alexandre.
> 
> [1] http://www.w3.org/2010/10/21-rdb2rdf-minutes.html
> [2]
> http://en.wikipedia.org/wiki/Semantics_encoding#Preservation_of_equivalences
> [3]
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-syntax
> [4]
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-graph-equality
> 
> 

-- 
-ericP

Received on Saturday, 23 October 2010 14:45:08 UTC