- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Sat, 13 Nov 2010 15:46:42 -0600
- To: Alexandre Bertails <bertails@w3.org>
- Cc: "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
- Message-ID: <AANLkTi=3dU+q=hS5kmTZOtk_K4r3=tTikomRRRsXUmU_@mail.gmail.com>
Alexandre, you make good points which I need to read thoroughly, but I don't want to do that over the weekend ;) However, quick comments inline.

On Sat, Nov 13, 2010 at 3:37 PM, Alexandre Bertails <bertails@w3.org> wrote:

> On Sat, 2010-11-13 at 14:47 -0600, Juan Sequeda wrote:
> > I'd like to go through this thoroughly but I believe this looks a lot like:
> >
> > http://www.w3.org/2001/sw/rdb2rdf/wiki/Database-Instance-Only_and_Database-Instances-and-Schema_Mapping
> >
> > This was Marcelo and my proposal a longggg time ago.
>
> Yes, Eric made me read it a longggg time ago :-) But this is not the
> same approach (and I prefer the one you took in the merged document).
>
> In the merged spec, you say things like [[ Assume that r(a, b1, ...,
> bn) is a table with columns a, b1, ..., bn ... ]]. It's not clear if
> it means "I have a function from a relation r in RDB to a Datalog
> rule", or if you are giving an axiomatic description of the truth in a
> particular case.
>
> I understood it as an axiomatic description with universal
> quantification (the universe of discourse, which is also missing in
> your rules) because as there is no reason to keep two models of
> computation in the same spec, I assumed you were not competing with
> the mapping (which I recall is by definition a function) itself by
> proposing a new one. And if this was actually a function from RDB to
> Datalog, I would have expected to see the formal definition of a
> function with a clear domain and codomain.
>

There is no function from RDB to Datalog. Datalog can be considered a syntax for relational algebra: you can say the same things in either. Personally, I prefer reading Datalog to relational algebra. So r is just the name of the table.

For example, to project the attribute name from the table Student:

Ans(name) <- Student(_, name, _, _)

> So to be sure I was understanding your rules, I spontaneously started
> to annotate the variables and then, to get rid of the English (I
> always have a problem to consider descriptions in English as they
> escape from the formalism and hide the difficulty), I pushed the
> plain-text constraints into the rules, one after one. I found very
> pleasant to see that you actually use Higher Order Logic (the [[
> Assume that ]] were the clue but I did not get it right away). By
> putting more formalism into the rules, I really understood you were
> giving a nice semantical framework for the Direct Mapping, more than
> giving a way to compute it. The icing on the cake is that you never
> have to say *how* you compute an IRI, for example. You just have to
> say that it exists!
>

If you combine the instances of the database AND the schema elements (Student is a table, id is a PK of the Student table), then it becomes higher-order logic. That is why we had a schema+instance mapping. But Marcelo and I came to the conclusion that it was too complicated, so we only wanted the Instance Mapping.

> The algebra tells you the "what" (the Abstract Models) and the "how"
> (the mapping functions), whereas your Axiomatic Semantics tells you
> the truth in the model.
>
> May I suggest the editors (Eric, that includes you) to make clear the
> relation between the Direct Mapping (the algebra) and its Axiomatic
> Semantics?
>

Yes, we need to do that.

>
> Alexandre.
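The projection rule in the reply above, Ans(name) <- Student(_, name, _, _), can also be read as an ordinary relational projection. A minimal sketch in Python, assuming a hypothetical four-column Student table (only the column name and the arity of four come from the rule; the other columns and the sample rows are invented for illustration):

```python
# Hypothetical Student relation: (id, name, age, major).
# Only "name" and the arity of four come from the Datalog rule;
# the remaining columns and all rows are made up for this sketch.
Student = [
    (1, "Alice", 20, "CS"),
    (2, "Bob", 22, "Math"),
    (3, "Alice", 21, "Physics"),
]


def ans(student_rows):
    """Ans(name) <- Student(_, name, _, _).

    Projects the second column. Datalog answers are sets,
    so duplicate names collapse into one.
    """
    return {name for (_, name, _, _) in student_rows}


names = ans(Student)
```

The underscores play the same role in both notations: they mark positions whose values are ignored.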
> >
> > Juan Sequeda
> > www.juansequeda.com
> >
> > On Nov 13, 2010, at 2:34 PM, Alexandre Bertails <bertails@w3.org> wrote:
> >
> > > On Fri, 2010-11-12 at 09:17 -0600, Juan Sequeda wrote:
> > >> Hi Everybody
> > >>
> > >> Just to remind everybody that the new merged consolidated document can
> > >> be found here:
> > >>
> > >> http://www.w3.org/2001/sw/rdb2rdf/directMapping/
> > >
> > > Looking at the roles of section 6 Direct Mapping as Rules
> > > and 5 Direct Mapping Definition, I see an easy division between an
> > > axiomatic semantics and an algebra which implements/conforms to that
> > > semantics. As an example, section 6's generateColumnIRI declares a
> > > binding between a list of column names and the corresponding RDF
> > > predicate IRI. You can view generateColumnIRI without explicit
> > > quantification (quoted from section 6):
> > >
> > > generateColumnIRI(x, y, z): Given a table name x and a non-empty list of columns y, it generates the Column IRI z
> > >
> > > or with quantification:
> > >
> > > ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI, generateColumnIRI(r, columns, iri) ← nonempty(columns)
> > >
> > > The generateColumnIRI rule is *realized* in Section 5's propertyIRI
> > > mapping from a list of columns to an IRI:
> > >
> > > [32] propertyIRI(R, As) ≝ IRI(base + "/" + (join(',', UE(A.name)) ∣ A ∈ As ) "#" As.name)
> > >
> > > More formally, given an axiomatic semantics
> > > [[
> > > ∀ r ∈ Table, ∀ iri ∈ IRI, generateTableIRI(r, iri)
> > > ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ iri ∈ IRI, generateColumnIRI(r, columns, iri) ← nonempty(columns)
> > > ∀ r ∈ Table, ∀ columns ∈ [ Column ], ∀ values ∈ [ value ], ∀ iri ∈ IRI,
> > >   generateRowIRI(r, columns, values, iri) ← nonempty(columns), nonempty(values)
> > > ∀ r ∈ Table, ∀ values ∈ [ value ], ∀ bn ∈ BlankNode, generateRowBlankNode(r, values, bn) ← hasNoPrimaryKey(r)
> > > ∀ r ∈ Table, ∀ column ∈ Column, ∀ value ∈ value, getValue(r, column, value)
> > > ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column, ∀ x1 ∈ value, ..., ∀ xn ∈ value,
> > >   getListValue(r, [c1, ..., cn], [x1, ..., xn]) ← getValue(r, c1, x1), ..., getValue(r, cn, xn)
> > >
> > > (6.1.2 subsumes 6.1.1)
> > > ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm ∈ Column, ∀ pk ∈ [ Column ], ∀ |pk| ∈ [ value ],
> > >   Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cm),
> > >                                   isPrimaryKey(r, pk),
> > >                                   getListValue(r, pk, |pk|),
> > >                                   generateRowIRI(r, pk, |pk|, s),
> > >                                   generateTableIRI(r, o)
> > >
> > > (6.1.3)
> > > ∀ s ∈ Subject, ∀ o ∈ Object, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column,
> > >   Triple(s, IRI("rdf:type"), o) ← r(c1, ..., cn),
> > >                                   hasNoPrimaryKey(r),
> > >                                   generateRowBlankNode(r, [c1, ..., cn], s),
> > >                                   generateTableIRI(r, o)
> > >
> > > (6.2.2 subsumes 6.2.1)
> > > the 2 rules can be factorized as there is no reason to distinguish aj and bj (the conditions are the same)
> > > the "or" implies a split of the rule
> > > ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ xj ∈ value, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cm ∈ Column,
> > > ∀ c ∈ Column, ∀ x ∈ value,
> > > ∀ pk ∈ [ Column ], ∀ |pk| ∈ [ value ],
> > >   Triple(s, p, x) ← r(c1, ..., cm),
> > >                     isPrimaryKey(r, pk),            // pk is the PK of r
> > >                     in(c, pk),                      // c is a Column in pk
> > >                     isNotForeignKey(r, [ c ]),      // c is not the only constituent of a foreign key of r
> > >                     getListValue(r, pk, |pk|),
> > >                     generateRowIRI(r, pk, |pk|, s),
> > >                     generateColumnIRI(r, [ c ], p),
> > >                     getValue(r, c, x)
> > >   Triple(s, p, x) ← r(c1, ..., cm),
> > >                     isPrimaryKey(r, pk),            // pk is the PK of r
> > >                     in(c, pk),                      // c is a Column in pk
> > >                     isForeignKey(r, [ c ]),         // c is the only constituent of a foreign key of r
> > >                     references(r, [ c ], r', ck),   // c references a candidate key ck in another table
> > >                     isPrimaryKey(r', ck),           // ck is the PK of this other table
> > >                     getListValue(r, pk, |pk|),
> > >                     generateRowIRI(r, pk, |pk|, s),
> > >                     generateColumnIRI(r, [ c ], p),
> > >                     getValue(r, c, x)
> > >
> > > (6.2.3)
> > > ∀ s ∈ Subject, ∀ p ∈ Predicate, ∀ r ∈ Table, ∀ c1 ∈ Column, ..., ∀ cn ∈ Column, ∀ c ∈ Column, ∀ x ∈ value,
> > >   Triple(s, p, x) ← r(c1, ..., cn),
> > >                     hasNoPrimaryKey(r),
> > >                     generateRowBlankNode(r, [c1, ..., cn], s),
> > >                     in(c, [c1, ..., cn]),
> > >                     generateColumnIRI(r, [ c ], p),
> > >                     getValue(r, c, x)
> > > ]]
> > >
> > > and an algebra:
> > >
> > > [[
> > > [1]  Database     ≝ { TableName → Table }
> > > [2]  Table        ≝ ( Header, [CandidateKey], CandidateKey?, ForeignKeys, Body )
> > > [3]  Header       ≝ { ColumnName → SQLDatatype }
> > > [4]  CandidateKey ≝ [ ColumnName ]
> > > [5]  ForeignKeys  ≝ { [ColumnName] → ( Table, [ColumnName] ) }
> > > [6]  SQLDatatype  ≝ { INT | FLOAT | DATE | TIME | TIMESTAMP | CHAR | VARCHAR | STRING }
> > > [7]  Body         ≝ [ Tuple ]
> > > [8]  Tuple        ≝ { ColumnName → CellValue }
> > > [9]  CellValue    ≝ value | Null
> > >
> > > [10] Graph        ≝ { Triple }
> > > [11] Triple       ≝ ( Subject, Predicate, Object )
> > > [12] Subject      ≝ IRI | BlankNode
> > > [13] Predicate    ≝ IRI
> > > [14] Object       ≝ IRI | BlankNode | Literal
> > > [15] IRI          ≝ RDF URI-reference as subsequently restricted by SPARQL
> > > [16] BlankNode    ≝ RDF blank node
> > > [17] Literal      ≝ PlainLiteral | TypedLiteral
> > > [18] PlainLiteral ≝ (lexicalForm) | (lexicalForm, langageTag)
> > > [19] TypedLiteral ≝ (lexicalForm, IRI)
> > > ]]
> > >
> > > , one could show that the algebra fits the axiomatic semantics. In
> > > "Data Exchange: Semantics and Query Answering", Fagin et al. focused
> > > on separating the axiomatic semantics (which they call the "universal
> > > solution") from their data exchange algorithms.
> > >
> > > Alexandre.
> > >
> > >>
> > >> Old versions of the document are:
> > >>
> > >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/
> > >> http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> > >>
> > >> Looking forward to your comments
> > >>
> > >> Juan Sequeda
> > >> +1-575-SEQ-UEDA
> > >> www.juansequeda.com
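To make the difference between the axiomatic rules and a computing function concrete, here is a rough Python sketch of one possible realization of the quoted rule 6.1.2 (rdf:type triples for rows of a table with a primary key). The axioms only state that a table IRI and a row IRI exist; the IRI shapes, the base IRI, and the function names below are assumptions chosen for illustration and are not taken from the Direct Mapping document.

```python
from urllib.parse import quote

BASE = "http://example.org/db"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def generate_table_iri(table_name):
    # Assumed IRI shape; the rule only requires that some IRI exists.
    return f"{BASE}/{quote(table_name, safe='')}"


def generate_row_iri(table_name, pk_columns, pk_values):
    # Assumed IRI shape: table IRI followed by column=value pairs of the PK.
    pairs = ",".join(
        f"{quote(col, safe='')}={quote(str(val), safe='')}"
        for col, val in zip(pk_columns, pk_values)
    )
    return f"{generate_table_iri(table_name)}/{pairs}"


def type_triples(table_name, pk_columns, rows):
    """Yield (s, rdf:type, o) for each row of a table that has a primary key."""
    if not pk_columns:
        # hasNoPrimaryKey case (rule 6.1.3) would use blank nodes; not covered here.
        return
    o = generate_table_iri(table_name)
    for row in rows:
        pk_values = [row[col] for col in pk_columns]  # getListValue(r, pk, |pk|)
        s = generate_row_iri(table_name, pk_columns, pk_values)
        yield (s, RDF_TYPE, o)


# Hypothetical sample data, rows represented as column-name -> value dicts.
rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
triples = list(type_triples("Student", ["id"], rows))
```

Any other choice of generate_table_iri and generate_row_iri would satisfy the same rule, which is the point made in the thread: the rule constrains which triples hold, not how the IRIs are computed.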
Received on Saturday, 13 November 2010 21:47:37 UTC