SQL terminology from Eric Prud'hommeaux on 2011-02-17 (public-rdb2rdf-wg@w3.org from February 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 17 Feb 2011 10:16:16 -0500
To: Ivan Herman <ivan@w3.org>
Cc: Alexandre Bertails <bertails@w3.org>, Juan Sequeda <juanfederico@gmail.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20110217151614.GA3870@w3.org>
* Ivan Herman <ivan@w3.org> [2011-02-17 13:45+0100]
> Hi Alex,
> 
> On Feb 16, 2011, at 20:26 , Alexandre Bertails wrote:
> 
> [snip]
> 
> > I suggest you follow [2] instead of what you found in the current draft.
> > This is an updated version (far simpler and much more complete) and I'll
> > be happy to help you going through it.
> > 
> > Alexandre Bertails.
> > 
> > [2] http://www.w3.org/2011/02/16-DM-denotational-rdf-semantics
> 
> I went through the text and found some issues in reading the text. They may come from my own non-understanding, though... See them below. There some undefined things at the 'bottom', so to say, but these can be completed easily, like the definition of url_encode. But, overall, it looks fairly complete to me and  is certainly (in my view) publishing ready and worthwhile...
> 
> Of course, what I did until now is to read through the stuff and, I think, I understand it. Next I should take some specific cases and run a mental program to see if it does what I think it does:-). Actually, the nice thing would be if somebody did _another_ implementation (hm, possibly not in scala:-) to prove that the definition is complete. No, I will not do this, I do not think I would have the time now:-)
> 
> A specific technical question and this probably shows my lack of sql knowledge. The DM document talks about primary keys as possibly a multi-column primary keys (eg, section 2.2). Your description says
> 
> primaryKey : Table → {s:Set(CandidateKey) | size(s) ≤ 1}
> 
> ie, a specific primary key for a table cannot be multi-column, right? There seems to be a discrepancy.

It can be multi-column. Here's a quick intro to SQL terminology which should hopefully jibe with your intuition:

SQL speaks in terms of tables. The data in a table is in rows of data with a consistent datatype (e.g. 1st column is an INT, 2nd a CHAR(20)). A special value, "NULL", is in every datatype, and has special equivalence semantics. The table has a list of column names (e.g. "ID", "fname").

SQL tables can be formed by an explicit definition (base tables) or by the results of a query. Base tables must have distinct column names; "CREATE TABLE toy (foo INT, foo CHAR(20))" is not SQL. The results of a query may have multiple columns with the same name ("SELECT 1 AS foo, 1.0 AS foo") unless you are defining a view with that query. For example, "CREATE VIEW toy AS SELECT 1 AS foo, 1.0 AS foo" is also not SQL. Ultimately, base tables and views can be treated as rows of maps from column name to value.

Tables can have "UNIQUE [KEY]" constraints over sets of columns. This has the effect of policing your data and of making it very efficient to search on those columns. The relational model refers to these as "candidate keys", and any set of columns which includes a candidate key as a "super keys". SQLs "PRIMARY KEYS" are a subset of candidate keys with the additional constraint that the values must be non-null. The column values
  ( 1,  NULL)
  ( 1, "Bob")
are valid in a UNIQUE KEY, but is not valid in a PRIMARY KEY.

SQL has a mechanism for linking from a list of columns a tables to a list of columns in a potentially different table: "FOREIGN KEY (source columns) REFERENCES target (target columns)". This says that the values the source columns in any given row must equal the values of the target columns in exactly one row of the target table¹. These references form a graph with nodes being the rows in the tables.

¹ Noting that candidate keys may include NULLs, and I *think* that a foreign key constraint is satisfied if it contains all NULL values (to signify that there is no corresponding row in the target table), what link is implied by:
  CREATE TABLE target (x INT, y INT, foo INT, UNIQUE KEY (x, y));
  INSERT INTO target (x, y, foo) VALUES (NULL, NULL, 3);
  CREATE TABLE source (xr INT, yr INT, bar INT, FOREIGN KEY (xr, yr) REFERENCES target(x, y));
  INSERT INTO target (xr, yr, bar) VALUES (NULL, NULL, 7);


> Small comments
> ==============
> 
> There is a superfluous '(' in
> 
> value : ({r:Row} → {a:ColumnName  |  a ∈ r} → CellValue
> -------------
> 
> There is no definition for LexicalValue in 
> 
> CellValue ::= LexicalValue  |  NULL
> 
> and that also backfires in the definition of:
> 
> lexicals : {r:Row} → List({a:ColumnName  |  a ∈ r}) → List(LexicalValue)
> -------------
> 
> I understand the intention, but I am not sure I fully understand the mathematical formalism in, say, 
> 
> references : {t:Table} → {r:Row  |  r ∈ t} → Set(ForeignKey)
> 
> The formalism you use for accessor functions like
> 
> bla : {a:Set1} → {b:Set2}
> 
> seems to indicate that this is a function so that bla(a) yields b. But I am not sure what the value of references(t) is... Shouldn't it be
> 
> references : {t:Table} → {r:Row | r ∈ t} → { f:Set(ForeignKey) | (cn,v) ∈ r | (cn,t1,c1) ∈ r}
> 
> or something like that?
> 
> (The same issue with 'lexicals' and 'scalars'.)
> 
> ---------------
> 
> It is a notational thing, but the usage of '+' is a bit strange to me in ⟦r⟧ row φ. Instead of +, wouldn't it be cleaner to use a set union for
> 
> {(s, rdf:type, ue(tablename(table(r))))} 
> 
> ---------------
> 
> In the definition of ⟦r, fk⟧ ref φ you refer to the function 'dereference', but that is not defined anywhere else. I did not spend time to try to reproduce the definition...
> 
> IN the same function you also refer to 
> 
> let p = ⟦table(r), pk⟧col
> 
> but what is 'pk' here?
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
> 



-- 
-ericP
Received on Thursday, 17 February 2011 15:16:50 UTC