Detailed comments on new default mapping draft from Richard Cyganiak on 2010-11-02 (public-rdb2rdf-wg@w3.org from November 2010)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 2 Nov 2010 14:10:47 +0000
To: Marcelo Arenas <marcelo.arenas1@gmail.com>, Eric Prud'hommeaux <eric@w3.org>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <CA849A83-D975-4243-AFC6-5F42DEEA54E6@cyganiak.de>
Marcelo, Eric,

I am commenting on Section 2 of
http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
$Id: alt.xml,v 1.2 2010/10/30 03:39:01 marenas Exp $

First of all, great work! Looks like we almost have an FPWD here.

Detailed comments are below. A lot of it is editorial, but there are  
some substantial comments too, as well as some pointers to oversights.  
I would appreciate it if each comment could be a) addressed in the text,  
or b) reflected as an @@Issue in the text, or c) replied to in a  
response to this email, or d) turned into an Issue in the W3C tracker.


“Stem URI” — this should be called “base URI”, because that's a  
commonly understood term, and it enables the explanation of URI  
generation as resolution of a relative URI against a base URI.
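To illustrate what "resolution of a relative URI against a base URI" means in practice, here is a minimal sketch using Python's standard library. The base URI value and the helper name `resolve` are assumptions made for the example; the relative forms are the ones proposed later in this message.

```python
from urllib.parse import urljoin

# Assumed base URI for the example; in practice the environment supplies it.
BASE_URI = "http://foo.example/DB/"


def resolve(relative_uri: str, base_uri: str = BASE_URI) -> str:
    """Resolve a relative URI reference against a base URI."""
    return urljoin(base_uri, relative_uri)


print(resolve("Addresses/ID=18"))  # http://foo.example/DB/Addresses/ID=18
print(resolve("Addresses#city"))   # http://foo.example/DB/Addresses#city
```

Note that the base URI has to end in "/" for this to work as intended; otherwise its last path segment is replaced during resolution.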

The document should use SQL terminology throughout. Relation,  
attribute and tuple should be table, column and row, etc.

The approach in Section 2 defines URIs for columns and rows, but not  
for tables. This means one has to use hacks to do a SPARQL query for  
all records in a given table. The approach needs to define URIs for  
tables as well, and associate each row with the table it is from.
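One possible way to associate rows with tables, sketched here purely as an assumption, is to emit an rdf:type triple per row with the table URI as the object. The draft does not say this; the helper names and the choice of rdf:type are hypothetical.

```python
def row_type_triple(table: str, row_uri: str) -> str:
    """Turtle triple linking a row to its table.

    Assumption: rdf:type (written "a" in Turtle) is the linking predicate.
    """
    return f"<{row_uri}> a <{table}> ."


def rows_of_table_query(table: str) -> str:
    """SPARQL query selecting every row of a table, given such triples.

    The relative IRI is resolved against the base IRI of the query.
    """
    return f"SELECT ?row WHERE {{ ?row a <{table}> }}"


print(row_type_triple("Addresses", "Addresses/ID=18"))
print(rows_of_table_query("Addresses"))
```

With triples like this in place, the query for all records in a table becomes a single triple pattern.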

From the current description, it is impossible to work out what URIs  
for rows with multi-column primary keys would look like. What order?  
What separator characters?

I am uncomfortable with the use of the dot character as a separator in  
generated URIs. The character typically used in URIs to indicate a  
hierarchical relationship is "/". The character typically used to  
indicate key-value pairs is "=".

I am uncomfortable with the use of "#_" at the end of row URIs. I  
cannot see any precedent for that, so I cannot call it good practice.  
It is also unnecessary because the URI identifies a row in a database  
table and never a person/address/organization or whatever other  
real-world object. Rows in database tables are information resources and  
thus there is no problem at all with identifying them using a plain  
fragment-less URI.

Special characters in table names, column names and PK values need to  
be handled in the URI generation.
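As a sketch of one way to handle this, each name or value could be percent-encoded before it is placed into the URI, so that characters such as "/", "=" or ";" inside a value cannot be confused with separators. The helper name and the sample inputs are hypothetical; the draft may settle on a different escaping rule.

```python
from urllib.parse import quote


def encode_component(text: str) -> str:
    """Percent-encode a table name, column name or key value.

    safe="" means that reserved characters such as "/" are encoded too.
    Non-ASCII characters are encoded as UTF-8 bytes.
    """
    return quote(text, safe="")


print(encode_component("first name"))  # first%20name
print(encode_component("R&D/Ops"))     # R%26D%2FOps
print(encode_component("Zürich"))      # Z%C3%BCrich
```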

2.2 is largely redundant as it only summarizes information that  
follows in more detail later. Thus the focus should be on giving a  
quick intro to the general idea, using simple language. The example is  
repeated twice for no reason.

2.2 says that the predicate of reference triples is “the Column IRI  
for the columns that constitute the foreign key.” That doesn't work  
for multi-column FKs.

2.2 says: “with an XML Schema datatype corresponding to the SQL  
datatype of that column”. That obviously needs to be spelled out.  
There should be an extra section for this, “Mapping SQL Datatypes to  
RDF Literals” or something like that. The section can be a placeholder  
for FPWD, but should exist.
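To show the kind of content such a section would need, here is a placeholder sketch of a lookup from SQL type names to XML Schema datatypes. The specific pairs are illustrative assumptions, not something the draft defines, and the function name is hypothetical.

```python
# Illustrative assumptions only: the draft does not yet define this mapping.
SQL_TO_XSD = {
    "INTEGER": "xsd:integer",
    "VARCHAR": "xsd:string",
    "BOOLEAN": "xsd:boolean",
    "DATE": "xsd:date",
}


def xsd_datatype(sql_type: str) -> str:
    """Look up the XSD datatype for a SQL type such as 'VARCHAR(30)'."""
    # Drop any length or precision suffix and normalise the case.
    base_type = sql_type.split("(", 1)[0].strip().upper()
    if base_type not in SQL_TO_XSD:
        raise ValueError(f"no mapping defined for SQL type {sql_type!r}")
    return SQL_TO_XSD[base_type]


print(xsd_datatype("varchar(30)"))  # xsd:string
```

Raising an error for unmapped types, rather than silently falling back to a string, keeps the gaps in the table visible.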

The bullet points in 2.3.1 need refactoring. 90% of each bullet point  
is identical. It took me five minutes of careful parsing to work that  
out. This is lazy writing at the expense of clarity.

In the third bullet point in 2.3.1, “Literal triple” links to the  
wrong place. See, this is why you shouldn't copy-paste the same text  
three times!

The example from 2.2 is repeated a third time in 2.3.1 for no reason.

The verbose textual rendering of the schema is unnecessary and should  
be removed. It says nothing that cannot be seen from the visual  
representation. Rather use that space for writing the table definition  
in SQL. Same for other places in the document where table schemas are  
spelled out verbally.

I do not find the visual notation for unique keys and foreign keys  
particularly clear. How about simply listing them underneath the  
table? “Foreign key: addr -> Addresses.ID”
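A rough sketch of both suggestions, using SQLite from Python so the DDL can be run as written. The Addresses columns and the addr foreign key come from the examples in this message; the name of the referencing table ("People") and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Addresses (
    ID    INTEGER PRIMARY KEY,
    city  VARCHAR(30),          -- column types are assumed
    state CHAR(2)
);
CREATE TABLE People (           -- hypothetical referencing table
    ID    INTEGER PRIMARY KEY,
    addr  INTEGER REFERENCES Addresses(ID)
);
""")


def foreign_key_lines(conn: sqlite3.Connection, table: str) -> list[str]:
    """List a table's foreign keys in the 'column -> Table.column' style."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Result columns: id, seq, table, from, to, on_update, on_delete, match
    return [f"Foreign key: {r[3]} -> {r[2]}.{r[4]}" for r in rows]


print(foreign_key_lines(conn, "People"))
# ['Foreign key: addr -> Addresses.ID']
```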

You write foreign keys as if they reference another *key*. I believe  
that doesn't reflect SQL. Foreign keys reference other *columns*.  
That's the mental model that a reader is going to have in their head,  
and that's how it should be presented in the spec.

Oh, 2.3.1 actually has an example that explains how multi-column PKs  
work. This should have been in the place where multi-column PKs were  
described.

The use of "_" as a separator between the column/value pairs in  
multi-column PK row URIs is a bad idea, because the underscore character  
is ubiquitous in table and column names. An obvious replacement would be  
";".
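A minimal sketch of how row URIs could be built with "/" between table and key, "=" between column and value, and ";" between pairs. The table and column names are hypothetical, and the ordering (the order in which the caller lists the key columns) is an assumption, since the draft leaves that open.

```python
from urllib.parse import quote


def row_uri(table: str, key: list[tuple[str, object]]) -> str:
    """Build a relative row URI from a table name and its PK column/value pairs.

    Every component is percent-encoded, so a "/", "=" or ";" inside a name
    or value cannot be mistaken for a separator.
    """
    pairs = ";".join(
        f"{quote(column, safe='')}={quote(str(value), safe='')}"
        for column, value in key
    )
    return f"{quote(table, safe='')}/{pairs}"


print(row_uri("Addresses", [("ID", 18)]))
# Addresses/ID=18
print(row_uri("Tasks", [("project_id", 7), ("task_id", 3)]))
# Tasks/project_id=7;task_id=3
```

The second call shows why "_" is a poor pair separator: the column names themselves contain underscores, whereas ";" stays unambiguous.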

I found the last example in 2.3.1 confusing because it didn't generate  
a triple from the FK. The text before the example made it sound as if  
the following was a complete translation of the table. The text could  
be clearer about the fact that only the non-FK columns are translated.

The content of 2.3.1 actually doesn't really match its title. The  
title talks about “information in PKs”. What follows is not only about  
information in PK columns.

2.3.2: The rules for referencing tables without PKs state that the  
object is the target row's Tuple IRI. Earlier you said that such  
tables don't have Tuple IRIs but blank nodes.

The bullet points in 2.3.2 need refactoring to separate the common  
stuff from the stuff that's different between them.

2.3.2 explains the subjects and objects, but not the predicates of  
generated triples.

http://foo.example/DB/Department#Manager -- why is Manager uppercase?

I object to the representation of simple string literals as  
"Cambridge"^^xsd:string. This should simply be "Cambridge". They are  
equivalent under datatype semantics, so the simple form should be used.

18^^xsd:integer is not valid Turtle. This must either be  
"18"^^xsd:integer, or simply 18, which is just Turtle syntactic sugar  
for the former. I would highly prefer if the simple form was used  
throughout.
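To make the preferred literal forms concrete, here is a small sketch of a serializer that writes Python values in Turtle's short forms. The function name is hypothetical and it only covers integers, booleans and strings.

```python
def turtle_literal(value) -> str:
    """Render a Python value as a Turtle literal in its simplest form."""
    if isinstance(value, bool):
        # Check bool before int: bool is a subclass of int in Python.
        return "true" if value else "false"
    if isinstance(value, int):
        # Bare integers are Turtle shorthand for "..."^^xsd:integer.
        return str(value)
    if isinstance(value, str):
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")


print(turtle_literal(18))           # 18
print(turtle_literal("Cambridge"))  # "Cambridge"
```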

Again, please drop the concept of a stem URI and explain that the  
mapping uses relative URIs which are resolved against an  
environment-provided base URI. Instead of this:

<http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
<http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
<http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#state> "MA"^^xsd:string .

I'd like to see this:

<Addresses/ID=18> <Addresses#ID> 18 .
<Addresses/ID=18> <Addresses#city> "Cambridge" .
<Addresses/ID=18> <Addresses#state> "MA" .

If you do it right, RDF can be simple ;-)
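As a sketch of how little machinery the simple form needs, the following generates the three triples above from a row and wraps them in a Turtle document with an @base declaration. The function names and the base URI value are assumptions, and only integer and single-line string values are handled.

```python
def literal(value) -> str:
    """Short-form Turtle literal for an int or a single-line str."""
    if isinstance(value, int):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_turtle(table: str, pk_column: str, row: dict) -> list[str]:
    """One triple per column, using relative URIs for subject and predicate."""
    subject = f"<{table}/{pk_column}={row[pk_column]}>"
    return [
        f"{subject} <{table}#{column}> {literal(value)} ."
        for column, value in row.items()
    ]


def turtle_document(base_uri: str, triples: list[str]) -> str:
    """Prefix the triples with @base so the relative URIs resolve against it."""
    return "\n".join([f"@base <{base_uri}> ."] + triples)


row = {"ID": 18, "city": "Cambridge", "state": "MA"}
print(turtle_document("http://foo.example/DB/", row_to_turtle("Addresses", "ID", row)))
```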


Again, great work, and I'm very happy to see this spec moving forward  
and like the direction it is taking.

Richard



Received on Tuesday, 2 November 2010 14:11:24 UTC