Re: Detailed comments on new default mapping draft from Eric Prud'hommeaux on 2010-11-04 (public-rdb2rdf-wg@w3.org from November 2010)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 4 Nov 2010 01:44:29 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Juan Sequeda <juanfederico@gmail.com>, Marcelo Arenas <marcelo.arenas1@gmail.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20101104054420.GA27150@w3.org>
Per Richard's suggestion, I've incorporated most of Richard's comments
into Rev 1.45 of http://www.w3.org/2001/sw/rdb2rdf/directGraph/ .


* Juan Sequeda <juanfederico@gmail.com> [2010-11-02 14:19-0500]
> Richard,
> 
> Great comments! Thanks a lot. I'm glad you took the time to read the whole
> thing. You even found typos, which means that you really took a magnifying
> glass with you.
> 
> Looking forward to other comments.
> 
> Marcelo and I just had a long meeting and we discussed each of your points.
> Comments are inline.
> 
> 
> On Tue, Nov 2, 2010 at 9:10 AM, Richard Cyganiak <richard@cyganiak.de>wrote:
> 
> > Marcelo, Eric,
> >
> > I am commenting on Section 2 of
> > http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> > $Id: alt.xml,v 1.2 2010/10/30 03:39:01 marenas Exp $
> >
> > First of all, great work! Looks like we almost have an FPWD here.
> >
> > Detailed comments are below. A lot of it is editorial, but there are some
> > substantial comments too, as well as some pointers to oversights. I would
> > appreciate if each comment could be a) addressed in the text, or b)
> > reflected as an @@Issue in the text, or c) replied to in a response to this
> > email, or d) turned into an Issue in the W3C tracker.
> >
> 
> I'm commenting everything inline in this email. Some things would need to be
> turned into Issues in the W3C tracker (how to do this?)
> 
> >
> >
> >  Stem URI    this should be called  base URI , because that's a commonly
> > understood term, and it enables the explanation of URI generation as
> > resolution of a relative URI against a base URI.
> >
> 
> Ok. We will change this.

I prefer to keep stem URI distinct from the relative URI, at least for
FPWD. base, per 2396 <http://tools.ietf.org/html/rfc2396#section-1.4>
has a specific behavoir, for example, a relative URI <People/ID.7#_>
resolved against a base URI of <http://foo.example/DB> yields
<http://foo.example/People/ID.7#_>. If we remedy that with a leading
slash, </People/ID.7#_>, we get something resolved against the root.


> > The document should use SQL terminology throughout. Relation, attribute and
> > tuple should be table, column and row, etc.
> >
> 
> Ok. We will change this.

I think this is addressed §1-3. §4 uses a more traditional
terminology, but relates it to SQL terms with the following text:
[[
There are many models for databases in SQL literature; because the
Direct Mapping does not rely on column position, we use a model which
assumes a 1:1 correspondance between attribute (column name) and
value, i.e. a map.
Starting with a traditional model of a relational database we define a
Relation (a table) which has a name, a Header, Body and
primary/foreign key details.
The Body contains maps from attribute names to values and the Header
provides the datatypes to interpret those values.
]]

editorial suggestions?


> > The approach in Section 2 defines URIs for columns and rows, but not for
> > tables. This means one has to use hacks to do a SPARQL query for all records
> > in a given table. The approach needs to define URIs for tables as well, and
> > associate each row with the table it is from.
> >
> 
> If we understand correctly, we would need to create IRIs for Tables. Hence,
> there would be now three types of IRIs: Tuple, Columns and Tables. However,
> if we are to create Table IRIs, then we also need to create a new type of
> triples: Table Triples:
> 
> <TupleIRI, rdf:type, Table IRI>
> 
> Do we agree?

@@ will take a bit of work @@


> > From the current description, it is impossible to work out how URIs for
> > rows with multi-column primary keys would look like. What order? What
> > separator characters?
> >
> 
> The order is the same order of the columns in the table.  Marcelo and I have
> the following proposal for creating IRIs:
> 
> Table IRI
> 
> baseURI/table i.e baseURI/person
> 
> Column IRI
> 
> baseURI/table/column i.e baseURI/person/name
> 
> Multicolumn IRI
> 
> baseURI/table/column1#column2#... i.e. baseURI/person/fname#lname
> 
> Tuple IRI
> 
> baseURI/table/column1:value  i.e baseURI/person/id:12
> 
> Multicolumn Tuple IRI
> 
> baseURI/table/column1:value1#column2:value2#... i.e
> baseURI/person/fname:Juan#lname:Sequeda
> 
> 
> This is our proposal. However, we are not aware of the best practices for
> IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
> Column IRIs"

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules

now has text to specify the construction of IRIs in the predicate
position and the subject/object position. Editorial suggestions
welcome! (I'm not super-confident that there isn't a clearer way to
express this.)

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#multi-key reinforces
that with:
[[
• The predicate for this key is formed from the stem and "deptName_deptCity",
  reflecting the order of the column names in the foreign key.
]]

> > I am uncomfortable with the use of the dot character as a separator in
> > generated URIs. The character typically used in URIs to indicate a
> > hierarchical relationship is "/". The character typically used to indicate
> > key-value pairs is "=".
> >
> 
> See previous comment. We should open Issue on creating IRIs and discuss this
> in group.

accepted s/=/./


> >
> > I am uncomfortable with the use of "#_" at the end of row URIs. I cannot
> > see any precedent for that, so I cannot call it good practice. It is also
> > unnecessary because the URI identifies a row in a database table and never a
> > person/address/organization or whatever other real-world object. Rows in
> > database tables are information resources and thus there is no problem at
> > all with identifying them using a plain fragment-less URI.
> >
> > This is what Eric had originally. So he can comment on his decision. Again,
> we should create an Issue on this.

I was following <http://www.w3.org/DesignIssues/RDB-RDF>, but decided to
shorten "personnel/employees/1234#item" to "personnel/employees/1234#_".
The choice of hash vs. slash is a real issue. I added an issue 
[[
hash-vs-slash: This edition of this document presumes slash
identifiers. LOD data identifiers tend to use slash, but that slightly
increases implementation burden and round trips.
]]
http://www.w3.org/2001/sw/rdb2rdf/directGraph/#hash-vs-slash


> > Special characters in table names, column names and PK values need to be
> > handled in the URI generation.
> >
> 
> Again... create an issue on this ( I sound like a broken record)

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules states that table
names, attribute names and attribute lexical values are url-encoded
per http://www.w3.org/TR/wsdl#_http:urlEncoded .

> > 2.2 says:  with an XML Schema datatype corresponding to the SQL datatype of
> > that column . That obviously needs to be spelled out. There should be an
> > extra section to this,  Mapping SQL Datatypes to RDF Literals  or something
> > like that. The section can be a placeholder for FPWD, but should exist.
> >
> 
> Yes. There needs to be a table with the corresponding mapping

added [[
...the pairs of url-encoded column name '=' url-encoded column value,
separated by a '_'. The column value is the lexical per SQL99 [SQL99].
]]

> > I do not find the visual notation for unique keys and foreign keys
> > particularly clear. How about simply listing them underneath the table?
> >  Foreign key: addr -> Addresses.ID 
> >
> 
> Could we consider taking away the visual notation for keys, and just have
> the table with data. We would also put in the SQL DDL and I'm wondering if
> this would be enough?

I added an bit of a key before the first example. The "empty primary
key" example was, I believe, the worst of the lot. After struggling a
bit with a notation, I gave up and copied the XSD from
  https://dvcs.w3.org/hg/FeDeRate/file/060df0861705/directmapping/src/test/scala/DirectMappingTest.scala#l105
into
  http://www.w3.org/2001/sw/rdb2rdf/directGraph/#ref-no-pk


> >
> > You write foreign keys as if they reference another *key*. I believe that
> > doesn't reflect SQL. Foreign keys reference other *columns*. That's the
> > mental model that a reader is going to have in their head, and that's how it
> > should be presented in the spec.
> >
> 
> I agree. To make sure, what we currently have for example Address.PK, and we
> know that the PK of Address is ID, it should then be Address.ID (or
> something like that). Is that what you mean?

Actually, I disagree; foreign keys specifically reference candidate
keys in other tables. If the system does not enforce that, and the
data in foreign keys matches more than 1 row in the referenced table,
then we have have a pretty different graph to represent. My temptation
is to start out conservative, and if we have energy and mandate,
represent these cases which I believe are non-compliant.

[[
The columns in the referencing table must be the primary key or other
candidate key in the referenced table.
]] — http://en.wikipedia.org/wiki/Foreign_key ¶1


> > The use of "_" as a separator between the column/value pairs in
> > multi-column PK row URIs is a bad idea, because the underscore character is
> > ubiquitous in table and column names. An obvious replacement would be ";".
> >
> 
> 
> Again.. we need to create an Issue about generating IRIs :P

The prob here is that we don't want to step on either a valid fragment
identifier (for e.g. turtle) or xml local name (for RDF/XML). I've left
the "_" until we have a new idea.

Note that url-encoding the column names and lexical values protects us
from seing the "_"s in e.g. f_name Bob_Smith.


> > http://foo.example/DB/Department#Manager -- why is Manager uppercase?
> >
> 
> typo

ditto

> >
> > I object to the representation of simple string literals as
> > "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
> > equivalent under datatype semantics, so the simple form should be used.
> >
> 
> We should create an issue on this: "Should a literal include xsd?" Should be
> discussed in group and come to a consensus.

http://www.w3.org/2001/sw/rdb2rdf/directGraph/#literaltriples
now says
[[
Per XML Datatypes for SQL Datatypes, string datatypes are expressed as
an RDF plain literal.
]]


> > Again, please drop the concept of a stem URI and explain that the mapping
> > uses relative URIs which are resolved against an environment-provided base
> > URI. Instead of this:
> >
> > <http://foo.example/DB/Addresses/ID.18#_> <
> > http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
> > <http://foo.example/DB/Addresses/ID.18#_> <
> > http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
> > <http://foo.example/DB/Addresses/ID.18#_> <
> > http://foo.example/DB/Addresses#state> "MA"^^xsd:string .
> >
> > I'd like to see this:
> >
> > <Addresses/ID=18> <Addresses#ID> 18 .
> > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > <Addresses/ID=18> <Addresses#state> "MA" .

Hmm, it's possible that this might all be do-able with relative URIs.
But we'd better think about this carefully.
For now, I've used turtle's @base attribute.


> Do you mean that we should define a prefix:
> 
> @prefix base: <http://foo.example/DB/> .
> 
> and then everywhere have
> 
> <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> <base:Addresses/ID=18> <base:Addresses#state> "MA" .
> 
> 
> 
> 
> 
> > If you do it right, RDF can be simple ;-)
> >
> >
> :)
> 
> 
> >
> > Again, great work, and I'm very happy to see this spec moving forward and
> > like the direction it is taking.
> >
> 
> Thanks for you very insightful and direct comments.
> 
> Marcelo and I will be working on this in the next couple of days and let
> everybody know when we have an update. Please keep the comments coming!!!!!
> 
> 
> > Richard
> >
> >
> >
> >
> >> All the best,
> >>
> >> Marcelo
> >>
> >>
> >
> >
-- 
-ericP
Received on Thursday, 4 November 2010 05:45:15 UTC