Re: Detailed comments on new default mapping draft from Alexandre Bertails on 2010-11-04 (public-rdb2rdf-wg@w3.org from November 2010)

From: Alexandre Bertails <bertails@w3.org>
Date: Thu, 04 Nov 2010 09:57:29 -0400
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, Juan Sequeda <juanfederico@gmail.com>, Marcelo Arenas <marcelo.arenas1@gmail.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <1288879049.1204.99.camel@simplet>
Here are my comments, based on Eric's document (I have not yet looked at
Juan and Marcelo's).

First of all, I propose we don't use the term "transformation". The
title says it's a mapping, so can we please stick with "mapping" instead
of "transformation"? "Transformation" also makes people think the data is
actually transformed, which is not the case.

[ snip ]
> > > "Stem URI": this should be called "base URI", because that's a commonly
> > > understood term, and it enables the explanation of URI generation as
> > > resolution of a relative URI against a base URI.
> > >
> > 
> > Ok. We will change this.
> 
> I prefer to keep stem URI distinct from the relative URI, at least for
> FPWD. base, per 2396 <http://tools.ietf.org/html/rfc2396#section-1.4>
> has a specific behavior, for example, a relative URI <People/ID.7#_>
> resolved against a base URI of <http://foo.example/DB> yields
> <http://foo.example/People/ID.7#_>. If we remedy that with a leading
> slash, </People/ID.7#_>, we get something resolved against the root.

+1
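
A minimal sketch of the resolution behaviour Eric describes, using
urljoin from Python's standard library. The URIs are the ones from his
example; the trailing-slash variant is an extra case added here for
comparison and is not from the thread.

```python
from urllib.parse import urljoin

base = "http://foo.example/DB"

# The last path segment of the base ("DB") is dropped during resolution.
no_slash = urljoin(base, "People/ID.7#_")

# A leading slash resolves against the root of the authority.
rooted = urljoin(base, "/People/ID.7#_")

# Assumption (not in the thread): a base ending in "/" keeps the "DB" segment.
with_slash = urljoin(base + "/", "People/ID.7#_")

print(no_slash)
print(rooted)
print(with_slash)
```

A stem URI that is simply concatenated with the relative part does not
have this segment-dropping behaviour, which is the distinction at stake.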

> > > The document should use SQL terminology throughout. Relation, attribute and
> > > tuple should be table, column and row, etc.
> > >
> > 
> > Ok. We will change this.
> 
> I think this is addressed in §1-3. §4 uses a more traditional
> terminology, but relates it to SQL terms with the following text:
> [[
> There are many models for databases in SQL literature; because the
> Direct Mapping does not rely on column position, we use a model which
> assumes a 1:1 correspondence between attribute (column name) and
> value, i.e. a map.
> Starting with a traditional model of a relational database we define a
> Relation (a table) which has a name, a Header, Body and
> primary/foreign key details.
> The Body contains maps from attribute names to values and the Header
> provides the datatypes to interpret those values.
> ]]
> 
> editorial suggestions?

Here is mine:
 s/i.e. a map/i.e. a map from Attribute to Values/
Or just get rid of it, since the next sentence already describes it.
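
To make the "map" reading concrete, here is a rough Python sketch of the
model in the quoted text. The class and field names are hypothetical,
chosen only for illustration; the formal definition in §4 may be
structured differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Relation:
    """Hypothetical model of a Relation (table): name, Header, Body, keys."""
    name: str
    # Header: attribute (column name) -> datatype used to interpret values.
    header: Dict[str, str]
    # Body: each tuple (row) is a map from attribute name to value.
    body: List[Dict[str, object]]
    primary_key: Tuple[str, ...] = ()
    # Local columns -> (referenced table, referenced columns).
    foreign_keys: Dict[Tuple[str, ...], Tuple[str, Tuple[str, ...]]] = field(
        default_factory=dict
    )


addresses = Relation(
    name="Addresses",
    header={"ID": "integer", "city": "string", "state": "string"},
    body=[{"ID": 18, "city": "Cambridge", "state": "MA"}],
    primary_key=("ID",),
)

# Values are looked up by attribute name, never by column position.
first_city = addresses.body[0]["city"]
```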

Question: in the formal definition you still use Relational terminology,
is there a reason for that?

> > > The approach in Section 2 defines URIs for columns and rows, but not for
> > > tables. This means one has to use hacks to do a SPARQL query for all records
> > > in a given table. The approach needs to define URIs for tables as well, and
> > > associate each row with the table it is from.
> > >
> > 
> > If we understand correctly, we would need to create IRIs for Tables. Hence,
> > there would be now three types of IRIs: Tuple, Columns and Tables. However,
> > if we are to create Table IRIs, then we also need to create a new type of
> > triples: Table Triples:
> > 
> > <TupleIRI, rdf:type, Table IRI>
> > 
> > Do we agree?
> 
> @@ will take a bit of work @@
> 
> 
> > > From the current description, it is impossible to work out what URIs for
> > > rows with multi-column primary keys would look like. What order? What
> > > separator characters?
> > >
> > 
> > The order is the same order of the columns in the table.  Marcelo and I have
> > the following proposal for creating IRIs:
> > 
> > Table IRI
> > 
> > baseURI/table i.e baseURI/person
> > 
> > Column IRI
> > 
> > baseURI/table/column i.e baseURI/person/name
> > 
> > Multicolumn IRI
> > 
> > baseURI/table/column1#column2#... i.e. baseURI/person/fname#lname
> > 
> > Tuple IRI
> > 
> > baseURI/table/column1:value  i.e baseURI/person/id:12
> > 
> > Multicolumn Tuple IRI
> > 
> > baseURI/table/column1:value1#column2:value2#... i.e
> > baseURI/person/fname:Juan#lname:Sequeda
> > 
> > 
> > This is our proposal. However, we are not aware of the best practices for
> > IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
> > Column IRIs"
> 
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#rules
> 
> now has text to specify the construction of IRIs in the predicate
> position and the subject/object position. Editorial suggestions
> welcome! (I'm not super-confident that there isn't a clearer way to
> express this.)

I suggest you emphasize the RDB cases the mapping reacts to:
* non-NULL foreign key
* non-NULL column value
* Columns which are also the sole column in a foreign key
* primary key is also a foreign key

I believe the whole part about "concatenating" strings is unnecessary.
It's hard to read and doesn't help in this informative section. You
already have the same information in §5.
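
For comparison, the IRI shapes in Juan and Marcelo's proposal quoted
above can be sketched in a few lines of Python. The function names and
the example base are assumptions made for illustration. The sketch does
no escaping of column names or values and takes key columns in the order
given, which is assumed to be the table's column order.

```python
def table_iri(base, table):
    # baseURI/table
    return f"{base}/{table}"


def column_iri(base, table, *columns):
    # baseURI/table/column, or baseURI/table/column1#column2#...
    return f"{base}/{table}/" + "#".join(columns)


def tuple_iri(base, table, key):
    # baseURI/table/column1:value1#column2:value2#...
    # `key` maps key column names to values, in table column order.
    pairs = "#".join(f"{column}:{value}" for column, value in key.items())
    return f"{base}/{table}/{pairs}"


base = "http://foo.example/DB"  # hypothetical base, no trailing slash
single = tuple_iri(base, "person", {"id": 12})
multi = tuple_iri(base, "person", {"fname": "Juan", "lname": "Sequeda"})
columns = column_iri(base, "person", "fname", "lname")
```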

[ snip ]

> > This is what Eric had originally, so he can comment on his decision. Again,
> > we should create an Issue on this.
> 
> I was following <http://www.w3.org/DesignIssues/RDB-RDF>, but decided to
> shorten "personnel/employees/1234#item" to "personnel/employees/1234#_".
> The choice of hash vs. slash is a real issue. I added an issue 
> [[
> hash-vs-slash: This edition of this document presumes slash
> identifiers. LOD data identifiers tend to use slash, but that slightly
> increases implementation burden and round trips.
> ]]
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#hash-vs-slash

I suggest we ask Sandro on that point, as he has real expertise.

[ snip ]

> > > I do not find the visual notation for unique keys and foreign keys
> > > particularly clear. How about simply listing them underneath the table?
> > >  "Foreign key: addr -> Addresses.ID"
> > >
> > 
> > Could we consider taking away the visual notation for keys, and just have
> > the table with data. We would also put in the SQL DDL and I'm wondering if
> > this would be enough?

I don't like this visual notation either (maybe because it's not
normalized?). I believe that the CREATE statements + the data in the
visual representation (without key information) should be enough.

> 
> I added a bit of a key before the first example. The "empty primary
> key" example was, I believe, the worst of the lot. After struggling a
> bit with a notation, I gave up and copied the XSD from
>   https://dvcs.w3.org/hg/FeDeRate/file/060df0861705/directmapping/src/test/scala/DirectMappingTest.scala#l105
> into
>   http://www.w3.org/2001/sw/rdb2rdf/directGraph/#ref-no-pk
> 
> 
> > >
> > > You write foreign keys as if they reference another *key*. I believe that
> > > doesn't reflect SQL. Foreign keys reference other *columns*. That's the
> > > mental model that a reader is going to have in their head, and that's how it
> > > should be presented in the spec.
> > >
> > 
> > I agree. To make sure, what we currently have for example Address.PK, and we
> > know that the PK of Address is ID, it should then be Address.ID (or
> > something like that). Is that what you mean?
> 
> Actually, I disagree; foreign keys specifically reference candidate
> keys in other tables. If the system does not enforce that, and the
> data in foreign keys matches more than 1 row in the referenced table,
> then we have a pretty different graph to represent. My temptation
> is to start out conservative, and if we have energy and mandate,
> represent these cases which I believe are non-compliant.

+1

[ snip ]

> > > I object to the representation of simple string literals as
> > > "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
> > > equivalent under datatype semantics, so the simple form should be used.
> > >
> > 
> > We should create an issue on this: "Should a literal include xsd?" Should be
> > discussed in group and come to a consensus.
> 
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/#literaltriples
> now says
> [[
> Per XML Datatypes for SQL Datatypes, string datatypes are expressed as
> an RDF plain literal.
> ]]

Even if ^^xsd:string doesn't have to be part of the serialized RDF (for
convenience), it will always be part of the RDF model.

I personally prefer to see it for these reasons:
* the datatype mapping is an important step in the direct mapping so
making it explicit in the serialization is important IMO
* it doesn't change the meaning

> > > Again, please drop the concept of a stem URI and explain that the mapping
> > > uses relative URIs which are resolved against an environment-provided base
> > > URI. Instead of this:
> > >
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
> > > <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#state> "MA"^^xsd:string .
> > >
> > > I'd like to see this:
> > >
> > > <Addresses/ID=18> <Addresses#ID> 18 .
> > > <Addresses/ID=18> <Addresses#city> "Cambridge" .
> > > <Addresses/ID=18> <Addresses#state> "MA" .

Same comments about datatypes as above.
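
As an illustration only, here is a small Python sketch that emits the
stem-URI form of these triples with the datatype written out on every
literal. The function name and its parameters are hypothetical, literal
escaping is not handled, and the integer is written as "18"^^xsd:integer
so that the output is valid Turtle once an xsd prefix is declared.

```python
def row_triples(stem, table, pk, row, header):
    """Emit one literal triple per non-NULL column of a single row."""
    subject = f"<{stem}{table}/{pk}.{row[pk]}#_>"
    lines = []
    for column, value in row.items():
        if value is None:
            continue  # a NULL column value produces no triple
        predicate = f"<{stem}{table}#{column}>"
        # Datatype is always explicit, including xsd:string.
        literal = f'"{value}"^^xsd:{header[column]}'
        lines.append(f"{subject} {predicate} {literal} .")
    return lines


stem = "http://foo.example/DB/"
header = {"ID": "integer", "city": "string", "state": "string"}
row = {"ID": 18, "city": "Cambridge", "state": "MA"}

triples = row_triples(stem, "Addresses", "ID", row, header)
for line in triples:
    print(line)
```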

Alexandre.


> Hmm, it's possible that this might all be do-able with relative URIs.
> But we'd better think about this carefully.
> For now, I've used Turtle's @base directive.
> 
> 
> > Do you mean that we should define a prefix:
> > 
> > @prefix base: <http://foo.example/DB/> .
> > 
> > and then everywhere have
> > 
> > <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> > <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> > <base:Addresses/ID=18> <base:Addresses#state> "MA" .
> > 
> > 
> > 
> > 
> > 
> > > If you do it right, RDF can be simple ;-)
> > >
> > >
> > :)
> > 
> > 
> > >
> > > Again, great work, and I'm very happy to see this spec moving forward and
> > > like the direction it is taking.
> > >
> > 
> > Thanks for your very insightful and direct comments.
> > 
> > Marcelo and I will be working on this in the next couple of days and let
> > everybody know when we have an update. Please keep the comments coming!!!!!
> > 
> > 
> > > Richard
> > >
> > >
> > >
> > >
> > >> All the best,
> > >>
> > >> Marcelo
> > >>
> > >>
> > >
> > >
Received on Thursday, 4 November 2010 13:57:30 UTC