Re: Detailed comments on new default mapping draft from Juan Sequeda on 2010-11-02 (public-rdb2rdf-wg@w3.org from November 2010)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Tue, 2 Nov 2010 14:19:58 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Marcelo Arenas <marcelo.arenas1@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <AANLkTi=L72CcBZvdZ8aJdYA4CvduAENMP9dS=+yHMqGT@mail.gmail.com>
Richard,

Great comments! Thanks a lot. I'm glad you took the time to read the whole
thing. You even found typos, which means that you really took a magnifying
glass with you.

Looking forward to other comments.

Marcelo and I just had a long meeting and we discussed each of your points.
Comments are inline.


On Tue, Nov 2, 2010 at 9:10 AM, Richard Cyganiak <richard@cyganiak.de> wrote:

> Marcelo, Eric,
>
> I am commenting on Section 2 of
> http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
> $Id: alt.xml,v 1.2 2010/10/30 03:39:01 marenas Exp $
>
> First of all, great work! Looks like we almost have an FPWD here.
>
> Detailed comments are below. A lot of it is editorial, but there are some
> substantial comments too, as well as some pointers to oversights. I would
> appreciate if each comment could be a) addressed in the text, or b)
> reflected as an @@Issue in the text, or c) replied to in a response to this
> email, or d) turned into an Issue in the W3C tracker.
>

I'm commenting on everything inline in this email. Some things would need to
be turned into Issues in the W3C tracker (how do we do this?)

>
>
> “Stem URI” — this should be called “base URI”, because that's a commonly
> understood term, and it enables the explanation of URI generation as
> resolution of a relative URI against a base URI.
>

Ok. We will change this.


>
> The document should use SQL terminology throughout. Relation, attribute and
> tuple should be table, column and row, etc.
>

Ok. We will change this.

>
> The approach in Section 2 defines URIs for columns and rows, but not for
> tables. This means one has to use hacks to do a SPARQL query for all records
> in a given table. The approach needs to define URIs for tables as well, and
> associate each row with the table it is from.
>

If we understand correctly, we would need to create IRIs for tables. Hence,
there would now be three types of IRIs: Tuple, Column and Table. However,
if we are to create Table IRIs, then we also need to create a new type of
triple, the Table Triple:

<TupleIRI, rdf:type, Table IRI>

Do we agree?
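
To make the idea concrete, here is a rough Python sketch of what a Table
Triple would look like and how it would answer Richard's "all records in a
given table" query. The function names and the example IRIs are only for
illustration; the actual IRI shapes are still open.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_triple(tuple_iri: str, table_iri: str) -> tuple[str, str, str]:
    """Return the (subject, predicate, object) triple that types a row with its table."""
    return (tuple_iri, RDF_TYPE, table_iri)


def rows_of_table_query(table_iri: str) -> str:
    """Build a SPARQL query selecting every row typed with the given Table IRI."""
    # "a" is SPARQL shorthand for rdf:type.
    return f"SELECT ?row WHERE {{ ?row a <{table_iri}> }}"


# Illustrative IRIs only; the IRI scheme has not been agreed on.
triple = table_triple(
    "http://foo.example/DB/Addresses/ID=18",
    "http://foo.example/DB/Addresses",
)
query = rows_of_table_query("http://foo.example/DB/Addresses")
```

With such triples in place, selecting all rows of a table is a single triple
pattern rather than a hack.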



> From the current description, it is impossible to work out what URIs for
> rows with multi-column primary keys would look like. What order? What
> separator characters?
>

The order is the same as the order of the columns in the table. Marcelo and
I have the following proposal for creating IRIs:

Table IRI

baseURI/table, e.g. baseURI/person

Column IRI

baseURI/table/column, e.g. baseURI/person/name

Multicolumn IRI

baseURI/table/column1#column2#..., e.g. baseURI/person/fname#lname

Tuple IRI

baseURI/table/column1:value, e.g. baseURI/person/id:12

Multicolumn Tuple IRI

baseURI/table/column1:value1#column2:value2#..., e.g.
baseURI/person/fname:Juan#lname:Sequeda


This is our proposal. However, we are not aware of the best practices for
IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
Column IRIs".
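
As a sanity check of the patterns above, here is a small Python sketch that
builds each kind of IRI. It is only an illustration of the proposal as
written: the function names are made up, the separators are exactly the ones
proposed above, and no escaping of special characters is attempted.

```python
def table_iri(base: str, table: str) -> str:
    """baseURI/table"""
    return f"{base}/{table}"


def column_iri(base: str, table: str, columns: list[str]) -> str:
    """baseURI/table/column, or baseURI/table/column1#column2#... for several columns."""
    return f"{base}/{table}/" + "#".join(columns)


def tuple_iri(base: str, table: str, key: list[tuple[str, object]]) -> str:
    """baseURI/table/column1:value1#column2:value2#...

    `key` lists the (column, value) pairs of the primary key in the same
    order as the columns appear in the table.
    """
    return f"{base}/{table}/" + "#".join(f"{col}:{val}" for col, val in key)


examples = [
    table_iri("baseURI", "person"),
    column_iri("baseURI", "person", ["name"]),
    column_iri("baseURI", "person", ["fname", "lname"]),
    tuple_iri("baseURI", "person", [("id", 12)]),
    tuple_iri("baseURI", "person", [("fname", "Juan"), ("lname", "Sequeda")]),
]
```

The five entries of `examples` correspond, in order, to the five examples
given in the proposal.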


> I am uncomfortable with the use of the dot character as a separator in
> generated URIs. The character typically used in URIs to indicate a
> hierarchical relationship is "/". The character typically used to indicate
> key-value pairs is "=".
>

See previous comment. We should open an Issue on creating IRIs and discuss
this in the group.

>
> I am uncomfortable with the use of "#_" at the end of row URIs. I cannot
> see any precedent for that, so I cannot call it good practice. It is also
> unnecessary because the URI identifies a row in a database table and never a
> person/address/organization or whatever other real-world object. Rows in
> database tables are information resources and thus there is no problem at
> all with identifying them using a plain fragment-less URI.
>

This is what Eric had originally. So he can comment on his decision. Again,
we should create an Issue on this.


> Special characters in table names, column names and PK values need to be
> handled in the URI generation.
>

Again... create an Issue on this (I sound like a broken record).
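
One option we could discuss under that Issue is plain percent-encoding of
each name or value before it goes into the IRI. A minimal Python sketch,
assuming percent-encoding is what we pick (nothing is decided, and the
function name is made up):

```python
from urllib.parse import quote


def encode_segment(name: object) -> str:
    """Percent-encode a table name, column name, or key value for one IRI path segment."""
    # safe="" makes quote() encode "/" as well, so a slash inside a name
    # cannot be mistaken for a path separator.
    return quote(str(name), safe="")


encoded = [encode_segment(s) for s in ["Order Details", "a/b#c", "first_name"]]
```

Letters, digits and the characters "_", ".", "-" and "~" pass through
unchanged, so ordinary table and column names are not affected.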


>
> 2.2 is largely redundant as it only summarizes information that follows in
> more detail later. Thus the focus should be on giving a quick intro to the
> general idea, using simple language. The example is repeated twice for no
> reason.
>
>
We think it is important to state the different types of triples at the
beginning. What is important is that somebody can initially figure out what
the outcome is before diving into the whole document. I would like to hear
what other people have to say about this.



> 2.2 says that the predicate of reference triples are “the Column IRI for
> the columns that constitute the foreign key.” That doesn't work for
> multi-column FKs.
>

Yes. We will separate the Column IRI into two cases: single-column and
multi-column.

>
> 2.2 says: “with an XML Schema datatype corresponding to the SQL datatype of
> that column”. That obviously needs to be spelled out. There should be an
> extra section to this, “Mapping SQL Datatypes to RDF Literals” or something
> like that. The section can be a placeholder for FPWD, but should exist.
>

Yes. There needs to be a table with the corresponding mapping.
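
Just to show the shape such a table could take, here is a placeholder sketch
in Python. The entries are my own illustrative guesses, not an agreed
mapping, and the function name is made up.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative placeholder only; the real table still has to be agreed on.
SQL_TO_XSD = {
    "INTEGER": XSD + "integer",
    "VARCHAR": XSD + "string",
    "BOOLEAN": XSD + "boolean",
    "DATE": XSD + "date",
    "DOUBLE PRECISION": XSD + "double",
}


def xsd_datatype(sql_type: str) -> str:
    """Look up the XML Schema datatype IRI for a SQL type name."""
    # Drop any length or precision, e.g. "VARCHAR(30)" becomes "VARCHAR".
    name = sql_type.split("(")[0].strip().upper()
    return SQL_TO_XSD[name]  # raises KeyError for types not in the table
```

An unmapped SQL type raises a KeyError here, which is the kind of gap the
placeholder section for the FPWD would need to call out.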

>
> The bullet points in 2.3.1 need refactoring. 90% of each bullet point is
> identical. It took me five minutes of careful parsing to work that out. This
> is lazy writing at the expense of clarity.
>

Yes, we will rewrite this.

>
> In the third bullet point in 2.3.1, “Literal triple” links to the wrong
> place. See, this is why you shouldn't copy-paste the same text three times!
>
>
Ok, we will fix it.


> The example from 2.2 is repeated a third time in 2.3.1 for no reason.
>

We will take away the example in 2.3.1 and refer to the one in 2.2.


>
> The verbose textual rendering of the schema is unnecessary and should be
> removed. It says nothing that cannot be seen from the visual representation.
> Rather use that space for writing the table definition in SQL. Same for
> other places in the document where table schemas are spelled out verbally.
>

This is fine by me, but Marcelo would like to keep the verbose text. What do
others think? In any case, we should definitely have the SQL DDL.


>
> I do not find the visual notation for unique keys and foreign keys
> particularly clear. How about simply listing them underneath the table?
> “Foreign key: addr -> Addresses.ID”
>

Could we consider taking away the visual notation for keys and just having
the table with data? We would also put in the SQL DDL, and I'm wondering if
this would be enough.
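
For example, the DDL could look like the sketch below, wrapped in a few lines
of Python so that it can actually be run (SQLite is used only because it
ships with Python). Addresses, its columns and the addr foreign key come from
the examples in this thread; the referencing table name People and the
column types are assumptions made for illustration.

```python
import sqlite3

DDL = """
CREATE TABLE Addresses (
    ID    INTEGER PRIMARY KEY,
    city  VARCHAR(30),
    state CHAR(2)
);
CREATE TABLE People (            -- hypothetical referencing table
    ID   INTEGER PRIMARY KEY,
    addr INTEGER,
    FOREIGN KEY (addr) REFERENCES Addresses (ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO Addresses VALUES (18, 'Cambridge', 'MA')")

# PRAGMA foreign_key_list returns one row per foreign-key column:
# (id, seq, referenced table, from column, to column, ...).
fks = [
    (row[3], row[2], row[4])
    for row in conn.execute("PRAGMA foreign_key_list(People)")
]
rows = conn.execute("SELECT ID, city, state FROM Addresses").fetchall()
```

Here `fks` lists each foreign key as (column, referenced table, referenced
column), which is the same information as "Foreign key: addr -> Addresses.ID".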


>
> You write foreign keys as if they reference another *key*. I believe that
> doesn't reflect SQL. Foreign keys reference other *columns*. That's the
> mental model that a reader is going to have in their head, and that's how it
> should be presented in the spec.
>

I agree. To make sure: where we currently write, for example, Address.PK,
and we know that the PK of Address is ID, it should instead be Address.ID
(or something like that). Is that what you mean?



>
> Oh, 2.3.1 actually has an example that explains how multi-column PKs work.
> This should have been in the place where multi-column PKs were described.
>
>
Yes, we will organize this better. Also see our previous comments on this.


> The use of "_" as a separator between the column/value pairs in
> multi-column PK row URIs is a bad idea, because the underscore character is
> ubiquitous in table and column names. An obvious replacement would be ";".
>


Again.. we need to create an Issue about generating IRIs :P


>
> I found the last example in 2.3.1 confusing because it didn't generate a
> triple from the FK. The text before the example made it sound as if the
> following was a complete translation of the table. The text could be clearer
> about the fact that only the non-FK columns are translated.
>

Yes, we will be more explicit about what each rule is going to generate.

>
> The content of 2.3.1 actually doesn't really match its title. The title
> talks about “information in PKs”. What follows is not only about information
> in PK columns.
>

We will change the title. How about "Generating Triples from Primary Keys".
Consequently, 2.3.2 could be "Generating Triples from Foreign Keys"


>
> 2.3.2: The rules for referencing tables without PKs state that the object
> is the target row's Tuple IRI. Earlier you said that such tables don't have
> Tuple IRIs but blank nodes.
>

When we describe a Tuple IRI, we cover the case where a table doesn't have a
primary key, in which case a blank node should be created. So in a way, it
may be understood that a blank node is a Tuple IRI, which I know is
incorrect. Can you suggest how we should go about this?


>
> The bullet points in 2.3.2 need refactoring to separate the common stuff
> from the stuff that's different between them.
>

Ok. We will organize this.

>
> 2.3.2 explains the subjects and objects, but not the predicates of
> generated triples.
>

Ok. We need to add this.


>
> http://foo.example/DB/Department#Manager -- why is Manager uppercase?
>

typo


>
> I object to the representation of simple string literals as
> "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
> equivalent under datatype semantics, so the simple form should be used.
>

We should create an Issue on this: "Should a literal include xsd?" It should
be discussed in the group until we come to a consensus.

>
> 18^^xsd:integer is not valid Turtle. This must either be "18"^^xsd:integer,
> or simply 18, which is just Turtle syntactic sugar for the former. I would
> highly prefer if the simple form was used throughout.
>
>
Yes, our mistake. However, using simply 18 instead of having xsd:integer
should be part of a group discussion. See previous comment about creating
an Issue.
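
To have both forms side by side for that discussion, here is a small Python
sketch that renders a value either in the short Turtle form or with an
explicit datatype. The function name is made up, only integers and strings
are handled, and only backslash and double quote are escaped.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def turtle_literal(value, verbose: bool = False) -> str:
    """Render an int or str as a Turtle literal, in short or explicit form."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        # Short form: a bare integer; explicit form: quoted lexical value plus datatype.
        return f'"{value}"^^<{XSD}integer>' if verbose else str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}string>' if verbose else f'"{escaped}"'


short_forms = [turtle_literal(18), turtle_literal("Cambridge")]
explicit_forms = [turtle_literal(18, verbose=True), turtle_literal("Cambridge", verbose=True)]
```

Note that the explicit integer form quotes the lexical value, which is the
correction Richard pointed out.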


> Again, please drop the concept of a stem URI and explain that the mapping
> uses relative URIs which are resolved against an environment-provided base
> URI. Instead of this:
>
> <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#ID> 18^^xsd:integer .
> <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#city> "Cambridge"^^xsd:string .
> <http://foo.example/DB/Addresses/ID.18#_> <http://foo.example/DB/Addresses#state> "MA"^^xsd:string .
>
> I'd like to see this:
>
> <Addresses/ID=18> <Addresses#ID> 18 .
> <Addresses/ID=18> <Addresses#city> "Cambridge" .
> <Addresses/ID=18> <Addresses#state> "MA" .
>
>
Do you mean that we should define a prefix:

@prefix base: <http://foo.example/DB/> .

and then everywhere have

<base:Addresses/ID=18> <base:Addresses#ID> 18 .
<base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
<base:Addresses/ID=18> <base:Addresses#state> "MA" .
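
Another possible reading is that no prefix is involved at all and the
relative IRIs are simply resolved against the base, as with a Turtle @base
declaration. A quick Python sketch of that resolution, using the standard
library's urljoin (the helper name resolve is made up, and this is only my
interpretation of the suggestion):

```python
from urllib.parse import urljoin

BASE = "http://foo.example/DB/"


def resolve(relative: str, base: str = BASE) -> str:
    """Resolve a relative IRI reference against the base IRI."""
    return urljoin(base, relative)


# The three triples from Richard's example, with subject and predicate resolved.
triples = [
    (resolve("Addresses/ID=18"), resolve("Addresses#ID"), 18),
    (resolve("Addresses/ID=18"), resolve("Addresses#city"), "Cambridge"),
    (resolve("Addresses/ID=18"), resolve("Addresses#state"), "MA"),
]
```

Under this reading the document would only ever show the relative form, and
the absolute IRIs fall out of whatever base the environment provides.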





> If you do it right, RDF can be simple ;-)
>
>
:)


>
> Again, great work, and I'm very happy to see this spec moving forward and
> like the direction it is taking.
>

Thanks for your very insightful and direct comments.

Marcelo and I will be working on this in the next couple of days and will
let everybody know when we have an update. Please keep the comments coming!!!!!


> Richard
>
>
>
>
>> All the best,
>>
>> Marcelo
>>
>>
>
>
Received on Tuesday, 2 November 2010 19:20:53 UTC