Re: Detailed comments on new default mapping draft from Marcelo Arenas on 2010-11-03 (public-rdb2rdf-wg@w3.org from November 2010)

From: Marcelo Arenas <marcelo.arenas1@gmail.com>
Date: Wed, 3 Nov 2010 09:53:01 -0300
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Juan Sequeda <juanfederico@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <AANLkTi=6XbyN-xcPiubvE9fc5KDbLq0c9v5MqktdsOhh@mail.gmail.com>
Hi Richard,

Thanks for the comments! Some more comments inline.

On Tue, Nov 2, 2010 at 5:35 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> Hi Juan,
>
> Thanks for the reply! Some comments inline.
>
> On 2 Nov 2010, at 19:19, Juan Sequeda wrote:
>>>
>>> The approach in Section 2 defines URIs for columns and rows, but not for
>>> tables. This means one has to use hacks to do a SPARQL query for all
>>> records in a given table. The approach needs to define URIs for tables
>>> as well, and associate each row with the table it is from.
>>
>> If we understand correctly, we would need to create IRIs for Tables.
>> Hence, there would now be three types of IRIs: Tuple, Column and Table
>> IRIs.
>
> Yes.
>
>> However, if we are to create Table IRIs, then we also need to create a
>> new type of triples, Table Triples:
>>
>> <TupleIRI, rdf:type, TableIRI>
>>
>> Do we agree?
>
> I discussed this a bit with Eric at some point, and he had some reservations
> about using rdf:type here because it could have undesirable implications. I
> don't really have a strong opinion on the choice of property. It could be
> rdf:type or some other property especially defined for this task
> (xxx:table?). The important thing for me: There should be a triple that
> relates a row to its table, to make queries for all rows of a table easier.
> And all the important components of a schema should have URIs, and tables
> are certainly important, so they deserve a URI of their own.

I agree with this.
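To make the motivation concrete, here is a minimal Python sketch of such a row-to-table triple. It assumes a hypothetical base IRI `http://example.com/db`, table IRIs of the form `baseURI/table`, row IRIs using the provisional "=" notation, and made-up `person` rows. The property defaults to `rdf:type`. Swapping in a dedicated property (like the hypothetical `xxx:table`) only changes the `prop` argument. Selecting all rows of a table then reduces to matching a single triple pattern.

```python
# Sketch only: IRI layout, base IRI and sample rows are assumptions.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.com/db"  # hypothetical base IRI


def table_iri(table):
    # Hypothetical Table IRI layout: baseURI/table
    return f"{BASE}/{table}"


def row_iri(table, pk_col, pk_val):
    # Hypothetical Tuple IRI layout: baseURI/table/col=value
    return f"{BASE}/{table}/{pk_col}={pk_val}"


def table_triples(table, pk_col, rows, prop=RDF_TYPE):
    """One (row IRI, prop, table IRI) triple per row of the table."""
    return [(row_iri(table, pk_col, r[pk_col]), prop, table_iri(table))
            for r in rows]


def rows_of(triples, table, prop=RDF_TYPE):
    """Counterpart of the SPARQL pattern { ?row prop <table IRI> }."""
    t = table_iri(table)
    return [s for (s, p, o) in triples if p == prop and o == t]


people = [{"id": 7, "name": "Bob"}, {"id": 8, "name": "Sue"}]  # made-up rows
triples = table_triples("person", "id", people)
print(rows_of(triples, "person"))
```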


>> Column IRI
>>
>> baseURI/table/column, e.g. baseURI/person/name
>
> Here, the current approach (baseURI/person#name) sort of makes sense to me,
> because it slightly simplifies HTTP deployment.
>
>> Multicolumn IRI
>>
>> baseURI/table/column1#column2#..., e.g. baseURI/person/fname#lname
>
> Hashes have a very special meaning in URI syntax, so I wouldn't use them as
> generic separators. Having multiple hashes in a URI is almost certainly a
> bad idea.
>
>> Tuple IRI
>>
>> baseURI/table/column1:value, e.g. baseURI/person/id:12
>
> The colon character is also quite special in URI syntax, and is generally
> only used after the protocol part of the URI (http:...).
>
>> Multicolumn Tuple IRI
>>
>> baseURI/table/column1:value1#column2:value2#..., e.g.
>> baseURI/person/fname:Juan#lname:Sequeda
>>
>> This is our proposal. However, we are not aware of the best practices for
>> IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
>> Column IRIs"
>
> +1, we probably need to create an issue and first think about the conditions
> that the solution has to satisfy. We need to take URI syntax (RFC 3986), URI
> design best practices, and the requirements of linked data deployment into
> account.
>
> I haven't thought deeply about this, but spontaneously I would like to see
> “=” for connecting column names to values, and “;” or “,” to enumerate
> multiple items.

For the time being, we can use "=" and "," in the document, and decide
later what a good notation is.
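A minimal Python sketch of Column and Tuple IRIs under the provisional notation. It assumes a hypothetical base IRI `http://example.com/db`, the `baseURI/table#column` layout for columns, and key columns listed in primary-key order. Names and values are percent-encoded with `urllib.parse.quote` so that a "=" or "," inside a value cannot be mistaken for a separator.

```python
from urllib.parse import quote

BASE = "http://example.com/db"  # hypothetical base IRI


def enc(s):
    # safe="" so "/", "=", "," and "#" inside names or values are
    # percent-encoded and cannot clash with the separators.
    return quote(str(s), safe="")


def column_iri(table, column):
    # Column IRI layout of the current draft: baseURI/table#column
    return f"{BASE}/{enc(table)}#{enc(column)}"


def tuple_iri(table, key):
    # key: ordered (column, value) pairs of the primary key (assumed order)
    parts = ",".join(f"{enc(c)}={enc(v)}" for c, v in key)
    return f"{BASE}/{enc(table)}/{parts}"


print(column_iri("person", "name"))
print(tuple_iri("person", [("id", 12)]))
print(tuple_iri("person", [("fname", "Juan"), ("lname", "Sequeda")]))
print(tuple_iri("person", [("fname", "Ann, Jr."), ("lname", "a=b")]))
```

RFC 3986 allows sub-delimiters such as "=" and "," unencoded in path segments, so only occurrences inside names and values need escaping.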


>>> 2.2 is largely redundant as it only summarizes information that follows
>>> in more detail later. Thus the focus should be on giving a quick intro
>>> to the general idea, using simple language. The example is repeated
>>> twice for no reason.
>>>
>>
>> We think it is important to state the different types of triples in the
>> beginning. What is important is that somebody can initially figure out
>> what the outcome is before diving into the whole document.
>
> I'm ok with stating the different kinds of triples in 2.2.
>
>>> The verbose textual rendering of the schema is unnecessary and should be
>>> removed. It says nothing that cannot be seen from the visual
>>> representation. Rather use that space for writing the table definition
>>> in SQL. Same for other places in the document where table schemas are
>>> spelled out verbally.
>>
>> This is fine by me. But Marcelo would like to keep the verbose text. What
>> do others think? But we should definitely have the SQL DDL.
>
> I would like to hear Marcelo's reasoning. If you have SQL DDL and a visual
> rendering, then what does the text add?

This is a matter of taste. I personally dislike examples without a
text explanation (I tend to think that they were not carefully
written). But I have to admit that the example in the document is
quite simple, so we could have a shorter text about the example (not
mentioning, for example, the columns of the tables). Would that be OK
with you?
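For concreteness, here is a minimal sketch of how the SQL DDL for a two-table example might look. It uses Python's built-in `sqlite3` module so it runs as is. The `People`/`Addresses` tables and their columns are hypothetical, not copied from the draft. The sketch also lists each foreign key underneath in the "addr -> Addresses.ID" style suggested above, which makes it visible that a foreign key references a column of another table.

```python
import sqlite3

# Hypothetical schema; table and column names are assumptions.
DDL = """
CREATE TABLE Addresses (
    ID    INTEGER PRIMARY KEY,
    city  TEXT,
    state TEXT
);
CREATE TABLE People (
    ID    INTEGER PRIMARY KEY,
    fname TEXT,
    addr  INTEGER REFERENCES Addresses(ID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# PRAGMA foreign_key_list rows are
# (id, seq, table, from, to, on_update, on_delete, match):
# the FK goes from a column to a referenced *column*.
fks = [f"Foreign key: {r[3]} -> {r[2]}.{r[4]}"
       for r in conn.execute("PRAGMA foreign_key_list(People)")]
for line in fks:
    print(line)
```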


>>> I do not find the visual notation for unique keys and foreign keys
>>> particularly clear. How about simply listing them underneath the table?
>>> “Foreign key: addr -> Addresses.ID”
>>
>> Could we consider taking away the visual notation for keys, and just have
>> the table with data? We would also put in the SQL DDL, and I'm wondering
>> if this would be enough.
>
> I think that would work for me, although I'd still have a slight preference
> for *somehow* having the FKs and UKs present in the visual rendering. Please
> keep the special color for the PK column(s), it is helpful.
>
>>> You write foreign keys as if they reference another *key*. I believe that
>>> doesn't reflect SQL. Foreign keys reference other *columns*. That's the
>>> mental model that a reader is going to have in their head, and that's how
>>> it should be presented in the spec.
>>
>> I agree. To make sure: where we currently have, for example, Address.PK,
>> and we know that the PK of Address is ID, it should then be Address.ID
>> (or something like that). Is that what you mean?
>
> Exactly!
>
>>> The content of 2.3.1 actually doesn't really match its title. The title
>>> talks about “information in PKs”. What follows is not only about
>>> information in PK columns.
>>
>> We will change the title. How about "Generating Triples from Primary
>> Keys"? Consequently, 2.3.2 could be "Generating Triples from Foreign
>> Keys".
>
> Well, but 2.3.1 is not just about generating stuff from PKs! It also deals
> with all the columns that are not involved in any key. That's my complaint
> -- from the title you wouldn't be able to guess that this is the section
> that handles the translation of normal columns to literals.

Now I understand your point. What about the title "The first step of
the translation process: Generating literal triples"?
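A rough Python sketch of this first step for a single row, under explicit assumptions: a hypothetical base IRI, the provisional "=" and "," tuple notation, the `baseURI/table#column` column IRIs, and the illustrative choices that foreign-key columns are left to the later reference-triple step and that NULL values produce no triple. Literal datatypes are ignored; the object is just the Python value.

```python
BASE = "http://example.com/db"  # hypothetical base IRI


def row_subject(table, pk_cols, row):
    # Tuple IRI with the provisional "=" / "," notation (no escaping here).
    key = ",".join(f"{c}={row[c]}" for c in pk_cols)
    return f"{BASE}/{table}/{key}"


def literal_triples(table, pk_cols, fk_cols, row):
    """(subject, column IRI, value) for each non-FK, non-NULL column.

    Skipping FK columns and NULLs is an assumption made for illustration.
    """
    s = row_subject(table, pk_cols, row)
    return [(s, f"{BASE}/{table}#{col}", val)
            for col, val in row.items()
            if col not in fk_cols and val is not None]


row = {"ID": 7, "fname": "Bob", "lname": None, "addr": 18}  # made-up row
for t in literal_triples("People", ["ID"], {"addr"}, row):
    print(t)
```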

All the best,

Marcelo
Received on Wednesday, 3 November 2010 12:53:35 UTC