Re: Detailed comments on new default mapping draft from Richard Cyganiak on 2010-11-02 (public-rdb2rdf-wg@w3.org from November 2010)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 2 Nov 2010 20:35:54 +0000
To: Juan Sequeda <juanfederico@gmail.com>
Cc: Marcelo Arenas <marcelo.arenas1@gmail.com>, "Eric Prud'hommeaux" <eric@w3.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <88769680-C75B-4DF5-B613-78F4D56A6965@cyganiak.de>

Hi Juan,

Thanks for the reply! Some comments inline.

On 2 Nov 2010, at 19:19, Juan Sequeda wrote:
>> The approach in Section 2 defines URIs for columns and rows, but not for
>> tables. This means one has to use hacks to do a SPARQL query for all records
>> in a given table. The approach needs to define URIs for tables as well, and
>> associate each row with the table it is from.
>
> If we understand correctly, we would need to create IRIs for Tables. Hence,
> there would now be three types of IRIs: Tuple, Column and Table.

Yes.

> However, if we are to create Table IRIs, then we also need to create a new
> type of triple, Table Triples:
>
> <TupleIRI, rdf:type, Table IRI>
>
> Do we agree?

I discussed this a bit with Eric at some point, and he had some
reservations about using rdf:type here because it could have
undesirable implications. I don't really have a strong opinion on the
choice of property. It could be rdf:type or some other property
specifically defined for this task (xxx:table?). What matters to me is
that there should be a triple that relates a row to its table, to make
queries for all rows of a table easier. And all the important
components of a schema should have URIs; tables are certainly
important, so they deserve a URI of their own.
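To make the row-to-table triple concrete, here is a minimal Python sketch; it is not from the draft. It assumes table IRIs of the hypothetical form base + table name, reuses the example base URI http://foo.example/DB/ that appears later in this thread, and uses the placeholder xxx:table for the still-undecided property (rdf:type would work the same way). The People row IRIs are made up for illustration. With such a triple, "all rows of a table" becomes a single triple pattern, roughly what ?row xxx:table <People> would match in SPARQL.

```python
from __future__ import annotations

BASE = "http://foo.example/DB/"
TABLE_PROP = "xxx:table"  # placeholder: the property name is still undecided


def table_iri(table: str) -> str:
    """Hypothetical table IRI: base URI + table name."""
    return f"{BASE}{table}"


def table_triples(table: str, row_iris: list[str]) -> list[tuple[str, str, str]]:
    """One (row, TABLE_PROP, table) triple per row of the table."""
    return [(row, TABLE_PROP, table_iri(table)) for row in row_iris]


def rows_of(graph: list[tuple[str, str, str]], table: str) -> list[str]:
    """All subjects linked to the given table via TABLE_PROP."""
    target = table_iri(table)
    return [s for (s, p, o) in graph if p == TABLE_PROP and o == target]


graph = table_triples("Addresses", [BASE + "Addresses/ID=18"]) + table_triples(
    "People", [BASE + "People/ID=7", BASE + "People/ID=8"]
)
print(rows_of(graph, "People"))
```

Without the table triple, the same query would have to guess table membership from the shape of the row IRIs or from which column properties a subject happens to carry, which is the kind of hack Richard objects to.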

> Column IRI
>
> baseURI/table/column, i.e. baseURI/person/name

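Here, the current approach (baseURI/person#name) sort of makes sense
to me, because it slightly simplifies HTTP deployment: the fragment is
never sent to the server, so all column URIs of a table can be served
from a single document (baseURI/person).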

> Multicolumn IRI
>
> baseURI/table/column1#column2#... i.e. baseURI/person/fname#lname

Hashes have a very special meaning in URI syntax, so I wouldn't use  
them as generic separators. Having multiple hashes in a URI is almost  
certainly a bad idea.
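Two facts about "#" underpin these comments: a client strips the fragment before making the HTTP request, and RFC 3986 does not allow a literal "#" inside a fragment (fragment = *( pchar / "/" / "?" ), and "#" is not a pchar). A minimal Python sketch using the standard urllib.parse module follows; urldefrag stands in for what an HTTP client does before sending a request. The base URI is the example one used later in this thread, and the column names are illustrative.

```python
import re
from urllib.parse import urldefrag, urlsplit

BASE = "http://foo.example/DB/"

# RFC 3986: fragment = *( pchar / "/" / "?" ), with
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@".
# A literal "#" is therefore not allowed inside a fragment.
_FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_fragment(fragment: str) -> bool:
    return _FRAGMENT.fullmatch(fragment) is not None


# Hash column IRIs: once the fragment is stripped, every column of
# "person" maps to the same document, BASE + "person".
for col in ("name", "age"):
    doc, frag = urldefrag(BASE + "person#" + col)
    print(doc, frag)

# Multicolumn form with several hashes: only the first column stays in
# the path; everything after the first "#" becomes one fragment that
# itself contains "#", which RFC 3986 does not permit.
parts = urlsplit(BASE + "person/fname#lname#age")
print(parts.path, parts.fragment, is_valid_fragment(parts.fragment))
```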

> Tuple IRI
>
> baseURI/table/column1:value, i.e. baseURI/person/id:12

The colon character is also quite special in URI syntax, and is
generally only used after the scheme part of the URI (http:...).

> Multicolumn Tuple IRI
>
> baseURI/table/column1:value1#column2:value2#..., i.e.
> baseURI/person/fname:Juan#lname:Sequeda
>
> This is our proposal. However, we are not aware of the best practices for
> IRIs. I propose that we open an Issue on "how to generate Table, Tuple and
> Column IRIs".

+1, we probably need to create an issue and first think about the  
conditions that the solution has to satisfy. We need to take URI  
syntax (RFC 3986), URI design best practices, and the requirements of  
linked data deployment into account.

I haven't thought deeply about this, but spontaneously I would like to  
see “=” for connecting column names to values, and “;” or “,” to  
enumerate multiple items.
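As a rough illustration of this suggestion, here is a Python sketch that builds row IRIs with "=" between a column name and its value and ";" between key columns. The function name row_iri, the base URI and the exact layout are assumptions, since the syntax is explicitly left to the proposed issue. Percent-encoding every name and value with urllib.parse.quote(..., safe='') is one way to stop separator characters inside the data from breaking the structure.

```python
from __future__ import annotations

from urllib.parse import quote

BASE = "http://foo.example/DB/"  # example base URI, an assumption


def row_iri(table: str, key: list[tuple[str, object]]) -> str:
    """Hypothetical row IRI: BASE + table + "/" + col1=val1;col2=val2...

    Names and values are percent-encoded with no "safe" characters, so an
    "=", ";" or "/" inside the data cannot be confused with the separators.
    """
    pairs = ";".join(
        f"{quote(col, safe='')}={quote(str(val), safe='')}" for col, val in key
    )
    return f"{BASE}{quote(table, safe='')}/{pairs}"


print(row_iri("Addresses", [("ID", 18)]))
print(row_iri("person", [("fname", "Juan"), ("lname", "Sequeda")]))
print(row_iri("person", [("fname", "Ana;Maria"), ("lname", "de la Cruz")]))
```

The single-column case produces the same shape as the <Addresses/ID=18> example further down in this message.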

>> 2.2 is largely redundant as it only summarizes information that follows in
>> more detail later. Thus the focus should be on giving a quick intro to the
>> general idea, using simple language. The example is repeated twice for no
>> reason.
>>
>>
>
> We think it is important to state the different types of triples at the
> beginning. What is important is that somebody can initially figure out what
> the outcome is before diving into the whole document.

I'm ok with stating the different kinds of triples in 2.2.

>> The verbose textual rendering of the schema is unnecessary and should be
>> removed. It says nothing that cannot be seen from the visual representation.
>> Rather use that space for writing the table definition in SQL. Same for
>> other places in the document where table schemas are spelled out verbally.
>
> This is fine by me. But Marcelo would like to keep the verbose text. What do
> others think? But we should definitely have the SQL DDL.

I would like to hear Marcelo's reasoning. If you have SQL DDL and a  
visual rendering, then what does the text add?

>> I do not find the visual notation for unique keys and foreign keys
>> particularly clear. How about simply listing them underneath the table?
>> “Foreign key: addr -> Addresses.ID”
>
> Could we consider taking away the visual notation for keys, and just having
> the table with data? We would also put in the SQL DDL, and I'm wondering if
> this would be enough.

I think that would work for me, although I'd still have a slight  
preference for *somehow* having the FKs and UKs present in the visual  
rendering. Please keep the special color for the PK column(s), it is  
helpful.

>> You write foreign keys as if they reference another *key*. I believe that
>> doesn't reflect SQL. Foreign keys reference other *columns*. That's the
>> mental model that a reader is going to have in their head, and that's how it
>> should be presented in the spec.
>
> I agree. Just to make sure: what we currently have is, for example,
> Address.PK, and since we know that the PK of Address is ID, it should then
> be Address.ID (or something like that). Is that what you mean?

Exactly!
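A small sketch of "SQL DDL plus keys listed underneath the table", with foreign keys written column-to-column as agreed above. SQLite (via Python's standard sqlite3 module) is used only as a convenient self-contained stand-in; its PRAGMA table_info and PRAGMA foreign_key_list report the key columns. The Addresses columns (ID, city, state) and the addr foreign key come from this thread; the People table name and its fname column are hypothetical.

```python
import sqlite3

DDL = """
CREATE TABLE Addresses (
  ID    INTEGER PRIMARY KEY,
  city  TEXT,
  state TEXT
);
CREATE TABLE People (
  ID    INTEGER PRIMARY KEY,
  fname TEXT,
  addr  INTEGER REFERENCES Addresses(ID)
);
"""


def describe_keys(conn: sqlite3.Connection, table: str) -> list:
    """List the PK and FKs of a table, with FKs pointing at *columns*.

    The table name is interpolated into the PRAGMA, so this is only meant
    for trusted, known table names.
    """
    lines = []
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    pk = [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5]]
    if pk:
        lines.append("Primary key: " + ", ".join(pk))
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
        target_table, from_col, to_col = row[2], row[3], row[4]
        lines.append(f"Foreign key: {from_col} -> {target_table}.{to_col}")
    return lines


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
for table in ("Addresses", "People"):
    print(table, describe_keys(conn, table))
```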

>> The content of 2.3.1 actually doesn't really match its title. The title
>> talks about “information in PKs”. What follows is not only about information
>> in PK columns.
>
> We will change the title. How about "Generating Triples from Primary Keys"?
> Consequently, 2.3.2 could be "Generating Triples from Foreign Keys".

Well but 2.3.1 is not just about generating stuff from PKs! It also  
deals with all the columns that are not involved in any key. That's my  
complaint -- from the title you wouldn't be able to guess that this is  
the section that handles the translation of normal columns to literals.

>> 2.3.2: The rules for referencing tables without PKs state that the object
>> is the target row's Tuple IRI. Earlier you said that such tables don't have
>> Tuple IRIs but blank nodes.
>
> When we describe a Tuple IRI, we cover the case where a table doesn't have
> a primary key: then a blank node should be created. So in a way, it may be
> understood that a blank node is a Tuple IRI, which I know is incorrect. Can
> you suggest how we should go about this?

In 2.2 you could introduce the concept of a “row RDF node”, which is  
either a “row IRI” (what you now call tuple IRI) or a blank node. Then  
you'd just have to state that the object of a reference triple is the  
“row RDF node” of the target row, and refer to section 2.2 for  
figuring out what the specific node would be.
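The "row RDF node" idea can be sketched in a few lines of Python: one function decides between a row IRI and a blank node, and the reference triple simply uses whatever that function returns for the target row. Everything named here is hypothetical: the IRI and BlankNode classes, row_node, reference_triple, the Notes table, the People rows, the row_id parameter and the col=value layout of row IRIs (the IRI syntax is still an open issue in this thread).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union
from urllib.parse import quote

BASE = "http://foo.example/DB/"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


RowNode = Union[IRI, BlankNode]  # the "row RDF node" of a row


def row_node(table: str, pk_cols: list[str], row: dict, row_id: int) -> RowNode:
    """Row IRI if the table has a primary key, otherwise a blank node.

    row_id is a hypothetical internal row identifier (e.g. SQLite's rowid);
    it is only used so that the same key-less row always gets the same label.
    """
    if pk_cols:
        key = ";".join(
            f"{quote(c, safe='')}={quote(str(row[c]), safe='')}" for c in pk_cols
        )
        return IRI(f"{BASE}{quote(table, safe='')}/{key}")
    return BlankNode(f"{table}_{row_id}")


def reference_triple(subject: RowNode, table: str, fk_col: str, target: RowNode):
    """A reference triple: its object is the row RDF node of the target row."""
    return (subject, IRI(f"{BASE}{table}#{fk_col}"), target)


address = row_node("Addresses", ["ID"], {"ID": 18, "city": "Cambridge"}, row_id=1)
person = row_node("People", ["ID"], {"ID": 7, "addr": 18}, row_id=1)
keyless = row_node("Notes", [], {"text": "call back"}, row_id=3)

print(reference_triple(person, "People", "addr", address))
print(keyless)
```

Deriving the blank node label from a stable row identifier, rather than minting a fresh one per call, matters here: a row that is referenced several times has to be represented by the same node each time.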

>> I object to the representation of simple string literals as
>> "Cambridge"^^xsd:string. This should simply be "Cambridge". They are
>> equivalent under datatype semantics, so the simple form should be used.
>
> We should create an issue on this: "Should a literal include xsd?" It
> should be discussed in the group until we come to a consensus.

+1

>> 18^^xsd:integer is not valid Turtle. This must either be "18"^^xsd:integer,
>> or simply 18, which is just Turtle syntactic sugar for the former. I would
>> highly prefer if the simple form was used throughout.
>>
>
> Yes, our mistake. However, using simply 18 instead of having xsd:integer
> should be part of a group discussion. See the previous comment about
> creating an Issue.

It's a different case from the previous one. "Foo" vs.  
"Foo"^^xsd:string is actually a difference on the RDF graph level  
(although in RDF semantics they are equivalent). 18 vs.  
"18"^^xsd:integer are identical on the RDF graph level, it's just  
syntactic sugar in Turtle.
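The distinction can be illustrated by expanding the literal tokens used in this thread into abstract terms, modelled here as (lexical form, datatype IRI) pairs with None for a plain literal. This is a Python sketch under the RDF 1.0 abstract syntax that was current at the time; RDF 1.1 later defined "Foo" as shorthand for "Foo"^^xsd:string, so the second comparison would come out differently today. The function name literal_term is hypothetical, and it deliberately handles only these three token shapes, not full Turtle.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def literal_term(token: str):
    """Expand a Turtle literal token into an RDF 1.0 literal term.

    Returns (lexical form, datatype IRI), with None as the datatype of a
    plain literal. Only bare integers, "..." and "..."^^xsd:name are handled.
    """
    if re.fullmatch(r"[+-]?[0-9]+", token):
        return (token, XSD + "integer")  # Turtle shorthand for xsd:integer
    m = re.fullmatch(r'"([^"]*)"\^\^xsd:(\w+)', token)
    if m:
        return (m.group(1), XSD + m.group(2))  # typed literal
    m = re.fullmatch(r'"([^"]*)"', token)
    if m:
        return (m.group(1), None)  # plain literal: no datatype at all
    raise ValueError(f"unsupported or invalid literal token: {token}")


# Same term: 18 is only syntactic sugar for "18"^^xsd:integer.
print(literal_term("18") == literal_term('"18"^^xsd:integer'))
# Different terms in the RDF 1.0 graph, though equivalent in meaning.
print(literal_term('"Cambridge"') == literal_term('"Cambridge"^^xsd:string'))
```

The token 18^^xsd:integer matches none of the patterns and raises ValueError, mirroring the point that it is not valid Turtle.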

>> I'd like to see this:
>>
>> <Addresses/ID=18> <Addresses#ID> 18 .
>> <Addresses/ID=18> <Addresses#city> "Cambridge" .
>> <Addresses/ID=18> <Addresses#state> "MA" .
>>
>
> Do you mean that we should define a prefix:
>
> @prefix base: <http://foo.example/DB/> .
>
> and then everywhere have
>
> <base:Addresses/ID=18> <base:Addresses#ID> 18 .
> <base:Addresses/ID=18> <base:Addresses#city> "Cambridge" .
> <base:Addresses/ID=18> <base:Addresses#state> "MA" .

No -- I mean just write relative URIs instead of absolute URIs. When  
there is a well-defined base URI, then this makes a lot of sense.

See also:

http://www.w3.org/TeamSubmission/turtle/#uris

So you could write this:

    @base <http://foo.example/DB/> .

    <Addresses/ID=18> <Addresses#ID> 18 .
    <Addresses/ID=18> <Addresses#city> "Cambridge" .
    <Addresses/ID=18> <Addresses#state> "MA" .

That's just a shorter form for using full absolute URIs like
<http://foo.example/DB/Addresses/ID=18>. Given that the base URI is just
defined once as an input to the default mapping, you wouldn't have to
repeat it for each example. Instead, where you currently introduce the
“stem URI”, just explain at the beginning that throughout the document
the examples contain relative URIs, and that these are to be understood
as relative to the base URI.
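Resolving these relative URIs against the base is ordinary RFC 3986 reference resolution, which Python's standard urllib.parse.urljoin implements. The sketch below expands the references from the Turtle example above; the last line shows why the earlier remark about colons also matters for relative URIs: a relative reference whose first segment contains a colon, such as the hypothetical fname:Juan, is read as an absolute URI with scheme "fname" and is not resolved against the base.

```python
from urllib.parse import urljoin

BASE = "http://foo.example/DB/"  # the @base from the Turtle example above

# Subject and predicate references from the example, still relative.
relative_triples = [
    ("Addresses/ID=18", "Addresses#ID", 18),
    ("Addresses/ID=18", "Addresses#city", "Cambridge"),
    ("Addresses/ID=18", "Addresses#state", "MA"),
]

# Resolve the two URI positions; the literal objects are left untouched.
absolute_triples = [
    (urljoin(BASE, s), urljoin(BASE, p), o) for s, p, o in relative_triples
]
for triple in absolute_triples:
    print(triple)

# Caveat: a colon in the first path segment makes the reference look like an
# absolute URI with its own scheme, so it is NOT resolved against BASE.
print(urljoin(BASE, "fname:Juan"))
```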


Keep up the good work! Looking forward to an updated version!

Richard

>> If you do it right, RDF can be simple ;-)
>
> :)
>
>> Again, great work, and I'm very happy to see this spec moving forward and
>> like the direction it is taking.
>
> Thanks for your very insightful and direct comments.
>
> Marcelo and I will be working on this in the next couple of days and will
> let everybody know when we have an update. Please keep the comments
> coming!!!!!
>
>> Richard
>>
>>> All the best,
>>>
>>> Marcelo
Received on Tuesday, 2 November 2010 20:36:31 UTC